IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
PTE_RPN_SHIFT is actually page size dependent. Even though PowerISA 3.0
expects only the lower 12 bits to be zero, we will always find the pages
to be PAGE_SHIFT aligned. In case of hash config, this also allows us to
use the additional 3 bits to track pte specific information. We need
to make sure we use these bits only for hash specific pte flags.
For both 4K and 64K config, pte now can hold 57 bits address.
Inorder to keep things simpler, drop PTE_RPN_SHIFT and PTE_RPN_SIZE and
specify the 57 bit detail explicitly.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.
Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.
We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).
Both the above access scenarios should never happen.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have a common declaration in pte-common.h Add a book3s specific one
and switch to pte_user() in callchain.c. In a subsequent patch we will
switch _PAGE_USER to _PAGE_PRIVILEGED in the book3s version only.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a subsequent patch we want to add a second definition of pte_user().
Before we do that, make the signature clear, ie. it takes a pte_t and
returns bool.
We move it up inside the existing #ifndef __ASSEMBLY__ block, but
otherwise it's a straight conversion.
Convert the call in settlbcam(), which passes an unsigned long, to pass
a pte_t.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Subpage protection used to depend on the _PAGE_USER bit to implement no
access mode. This patch switches that to use _PAGE_RWX. We clear Read,
Write and Execute access from the pte instead of clearing _PAGE_USER
now. This was done so that we can switch to _PAGE_PRIVILEGED in a later
patch.
subpage_protection() returns pte bits that need to be cleared. Instead
of updating the interface to handle no-access in a separate way, it
appears simpler to clear RWX acecss to indicate no access.
We still don't insert hash ptes for no access implied by !_PAGE_RWX.
Hence we should not get PROT_FAULT with change.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.
We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b106ea "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We can avoid doing endian conversions by using pte_raw() in pxx_same().
The swap of the constant (_PAGE_HPTEFLAGS) should be done at compile
time by the compiler.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Traditionally Power server machines have used the Hashed Page Table MMU
mode. In this mode Linux manages its own tree of nested page tables,
aka. "the Linux page tables", which are not used by the hardware
directly, and software loads translations into the hash page table for
use by the hardware.
Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation,
where the hardware can directly operate on the Linux page tables.
However the hardware requires that the page tables be in big endian
format.
To accommodate this, switch the pgtable types to __be64 and add
appropriate endian conversions.
Because we will be supporting a single kernel binary that boots using
either radix or hash mode, we always store the Linux page tables big
endian, even in hash mode where they are not actually used by the
hardware.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix sparse errors, flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
PTE. Currently doing it inline OK, but in a future patch we will be
converting the PTEs to __be64 in some configs. In that case we will need
casts at every cmpxchg() site in order to keep sparse happy.
So move the logic into a helper, this is a reasonably nice cleanup on
its own.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pmd_hugepage_update() is inside #ifdef CONFIG_TRANSPARENT_HUGEPAGE. THP
can only be enabled if PPC_BOOK3S_64=y && PPC_64K_PAGES=y, aka. hash64.
On hash64 we always define PTE_ATOMIC_UPDATES to 1, meaning the #ifdef
in pmd_hugepage_update() is unnecessary, so drop it.
That is also the only use of PTE_ATOMIC_UPDATES in any of the hash code,
meaning we no longer need to #define it at all in the hash headers.
Note it's still #defined and used in the nohash code.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Testing done by Paul Mackerras has shown that with a modern compiler
there is no negative effect on code generation from enabling
STRICT_MM_TYPECHECKS.
So remove the option, and always use the strict type definitions.
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- cxl: Keep IRQ mappings on context teardown from Michael Neuling
- cxl: Poll for outstanding IRQs when detaching a context from Michael Neuling
- Wire up preadv2 and pwritev2 syscalls from Rui Salvaterra
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXI2HxAAoJEFHr6jzI4aWAfLgP/jxD+kfBtrK6KJXq5BVM+IWr
aevVTVCgv3F8yOiI0ZPyOSh7B23dP8nBGYcejpTxyQcb8lox20WL6Q+om7H+BleC
yrb9/sGzvJXIdazqMF77fzDjTHjjAMNizi9f82+8OzrghtQj8GJNogKwydIXe3QB
+27kZcbkpXhdJZ/V0qmsWCAMV+sdaW0BW3DREQ0jFf0k08I0HMHiyN/zrqwadLjU
Qx7af0iENdSRXtve1vGI41lflDPTaou39Y4NyUHfar1zGtt2rktrl5z16lmPC9nw
gio6CsTIKwjsWRZugzrAlPXaToZKGgCGmW634RRfBMkjOnFoEGk0/GN2w0A+wjp4
+jYq8v+2jss74Ngq12/NmIbB+b8iFsKsN7b0UPZnf91PsAKlprB6iDbCw35KSHgi
MLB8cOeEGBg+nm+ZSdrylyOa7RSJv3dK7cfEegtpXRAdxGwVAwCpjXvBqA+fdyUi
dfg2ChHJ91GWs3+ljPd/ee+OTPq3jY+o6PL/lQBaGhC6XuxrFQTsm537pNzlH6wf
sUZzF5duf1jpRvnpeGgzAMUqYHz7W/NbiHKVV8EC18jSDnc/7BANfVxENBk1Vk+o
2CdVWS26hDTUkRKdx+JbDRsStD1XxBgmBD37tEaDuD49VvbkqHB5yarjqAphM+zY
Pf3WwyuXpfsB1ppjrzCM
=4O+o
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A few more powerpc fixes for 4.6:
- cxl: Keep IRQ mappings on context teardown from Michael Neuling
- cxl: Poll for outstanding IRQs when detaching a context from
Michael Neuling
- Wire up preadv2 and pwritev2 syscalls from Rui Salvaterra"
* tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: wire up preadv2 and pwritev2 syscalls
cxl: Poll for outstanding IRQs when detaching a context
cxl: Keep IRQ mappings on context teardown
This patch fix spelling typos in printk from various part
of the codes.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.
And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.
The new file is:
# cat /proc/sys/kernel/perf_event_max_stack
127
Chaging it:
# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256
But as soon as there is some event using callchains we get:
# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#
Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two
build warnings on ppc64.
mpe: Lightly tested with fio (slightly hacked to add the syscall
wrappers):
fio-4217 [009] .... 1304.635300: sys_preadv2(fd: 3, vec:
10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
fio-4217 [009] .... 1304.635474: sys_preadv2 -> 0x1000
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Minor cleanup patch to replace the raw event hex values in
power8-pmu.c with #defines.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.
As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.
The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.
Fixes the following failure in ftracetest's kprobe_ftrace.tc:
.../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument
That failure is on this line of kprobe_ftrace.tc:
echo _do_fork > set_ftrace_filter
This is because there's no _do_fork entry in the functions list:
# cat available_filter_functions | grep _do_fork
._do_fork
This change introduces no regressions on the perf and ftracetest
testsuite results.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sparse doesn't seem to be passing -maltivec around properly, leading
to lots of errors:
.../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support
arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration
arch/powerpc/lib/xor_vmx.c:27:16: error: got signed
arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression
arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement
arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in
...
arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors
Only include the altivec.h header for non-__CHECKER__ builds.
For builds with __CHECKER__, make up some stubs instead, as
suggested by Balbir. (The vector size of 16 is arbitrary.)
Suggested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The copy paste facility introduced in POWER9 provides an optimised
mechanism for a userspace application to copy a cacheline. This is
provided by a pair of instructions, copy and paste, while a third,
cp_abort (copy paste abort), provides a clean up of the state in case of
a failure.
The copy instruction will read a 128 byte cacheline and store it in an
internal buffer. The subsequent paste instruction will store this
internal buffer to memory and set a CR field if the paste succeeds.
Since the state of the copy paste buffer is internal (and not
architecturally visible), in the unlikely event of a context switch, the
state cannot be stored and the paste should therefore fail.
The cp_abort instruction exists to fail and clean up any such
interrupted copy paste sequence and is to be called by the kernel as
part of the context switch. Doing so prevents data from a preceding copy
in one process leaking into the paste of another.
This code enables use of the cp_abort instruction if a supported
processor is detected.
NOTE: this is for userspace only, not in kernel, and does not deal
with KVM guests.
Patch created with much assistance from Michael Neuling
<mikey@neuling.org>
Signed-off-by: Chris Smart <chris@distroguy.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mpic_init_sys() currently doesn't check whether
subsys_system_register() succeeded or not. Check the return code of
subsys_system_register() and clean up if there's an error.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Found by smatch.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- scan_features() updates incorrect bits for REAL_LE from Anton Blanchard
- Update cpu_user_features2 in scan_features() from Anton Blanchard
- Update TM user feature bits in scan_features() from Anton Blanchard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXGfRfAAoJEFHr6jzI4aWAeCQP+wVdFa4Q+T0uFQf/lQBPfQLa
wDODtskmqwfWk2YqIv3LKTnrkubs8nGy2Vc5Tm27/G/x+uMPOrTD/eoBiG3DH8E3
eiX1tlqO+oM8dm+g/FxGSCbkWSoYKQrc1QQ1oWtADPPhSm1xU0GYdPqQrDyAfFEr
JEg5ifOkbTZ3VauuK2kXVxIgNHtwfPoeQfTEmKk7x1Pay0z3zNx65tKInGKmatL8
7dS+Y8gPNODUfky61M1Y0f/KDwOsBO7gyiIdkkJxnYW1IAqTTsvK90fKPVzSfM0Y
jnOBdlysIPsiXFNDjT0gBBBRm7McCZO/bZEjK7aSWUMP/FOWIR+HkB3+fpcB+o8u
tbmMJ8AfHvz4PXzf0h0TySOy6uhA65LJfBnssBoZcCQyxTzdIQC7crDH2639L+8q
djUYGzWXrplCq/r/uQrogY7a7Ff/UFW76mRMX1lj2hi8lpQEnr7sBPAqwEApgvtr
WRqBkUjyuTWnnDSnen0NR1QMfh9zMNNjHHBaU+UJv0gU3lBRL+VpmFizsdjVk8/E
0wC2l1lws3txxaWXGmpJNKve9P2+iEVfY5KAigkKO3yV5xUuwOhtED6sTKqTY5wR
kp3AC+7fjUxQFJ4g4BVohakie5RR6LkLZ7xCwB0LWPff3LUIiHGelsd+ii1kjxwm
/tiR9CYEO1O9KcKlwv8X
=woiB
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three powerpc cpu feature fixes from Anton Blanchard:
- scan_features() updated incorrect bits for REAL_LE
- update cpu_user_features2 in scan_features()
- update TM user feature bits in scan_features()"
* tag 'powerpc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
The perf infrastructure uses a bit mask to find out valid registers to
display. Define a register mask for supported registers defined in
uapi/asm/perf_regs.h. The bit positions also correspond to register IDs
which is used by perf infrastructure to fetch the register values.
CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state.
Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
[mpe: Add license, use CONFIG_PPC64, fix 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The enum definition assigns an 'id' to each register in "struct pt_regs"
of arch/powerpc. The order of these values in the enum definition are
based on the order of members in pt_regs.
Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
[mpe: Rename LNK to LINK, use _UAPI_ASM for include guards]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The __end_handlers marker was intended to mark down upto code that gets
called from exception prologs. But that hasn't kept pace with code
changes. Case in point, slb_miss_realmode being called from exception
prolog code but isn't below __end_handlers marker. So, __end_handlers
marker is as good as a comment but could be misleading at times if it
isn't in sync with the code, as is the case now. So, let us avoid this
confusion by having a better comment and removing __end_handlers marker
altogether.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.
However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.
But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:
1. In almost all cases, production kernel (relocatable) is used for
kdump as well, which would mean that crashed kernel's OOL handler
would be at the same place where we end up branching to, from short
interrupt vector of kdump kernel.
2. Also, OOL handler was unlikely the reason for crash in almost all
the kdump scenarios, which meant we had a sane OOL handler from
crashed kernel that we branched to.
3. On most 64-bit POWER server processors, page size is large enough
that marking interrupt vector code as executable (see commit
429d2e83) leads to marking OOL handler code from crashed kernel,
that sits right below interrupt vector code from kdump kernel, as
executable as well.
Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.
This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.
Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.
Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We need to update the user TM feature bits (PPC_FEATURE2_HTM and
PPC_FEATURE2_HTM) to mirror what we do with the kernel TM feature
bit.
At the moment, if firmware reports TM is not available we turn off
the kernel TM feature bit but leave the userspace ones on. Userspace
thinks it can execute TM instructions and it dies trying.
This (together with a QEMU patch) fixes PR KVM, which doesn't currently
support TM.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
scan_features() updates cpu_user_features but not cpu_user_features2.
Amongst other things, cpu_user_features2 contains the user TM feature
bits which we must keep in sync with the kernel TM feature bit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
[mpe: Flesh out changelog, add comment reserving 0x4]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the kconfig logic & assembly support for handling live patched
functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
depends on the new -mprofile-kernel ftrace ABI, which is only supported
currently on ppc64le.
Live patching is handled by a special ftrace handler. This means it runs
from ftrace_caller(). The live patch handler modifies the NIP so as to
redirect the return from ftrace_caller() to the new patched function.
However there is one particularly tricky case we need to handle.
If a function A calls another function B, and it is known at link time
that they share the same TOC, then A will not save or restore its TOC,
and will call the local entry point of B.
When we live patch B, we replace it with a new function C, which may
not have the same TOC as A. At live patch time it's too late to modify A
to do the TOC save/restore, so the live patching code must interpose
itself between A and C, and do the TOC save/restore that A omitted.
An additionaly complication is that the livepatch code can not create a
stack frame in order to save the TOC. That is because if C takes > 8
arguments, or is varargs, A will have written the arguments for C in
A's stack frame.
To solve this, we introduce a "livepatch stack" which grows upward from
the base of the regular stack, and is used to store the TOC & LR when
calling a live patched function.
When the patched function returns, we retrieve the real LR & TOC from
the livepatch stack, restore them, and pop the livepatch "stack frame".
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
In order to support live patching we need to maintain an alternate
stack of TOC & LR values. We use the base of the stack for this, and
store the "live patch stack pointer" in struct thread_info.
Unlike the other fields of thread_info, we can not statically initialise
that value, so it must be done at run time.
This patch just adds the code to support that, it is not enabled until
the next patch which actually adds live patch support.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Add the powerpc specific livepatch definitions. In particular we provide
a non-default implementation of klp_get_ftrace_location().
This is required because the location of the mcount call is not constant
when using -mprofile-kernel (which we always do for live patching).
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sometimes when sparse warns about undefined symbols, it isn't
because they should have 'static' added, it's because they're
overriding __weak symbols defined elsewhere, and the header has
been missed.
Fix a few of them by adding appropriate headers.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As sparse suggests, these should be made static.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch assigns numbers to OPAL_MSG macros of enum opal_msg_type
to prevent accidental insertion of any new value in between and thus
break OPAL API. This is also helpful while backporting mainline kernel
changes to distros which run downlevel kernel and thus don't have all
OPAL messages defined, avoiding unnecessary bugs due to enum values
order mismatch.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If CONFIG_HIBERNATION and CONFIG_PPC_BOOK3S_64 are set, code in
arch/powerpc/kernel/swsusp_amd64.S which uses the tlbia macro is enabled.
tlbia in turn uses tlbie, an instruction which takes more than one
operand in newer versions of POWER. As such, the kernel fails to build
due to the assembler complaining about missing operands.
This can be worked around by assembling the instruction as in POWER4.
This fixes the build breakage caused by enabling CONFIG_HIBERNATION.
Hibernation is currently only tested on G5 PowerMacs, which should be
unaffected by this change. For other platforms it may now build,
whether or not it works is a different story.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.
The error occurs because of the following constraint (from
include/linux/mmzone.h) being violated:
MAX_ORDER -1 + PAGESHIFT <= SECTION_SIZE_BITS.
Expanding this out, we get:
FORCE_MAX_ZONEBITS <= 25 - PAGESHIFT,
which requires, for a 64K page, FORCE_MAX_ZONEBITS <= 9. Thus set max
value of FORCE_MAX_ZONEORDER for 64K pages to 9, and 4K pages to 13.
Also, check the minimum value:
In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
MAX_ORDER which expands out to:
PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.
PTE_INDEX_SIZE is:
9 (4k hash or no hash 4K pgtable) or
8 (64K hash or no hash 64K pgtable).
Thus a min value of 8 for 64K pages and 9 for 4K pages is reasonable.
So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
and from 13-64 to 9-13 for 4K pages.
Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Makefile/Kconfig currently controlling compilation of this code is:
obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
signal_64.o ptrace32.o \
paca.o nvram_64.o firmware.o
arch/powerpc/platforms/Kconfig.cputype:config PPC64
arch/powerpc/platforms/Kconfig.cputype: bool "64-bit kernel"
...meaning that it currently is not being built as a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.
We don't replace module.h with init.h since the file already has that.
We delete the MODULE_LICENSE tag since that information is already
contained at the top of the file in the comments.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Kconfig currently controlling compilation of this code is:
arch/powerpc/platforms/cell/Kconfig:config SPU_BASE
arch/powerpc/platforms/cell/Kconfig: bool
...meaning that it currently is not being built as a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.
We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IBM online documentation for EEH uses "extended error handling" and
"enhanced error handling" to refer to the same thing, in different
places. The only place mentioning it as "enhanced error handling" in the
kernel is the MAINTAINERS file, and it's "extended" in some documentation.
IBM originally defined EEH as "enhanced error handling", so standardise
all mentions of EEH to use that term.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
generic_memcpy() is only called from copy_32.S, so there's no reason for
it to be global.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have a bunch of SLB related code in the tree which is there to handle
dynamic VSIDs - but currently it's all disabled at compile time. The
comments say "Keep that around for when we re-implement dynamic VSIDs".
But that was over 10 years ago (commit 3c726f8dee6f ("[PATCH] ppc64:
support 64k pages")). The chance that it would still work unchanged is
minimal, and in the meantime it's confusing to folks browsing/grepping
the code. If we ever want to re-instate it, it's in the git history.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
The HMI code knows about three types of errors: CORE, NX and UNKNOWN.
If OPAL were to add a new type, it would not be handled at all since
there is no fallback case. Instead of explicitly checking for UNKNOWN,
treat any checkstop type without a handler as unknown.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The associativity array index specified for a LMB in the device tree,
/ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory, needs to be updated
prior to DLPAR adding a LMB and after DLPAR removing a LMB.
Without doing this step in the DLPAR add process a LMB could be configured
with the incorrect affinity. For a LMB that was not present at boot the
affinity index is set to 0xffffffff, which defaults to adding the LMB to
the first online node since the index is not a valid value. Or, the
affinity index could contain a stale value if the LMB was present at boot
but later DLPAR removed and is being DLPAR added back to the system.
This patch adds a step in the DLPAR add flow to look up the associativity
index for a LMB prior to adding a LMB and setting the associativity to
0xffffffff when a LMB is removed.
This patch also modifies the DLPAR add/remove flow to no longer do a single
update of the device tree property after all of the requested DLPAR
operations are complete and now does a property update during the add
or remove of each LMB.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Re-factor dlpar_lmb_add() routine by moving the validation of the lmb
flags and the acquireing of the DRC to a wrapper around the work to add
the memory to the system. This is done to make handling of errors
during the addition of the memory easier and to facilitate the upcoming
addition of updating the lmb's affinity prior to adding the memory.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
"PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
Let's stop pretending that pages in page cache are special. They are
not.
The first patch with most changes has been done with coccinelle. The
second is manual fixups on top.
The third patch removes macros definition"
[ I was planning to apply this just before rc2, but then I spaced out,
so here it is right _after_ rc2 instead.
As Kirill suggested as a possibility, I could have decided to only
merge the first two patches, and leave the old interfaces for
compatibility, but I'd rather get it all done and any out-of-tree
modules and patches can trivially do the converstion while still also
working with older kernels, so there is little reason to try to
maintain the redundant legacy model. - Linus ]
* PAGE_CACHE_SIZE-removal:
mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros