IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This allows us to squash some sparse warnings and also avoids having
to do explicity endian conversions in the code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Similarly to opal_event_shutdown, opal_nvram_write can be called in
the crash path with irqs disabled. Special case the delay to avoid
sleeping in invalid context.
Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: stable@vger.kernel.org # v3.2
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A kernel crash in process context that calls emergency_restart from
panic will end up calling opal_event_shutdown with interrupts disabled
but not in interrupt. This causes a sleeping function to be called
which gives the following warning with sysrq+c:
Rebooting in 10 seconds..
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
in_atomic(): 0, irqs_disabled(): 1, pid: 7669, name: bash
CPU: 20 PID: 7669 Comm: bash Tainted: G D W 4.17.0-rc5+ #3
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
___might_sleep+0x174/0x1a0
mutex_lock+0x38/0xb0
__free_irq+0x68/0x460
free_irq+0x70/0xc0
opal_event_shutdown+0xb4/0xf0
opal_shutdown+0x24/0xa0
pnv_shutdown+0x28/0x40
machine_shutdown+0x44/0x60
machine_restart+0x28/0x80
emergency_restart+0x30/0x50
panic+0x2a0/0x328
oops_end+0x1ec/0x1f0
bad_page_fault+0xe8/0x154
handle_page_fault+0x34/0x38
--- interrupt: 300 at sysrq_handle_crash+0x44/0x60
LR = __handle_sysrq+0xfc/0x260
flag_spec.62335+0x12b844/0x1e8db4 (unreliable)
__handle_sysrq+0xfc/0x260
write_sysrq_trigger+0xa8/0xb0
proc_reg_write+0xac/0x110
__vfs_write+0x6c/0x240
vfs_write+0xd0/0x240
ksys_write+0x6c/0x110
Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
The interrupt controller inside the Wii's Hollywood chip is connected to
two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with
their own interrupt status and mask registers.
When booting the Wii with mini[1], interrupts from the SD card
controller (IRQ 7) are handled by the ARM, because mini provides SD
access over IPC. Linux however can't currently use or disable this IPC
service, so both sides try to handle IRQ 7 without coordination.
Let's instead make sure that all interrupts that are unmasked on the PPC
side are masked on the ARM side; this will also make sure that Linux can
properly talk to the SD card controller (and potentially other devices).
If access to a device through IPC is desired in the future, interrupts
from that device should not be handled by Linux directly.
[1]: https://github.com/lewurm/mini
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On the Wii, there is a secondary IRQ controller (hlwd-pic), so
flipper-pic's match operation should not be hardcoded to return 1.
In fact, the default matching logic is sufficient, and we can completely
omit flipper_pic_match.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use new return type vm_fault_t for fault handler. For now, this is
just documenting that the function returns a VM_FAULT value rather
than an errno. Once all instances are converted, vm_fault_t will
become a distinct type. See commit 1c8f422059ae ("mm: change return
type to vm_fault_t").
We are fixing a minor bug, that the error from vm_insert_pfn() was
being ignored and the effect of this is likely to be only felt in OOM
situations.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M
IOMMU pages, however this is not the case for POWER9 and now skiboot
advertises the supported sizes via the device so we use that instead
of hard coding the mask.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently memtrace doesn't build if NUMA=n:
In function ‘memtrace_alloc_node’:
arch/powerpc/platforms/powernv/memtrace.c:134:6:
error: the address of ‘contig_page_data’ will always evaluate as ‘true’
if (!NODE_DATA(nid) || !node_spanned_pages(nid))
^
This is because for NUMA=n NODE_DATA(nid) points to an always
allocated structure, contig_page_data.
But even in the NUMA=y case memtrace_alloc_node() is only called for
online nodes, and we should always have a NODE_DATA() allocated for an
online node. So remove the (hopefully) overly paranoid check, which
also means we can build when NUMA=n.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the select system call is wired up with the SYSX_SPU()
macro. The SYSX_SPU() is not handled by systbl_chk.c, which means the
syscall number for select is not checked.
That hides the fact that the syscall number for select is actually
__NR__newselect not __NR_select.
In a following patch we'd like to drop ppc32_select() which means
select will become a regular COMPAT_SYS_SPU() syscall. But
COMPAT_SYS_SPU() can't deal with the fact that the syscall number is
actually __NR__newselect. We also can't just redefine __NR_select
because that's still used for the old select call.
So add a new COMPAT_NEW_SPU() that does the same thing as
COMPAT_SYS_SPU() except it encodes that we're using the new number.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hcall H_INT_RESET should be called to make sure XIVE is fully
reseted.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hcall_exit() tracepoint has retval defined as unsigned long. That
leads to humours results like:
bash-3686 [009] d..2 854.134094: hcall_entry: opcode=24
bash-3686 [009] d..2 854.134095: hcall_exit: opcode=24 retval=18446744073709551609
It's normal for some hcalls to return negative values, displaying them
as unsigned isn't very helpful. So change it to signed.
bash-3711 [001] d..2 471.691008: hcall_entry: opcode=24
bash-3711 [001] d..2 471.691008: hcall_exit: opcode=24 retval=-7
Which can be more easily compared to H_NOT_FOUND in hvcall.h
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Instead select the PHYS_ADDR_T_64BIT for 32-bit architectures that need a
64-bit phys_addr_t type directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>
This commit was a stop-gap to prevent crashes on hotunplug, caused by
the mismatch between the 1G mappings used for the linear mapping and the
memory block size. Those issues are now resolved because we split the
linear mapping at hotunplug time if necessary, as implemented in commit
4dd5f8a99e79 ("powerpc/mm/radix: Split linear mapping on hot-unplug").
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I no longer have a functional version of this board for even the most
basic sanity boot testing, and they have not been available for purchase
for quite some years now.
There is no point in adding a burden to testing coverage that does
walk all the possible defconfigs, so with all the above in mind, it
makes sense to remove it. Of course it will remain in the git history
for anyone who happens to stumble on one and wants to tinker with it.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.
Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.
The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.
Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies, up to 50 seconds have been observed here when RTC stops
responding (BMC reboot can do it).
Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.
Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The NPU has a limited number of address translation shootdown (ATSD)
registers and the GPU has limited bandwidth to process ATSDs. This can
result in contention of ATSD registers leading to soft lockups on some
threads, particularly when invalidating a large address range in
pnv_npu2_mn_invalidate_range().
At some threshold it becomes more efficient to flush the entire GPU
TLB for the given MM context (PID) than individually flushing each
address in the range. This patch will result in ranges greater than
2MB being converted from 32+ ATSDs into a single ATSD which will flush
the TLB for the given PID on each GPU.
Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is a single npu context per set of callback parameters. Callers
should be prevented from overwriting existing callback values so
instead return an error if different parameters are passed.
Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions
are used to allocate/free contexts to allow address translation and
shootdown by the NPU on a particular GPU. Context initialisation is
implicitly safe as it is protected by the requirement mmap_sem be held
in write mode, however pnv_npu2_destroy_context() does not require
mmap_sem to be held and it is not safe to call with a concurrent
initialisation for a different GPU.
It was assumed the driver would ensure destruction was not called
concurrently with initialisation. However the driver may be simplified
by allowing concurrent initialisation and destruction for different
GPUs. As npu context creation/destruction is not a performance
critical path and the critical section is not large a single spinlock
is used for simplicity.
Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Don't do this via custom code, instead now that we have support in the
arch hotplug/hotunplug code, rely on those routines to do the right
thing.
The existing flush doesn't work because it uses ppc64_caches.l1d.size
instead of ppc64_caches.l1d.line_size.
Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
API") changed last field of /proc/loadavg (last pid allocated) to be off
by one:
# unshare -p -f --mount-proc cat /proc/loadavg
0.00 0.00 0.00 1/60 2 <===
It should be 1 after first fork into pid namespace.
This is formally a regression but given how useless this field is I
don't think anyone is affected.
Bug was found by /proc testsuite!
Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
value by adding CONFIG_RELOCATABLE to vermagic.
- Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
from firmware.
- Remove tlbie trace points from KVM code that's called in real mode, because
it causes crashes.
- Fix checkstops caused by invalid tlbiel on Power9 Radix.
- Ensure the set of CPU features we "know" are always enabled is actually the
minimal set when we build with support for firmware supplied CPU features.
Thanks to:
Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJa0oWBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAV
EA//UQB7n33HlroMBL3841VswUrIgY36gSi9+QZPjxHiTYGVStI+u0FHq5hm8OMm
1FkronD0sfbN7t8LJ9FS6vCoAlW15vMBj95pWJXAL7NuQneO7cM5JvRbowVKbNbq
GESn4EkUj9bAXl7aTzX2yA7jruzMJIwE/2bgl1J6YkpByHx5x/fgzKTuoCRggQqY
Xd656ZZyWR/uIuWhAjCPFdrPnekVvo7/Gy5Yo5kcR6LbX4MO0JFNOHpiT3wPGGv/
LJedJtdpH1biMk7SrqLfWD7mpMsVJeMelpkftEbe8zUXf17wBd1rl4JkybLTDwZD
HznZ0nbnh0j5Q/KRHwOh0sLDSisJF6pQHH/Bnc4Xqrsn4OERwEk40/Ym/O0kDsjH
n2F3X5a5QLWoBWRsdV3BMyEzc0bgnb++tXeBWOV4jJBEOhoqxwimCU+3fmLiy95A
urJYf8Sljwve9I5kJyXKpruopgeoJ1aumRwopE8YCG9lfBgzvU/90u6+uKqAdkOv
kpZxS5FDT2lNIwbCP9WjEelAOGrtA6JQNWU0iPGwsVAvAxX/p1RwwbQeWVlWs3Ek
UdVF8eGezClpLPa/FQFdDyXfGE6WIf1vK9YnG7k3rUwwkSqoYxKgVQ3o1Vetp0Lg
fzPsDdDi2V2I3KDYNav8s6kohAIIS/a9XYk4BGvT/htRnUg=
=0e7c
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix crashes when loading modules built with a different
CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.
- Fix busy loops in the OPAL NVRAM driver if we get certain error
conditions from firmware.
- Remove tlbie trace points from KVM code that's called in real mode,
because it causes crashes.
- Fix checkstops caused by invalid tlbiel on Power9 Radix.
- Ensure the set of CPU features we "know" are always enabled is
actually the minimal set when we build with support for firmware
supplied CPU features.
Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
powerpc/8xx: Fix build with hugetlbfs enabled
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/fscr: Enable interrupts earlier before calling get_user()
powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies, and various lockup errors to trigger (again, BMC reboot
can cause it).
Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.
Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Depends-on: 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* A rework of the filesytem-dax implementation provides for detection of
unmap operations (truncate / hole punch) colliding with in-progress
device-DMA. A fix for these collisions remains a work-in-progress
pending resolution of truncate latency and starvation regressions.
* The of_pmem driver expands the users of libnvdimm outside of x86 and
ACPI to describe an implementation of persistent memory on PowerPC with
Open Firmware / Device tree.
* Address Range Scrub (ARS) handling is completely rewritten to account for
the fact that ARS may run for 100s of seconds and there is no platform
defined way to cancel it. ARS will now no longer block namespace
initialization.
* The NVDIMM Namespace Label implementation is updated to handle label
areas as small as 1K, down from 128K.
* Miscellaneous cleanups and updates to unit test infrastructure.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B
lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc
1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL
/NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr
EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg
xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga
9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57
qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf
l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA
ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ
3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m
9eGJG1H+JspcbtLF9iHn
=acZQ
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This cycle was was not something I ever want to repeat as there were
several late changes that have only now just settled.
Half of the branch up to commit d2c997c0f145 ("fs, dax: use
page->mapping to warn...") have been in -next for several releases.
The of_pmem driver and the address range scrub rework were late
arrivals, and the dax work was scaled back at the last moment.
The of_pmem driver missed a previous merge window due to an oversight.
A sense of obligation to rectify that miss is why it is included for
4.17. It has acks from PowerPC folks. Stephen reported a build failure
that only occurs when merging it with your latest tree, for now I have
fixed that up by disabling modular builds of of_pmem. A test merge
with your tree has received a build success report from the 0day robot
over 156 configs.
An initial version of the ARS rework was submitted before the merge
window. It is self contained to libnvdimm, a net code reduction, and
passing all unit tests.
The filesystem-dax changes are based on the wait_var_event()
functionality from tip/sched/core. However, late review feedback
showed that those changes regressed truncate performance to a large
degree. The branch was rewound to drop the truncate behavior change
and now only includes preparation patches and cleanups (with full acks
and reviews). The finalization of this dax-dma-vs-trnucate work will
need to wait for 4.18.
Summary:
- A rework of the filesytem-dax implementation provides for detection
of unmap operations (truncate / hole punch) colliding with
in-progress device-DMA. A fix for these collisions remains a
work-in-progress pending resolution of truncate latency and
starvation regressions.
- The of_pmem driver expands the users of libnvdimm outside of x86
and ACPI to describe an implementation of persistent memory on
PowerPC with Open Firmware / Device tree.
- Address Range Scrub (ARS) handling is completely rewritten to
account for the fact that ARS may run for 100s of seconds and there
is no platform defined way to cancel it. ARS will now no longer
block namespace initialization.
- The NVDIMM Namespace Label implementation is updated to handle
label areas as small as 1K, down from 128K.
- Miscellaneous cleanups and updates to unit test infrastructure"
* tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
libnvdimm, of_pmem: workaround OF_NUMA=n build error
nfit, address-range-scrub: add module option to skip initial ars
nfit, address-range-scrub: rework and simplify ARS state machine
nfit, address-range-scrub: determine one platform max_ars value
powerpc/powernv: Create platform devs for nvdimm buses
doc/devicetree: Persistent memory region bindings
libnvdimm: Add device-tree based driver
libnvdimm: Add of_node to region and bus descriptors
libnvdimm, region: quiet region probe
libnvdimm, namespace: use a safe lookup for dimm device name
libnvdimm, dimm: fix dpa reservation vs uninitialized label area
libnvdimm, testing: update the default smart ctrl_temperature
libnvdimm, testing: Add emulation for smart injection commands
nfit, address-range-scrub: introduce nfit_spa->ars_state
libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
nfit, address-range-scrub: fix scrub in-progress reporting
dax, dm: allow device-mapper to operate without dax support
dax: introduce CONFIG_DAX_DRIVER
fs, dax: use page->mapping to warn if truncate collides with a busy page
ext2, dax: introduce ext2_dax_aops
...
Scan the devicetree for an nvdimm-bus compatible and create
a platform device for them.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently powernv reboot and shutdown requests just leave secondaries
to do their own things. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, but also we don't
know what else they might be doing -- they might be causing trouble,
trampling memory, etc.
The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time, and it was fixed with 2196c6f1ed
("powerpc/powernv: Return secondary CPUs to firmware before FW update"),
which returns secondaries to opal. It's been found that regular reboots
can take over 10 seconds, which can result in the hard lockup watchdog
firing,
reboot: Restarting system
[ 360.038896709,5] OPAL: Reboot request...
Watchdog CPU:0 Hard LOCKUP
Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
Watchdog CPU:16 Hard LOCKUP
watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
This patch removes the special case for flash update, and calls
smp_send_stop in all cases before calling reboot/shutdown.
smp_send_stop could return CPUs to OPAL, the main reason not to is
that the request could come from a NMI that interrupts OPAL code,
so re-entry to OPAL can cause a number of problems. Putting
secondaries into simple spin loops improves the chances of a
successful reboot.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not
have the XER[SO] bug.
Fix this by storing up-front, outside the workaround code. The initial
test is not required because it is a slow path.
The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to
match pnv_power9_force_smt4_catch() where it is used. Drop the comment
on pnv_power9_force_smt4_catch() as it's no longer true.
Fixes: 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After migration the security feature flags might have changed (e.g.,
destination system with unpatched firmware), but some flags are not
set/clear again in init_cpu_char_feature_flags() because it assumes
the security flags to be the defaults.
Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then
init_cpu_char_feature_flags() does not run again, which potentially
might leave the system in an insecure or sub-optimal configuration.
So, just restore the security feature flags to the defaults assumed
by init_cpu_char_feature_flags() so it can set/clear them correctly,
and to ensure safe settings are in place in case the hypercall fail.
Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags")
Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add GENERIC_CPU support for little-endian rather than using POWER8
specific selection for POWER9 and above.
Restrict GENERIC_CPU to POWER8 and above on little endian.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt
being redefined. Spell out that GENERIC means >= POWER4 for BE.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER4 has been broken since at least the change 49d09bf2a6
("powerpc/64s: Optimise MSR handling in exception handling"), which
requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
POWER4 supports ISA v2.00.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Implement a new function to invoke stop, power9_offline_stop, which is
like power9_idle_stop but used by the cpu hotplug code.
Move KVM secondary state manipulation code to the offline case.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Because the two memory blocks (usually called MEM1 and MEM2) are not
merged anymore, __request_region in kernel/resource.c will correctly
allow reserving regions in the physical address space between MEM1 and
MEM2, where many important peripherals are (GPIO, MMC, USB, ...).
A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c
ensures that multiple memblocks are properly considered in ioremap; this
makes it unnecessary to set __allow_ioremap_reserved.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Hollywood chipset's GPIO controller has two sets of registers: One
for access by the PowerPC CPU, and one for access by the ARM coprocessor
(but both are accessible from the PPC because the memory firewall
(AHBPROT) is usually disabled when booting Linux, today).
The wii_power_off function currently assumes that the poweroff GPIO pin
is configured for use via the ARM side, but the upcoming GPIO driver
configures all pins for use via the PPC side, breaking poweroff.
Configure the owner register explicitly in wii_power_off to make
wii_power_off work with and without the new GPIO driver.
I think the Wii can be switched to the generic gpio-poweroff driver,
after the GPIO driver is merged.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Previously, wii_device_probe would only initialize devices under the
/hollywood node. After this patch, platform devices placed outside of
/hollywood will also be initialized.
The intended usecase for this are devices located outside of the
Hollywood chip, such as GPIO LEDs and GPIO buttons.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.
This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
[mpe: Add missing ';' to make it compile]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
opal_nvram_write currently just assumes success if it encounters an
error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
on other errors instead.
Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field
of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_*
flags.
Found by playing around with QEMU's implementation of the hypercall:
H_CPU_CHAR=0xf000000000000000
H_CPU_BEHAV=0x0000000000000000
This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR
so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also
clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush
mitigation at all for cpu_show_meltdown() to report; but currently
it does:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Not affected
H_CPU_CHAR=0x0000000000000000
H_CPU_BEHAV=0xf000000000000000
This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should
report vulnerable; but currently it doesn't:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Not affected
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Vulnerable
Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.
This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Checking for a "fully active" device state requires testing two flag
bits, which is open coded in several places, so add a function to do
it.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.
The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having an 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.
This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With enabled DEBUG, there is a compile error:
"error: ‘flags’ is used uninitialized in this function".
This moves pr_devel() little further where @flags are initialized.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have the security flags we can simplify the code in
pseries_setup_rfi_flush() because the security flags have pessimistic
defaults.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have the security flags we can significantly simplify the
code in pnv_setup_rfi_flush(), because we can use the flags instead of
checking device tree properties and because the security flags have
pessimistic defaults.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>