Commit Graph

729 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
aba0eb84c8 Merge branch 'eeh' into next 2012-03-13 10:15:35 +11:00
Benjamin Herrenschmidt
7230c56441 powerpc: Rework lazy-interrupt handling
The current implementation of lazy interrupts handling has some
issues that this tries to address.

We don't do the various workarounds we need to do when re-enabling
interrupts in some cases such as when returning from an interrupt
and thus we may still lose or get delayed decrementer or doorbell
interrupts.

The current scheme also makes it much harder to handle the external
"edge" interrupts provided by some BookE processors when using the
EPR facility (External Proxy) and the Freescale Hypervisor.

Additionally, we tend to keep interrupts hard disabled in a number
of cases, such as decrementer interrupts, external interrupts, or
when a masked decrementer interrupt is pending. This is sub-optimal.

This is an attempt at fixing it all in one go by reworking the way
we do the lazy interrupt disabling from the ground up.

The base idea is to replace the "hard_enabled" field with a
"irq_happened" field in which we store a bit mask of what interrupt
occurred while soft-disabled.

When re-enabling, either via arch_local_irq_restore() or when returning
from an interrupt, we can now decide what to do by testing bits in that
field.

We then implement replaying of the missed interrupts either by
re-using the existing exception frame (in exception exit case) or via
the creation of a new one from an assembly trampoline (in the
arch_local_irq_enable case).

This removes the need to play with the decrementer to try to create
fake interrupts, among others.

In addition, this adds a few refinements:

 - We no longer  hard disable decrementer interrupts that occur
while soft-disabled. We now simply bump the decrementer back to max
(on BookS) or leave it stopped (on BookE) and continue with hard interrupts
enabled, which means that we'll potentially get better sample quality from
performance monitor interrupts.

 - Timer, decrementer and doorbell interrupts now hard-enable
shortly after removing the source of the interrupt, which means
they no longer run entirely hard disabled. Again, this will improve
perf sample quality.

 - On Book3E 64-bit, we now make the performance monitor interrupt
act as an NMI like Book3S (the necessary C code for that to work
appear to already be present in the FSL perf code, notably calling
nmi_enter instead of irq_enter). (This also fixes a bug where BookE
perfmon interrupts could clobber r14 ... oops)

 - We could make "masked" decrementer interrupts act as NMIs when doing
timer-based perf sampling to improve the sample quality.

Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2:

- Add hard-enable to decrementer, timer and doorbells
- Fix CR clobber in masked irq handling on BookE
- Make embedded perf interrupt act as an NMI
- Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
  to retrigger an interrupt without preventing hard-enable

v3:

 - Fix or vs. ori bug on Book3E
 - Fix enabling of interrupts for some exceptions on Book3E

v4:

 - Fix resend of doorbells on return from interrupt on Book3E

v5:

 - Rebased on top of my latest series, which involves some significant
rework of some aspects of the patch.

v6:
 - 32-bit compile fix
 - more compile fixes with various .config combos
 - factor out the asm code to soft-disable interrupts
 - remove the C wrapper around preempt_schedule_irq

v7:
 - Fix a bug with hard irq state tracking on native power7
2012-03-09 13:25:06 +11:00
Gavin Shan
3780444c4f powerpc/eeh: pseries platform config space access in EEH
With the original EEH implementation, the access to config space of
the corresponding PCI device is done by RTAS sensitive function. That
depends on pci_dn heavily. That would limit EEH extension to other
platforms like powernv because other platforms might have different
ways to access PCI config space.

The patch splits those functions used to access PCI config space
and implement them in platform related EEH component. It would be
helpful to support EEH on multiple platforms simutaneously in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:51 +11:00
Gavin Shan
e575f8db1e powerpc/eeh: Introduce struct eeh_stats for EEH
With the original EEH implementation, the EEH global statistics
are maintained by individual global variables. That makes the
code a little hard to maintain.

The patch introduces extra struct eeh_stats for the EEH global
statistics so that it can be maintained in collective fashion.

It's the rework on the corresponding v5 patch. According to
the comments from David Laight, the EEH global statistics have
been changed for a litte bit so that they have fixed-type of
"u64". Also, the format used to print them has been changed to
"%llu" based on David's suggestion. Also, the output format of
EEH global statistics should be kept as intacted according to
Michael's suggestion that there might be tools parsing them.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:49 +11:00
Gavin Shan
54793d0ef1 powerpc/eeh: Replace pci_dn with eeh_dev for EEH on pSeries
The pci_dn has been replaced with eeh_dev. In order to comply with
the rule, the EEH platform implementation on pSeries should also
be adjusted for a little bit so that it will depend on eeh_dev instead
of pci_dn.

The patch replaces pci_dn with eeh_dev. The corresponding information
will be retrieved from eeh_dev instead of pci_dn.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:48 +11:00
Gavin Shan
40a7cd9219 powerpc/eeh: Replace pci_dn with eeh_dev for EEH aux components
The original EEH implementation is heavily depending on struct pci_dn.
We have to put EEH related information to pci_dn. Actually, we could
split struct pci_dn so that the EEH sensitive information to form an
individual struct, then EEH looks more independent.

The patch replaces pci_dn with eeh_dev for EEH aux components like
event and driver. Also, the eeh_event struct has been adjusted for
a little bit since eeh_dev has linked the associated FDT (Flat Device
Tree) node and PCI device. It's not necessary for eeh_event struct to
trace FDT node and PCI device. We can just simply to trace eeh_dev in
eeh_event.

The patch also renames function pcid_name() to eeh_pcid_name(), which
should be missed in the previous patch where the EEH aux components
have been cleaned up.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:46 +11:00
Gavin Shan
f631acd3e9 powerpc/eeh: Replace pci_dn with eeh_dev for EEH core
The original EEH implementation is heavily depending on struct pci_dn.
We have to put EEH related information to pci_dn. Actually, we could
split struct pci_dn so that the EEH sensitive information to form an
individual struct, then EEH looks more independent.

The patch replaces pci_dn with eeh_dev for EEH core.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:44 +11:00
Gavin Shan
d50a7d4c6f powerpc/eeh: Replace pci_dn with eeh_dev for EEH address cache
With original EEH implementation, struct pci_dn is used while building
PCI I/O address cache, which helps on searching the corresponding
PCI device according to the given physical I/O address. Besides, pci_dn
is associated with the corresponding PCI device while building its
I/O cache.

The patch replaces struct pci_dn with struct eeh_dev so that EEH address
cache won't depend on struct pci_dn. That will help EEH to become an
independent module in future. Besides, the binding of eeh_dev and PCI
device is done while building PCI device I/O cache.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:42 +11:00
Gavin Shan
44da8edc6a powerpc/eeh: Replace pci_dn with eeh_dev for EEH sysfs
With original EEH implementation, all EEH related statistics have
been put into struct pci_dn. We've introduced struct eeh_dev to
replace struct pci_dn in EEH core components, including EEH sysfs
component.

The patch shows EEH statistics from struct eeh_dev instead of struct
pci_dn in EEH sysfs component. Besides, it also fixed the EEH device
retrieval from PCI device, which was introduced by the previous patch
in the series of patch.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:40 +11:00
Gavin Shan
eb740b5f3e powerpc/eeh: Introduce EEH device
Original EEH implementation depends on struct pci_dn heavily. However,
EEH shouldn't depend on that actually because EEH needn't share much
information with other PCI components. That's to say, EEH should have
worked independently.

The patch introduces struct eeh_dev so that EEH core components needn't
be working based on struct pci_dn in future. Also, struct pci_dn, struct
eeh_dev instances are created in dynamic fasion and the binding with EEH
device, OF node, PCI device is implemented as well.

The EEH devices are created after PHBs are detected and initialized, but
PCI emunation hasn't started yet. Apart from that, PHB might be created
dynamically through DLPAR component and the EEH devices should be creatd
as well. Another case might be OF node is created dynamically by DR
(Dynamic Reconfiguration), which has been defined by PAPR. For those OF
nodes created by DR, EEH devices should be also created accordingly. The
binding between EEH device and OF node is done while the EEH device is
initially created.

The binding between EEH device and PCI device should be done after PCI
emunation is done. Besides, PCI hotplug also needs the binding so that
the EEH devices could be traced from the newly coming PCI buses or PCI
devices.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:29 +11:00
Gavin Shan
def9d83da4 powerpc/eeh: Cleanup function names in EEH aux components
The patch does some cleanup on the function names of EEH
aux components. Currently, only couple of function names from
eeh_cache have been adjusted so that:

        * The function name has prefix "eeh_addr_cache".
        * Move around pci_addr_cache_build() in the header file
          to reflect function call sequence.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:11:37 +11:00
Gavin Shan
29f8bf1b7f powerpc/pseries: Cleanup comments in EEH aux components
There're several EEH aux components and the patch does some cleanup
for them so that they look more clean.

        * Duplicated comments have been removed from the header file.
        * Comments have been reorganized so that it looks more clean.
        * The leading comments of functions are adjusted for a little
          bit so that the result of "make pdfdocs" would be more
          unified.
        * Function calls "xxx ()" has been replaced by "xxx()".

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:11:20 +11:00
Gavin Shan
1823fbf119 powerpc/eeh: pseries platform EEH configure bridge
In order to enable particular PCI device, which has been included
in the parent PE. The involved PCI bridges should be enabled explicitly
if there has. On pSeries platform, there're dedicated RTAS calls
to fulfil the purpose.

The patch implements the function of configuring PCI bridges through
the dedicated RTAS calls. Besides, the function has been abstracted
by struct eeh_ops::configure_bridge so that the EEH core components
could support multiple platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:11:11 +11:00
Gavin Shan
8d633291b4 powerpc/eeh: pseries platform EEH error log retrieval
On RTAS compliant pSeries platform, one dedicated RTAS call has
been introduced to retrieve EEH temporary or permanent error log.

The patch implements the function of retriving EEH error log through
RTAS call. Besides, it has been abstracted by struct eeh_ops::get_log
so that EEH core components could support multiple platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:11:01 +11:00
Gavin Shan
2652481f75 powerpc/eeh: pseries platform EEH reset PE
On RTAS compliant pSeries platform, there is a dedicated RTAS call
(ibm,set-slot-reset) to reset the specified PE. Furthermore, two
types of resets are supported: hot and fundamental. the type of
reset is to be used actually depends on the included PCI device's
requirements.

The patch implements resetting PE on pSeries platform through RTAS
call. Besides, it has been abstracted through struct eeh_ops::reset
so that EEH core components could support multiple platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:10:49 +11:00
Gavin Shan
b0e5f742f1 powerpc/eeh: pseries platform EEH wait PE state
On pSeries platform, the PE state might be temporarily unavailable.
In that case, the firmware will return the corresponding wait time.
That means the kernel has to wait for appropriate time in order to
get the PE state.

The patch does the implementation for that. Besides, the function
has been abstracted through struct eeh_ops::wait_state so that EEH core
components could support multiple platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:10:39 +11:00
Gavin Shan
eb594a4754 powerpc/eeh: pseries platform PE state retrieval
On pSeries platform, there're 2 dedicated RTAS calls introduced to
retrieve the corresponding PE's state: ibm,read-slot-reset-state and
ibm,read-slot-reset-state2.

The patch implements the retrieval of PE's state according to the
given PE address. Besides, the implementation has been abstracted by
struct eeh_ops::get_state so that EEH core components could support
multiple platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:10:26 +11:00
Gavin Shan
c8c29b38fb powerpc/eeh: pseries platform EEH PE address retrieval
There're 2 types of addresses used for EEH operations. The first
one would be BDF (Bus/Device/Function) address which is retrieved
from the reg property of the corresponding FDT node. Another one
is PE address that should be enquired from firmware through RTAS
call on pSeries platform. When issuing EEH operation, the PE address
has precedence over BDF address.

The patch implements retrieving PE address according to the given
BDF address on pSeries platform. Also, the struct eeh_early_enable_info
has been removed since the information can be figured out from
dn->pdn->phb->buid directly and that simplifies the code.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:10:09 +11:00
Gavin Shan
8fb8f70902 powerpc/eeh: pseries platform EEH operations
There're 4 EEH operations that are covered by the dedicated RTAS
call <ibm,set-eeh-option>: enable or disable EEH, enable MMIO and
enable DMA. At early stage of system boot, the EEH would be tried
to enable on PCI device related device node. MMIO and DMA for
particular PE should be enabled when doing recovery on EEH errors
so that the PE could function properly again.

The patch implements it and abstract that through struct
eeh_ops::set_eeh. It would be help for EEH to support multiple
platforms in future.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:09:49 +11:00
Gavin Shan
e2af155c2a powerpc/eeh: pseries platform EEH initialization
The platform specific EEH operations have been abstracted by
struct eeh_ops. The individual platroms, including pSeries, needs
doing necessary initialization before the platform dependent EEH
operations work properly.

The patch is addressing that and do necessary platform initialization
for pSeries platform. More specificly, it will figure out the tokens
of EEH related RTAS calls.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:09:36 +11:00
Gavin Shan
aa1e6374ae powerpc/eeh: Platform dependent EEH operations
EEH has been implemented on RTAS-compliant pSeries platform.
That's to say, the EEH operations will be implemented through RTAS
calls eventually. The situation limited feasible extension on EEH.
In order to support EEH on multiple platforms like pseries and powernv
simutaneously. We have to split the platform dependent EEH options
up out of current implementation.

The patch addresses supporting EEH on multiple platforms. The pseries
platform dependent EEH operations will be abstracted by struct eeh_ops.
EEH core components will be built based on the registered EEH operations.
With the mechanism, what the individual platform needs to do is implement
platform dependent EEH operations.

For now, the pseries platform is covered under the mechanism. That means
we have to think about other platforms to support EEH, like powernv.
Besides, we only have framework for the mechanism and we have to implement
it for pseries platform later.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:08:54 +11:00
Gavin Shan
cce4b2d243 powerpc/eeh: Cleanup function names in the EEH core
The EEH has been implemented on pSeries platform. The original
code looks a little bit nasty. The patch does cleanup on the
current EEH implementation so that it looks more clean.

        * Try adding prefix "eeh" for functions.
        * Some function names have been adjusted so that they looks
          shorter and meaningful.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:08:37 +11:00
Gavin Shan
cb3bc9d0de powerpc/eeh: Cleanup comments in the EEH core
The EEH has been implemented on pSeries platform. The original
code looks a little bit nasty. The patch does cleanup on the
current EEH implementation so that it looks more clean.

        * Duplicated comments have been removed from the corresponding
          header files.
        * Comments have been reorganized so that it looks more clean.
        * The leading comments of functions are adjusted for a little
          bit so that the result of "make pdfdocs" would be more
          unified.
        * Function definitions and calls have unified format as "xxx()".
          That means the format "xxx ()" has been replaced by "xxx()".
        * There're multiple functions implemented for resetting PE. The
          position of those functions have been move around so that they
          are adjacent to each other to reflect their relationship.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:08:11 +11:00
Stephen Rothwell
3d066d77cf powerpc: remove CONFIG_PPC_ISERIES from the architecture Kconfig files
After this, we can remove the legacy iSeries code more easily.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Mahesh Salgaonkar
12d9299241 fadump: Remove the phyp assisted dump code.
Remove the phyp assisted dump implementation which is not is use.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:03 +11:00
Kyle Moffett
e55d7f737d powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag
There are two separate flags controlling whether or not the MPIC is
reset during initialization, which is completely unnecessary, and only
one of them can be specified in the device tree.

Also, most platforms in-tree right now do actually want to reset the
MPIC during initialization anyways, which means lots of duplicate code
passing the MPIC_WANTS_RESET flag.

Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
flag and make the code reset the MPIC by default.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett
5019609fce powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count
The mpic->irq_count variable is only used as a software error-checking
limit to determine whether or not an IRQ number is valid.  In board code
which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
is automatically detected from the number of ISUs and the ISU size.

In practice, all hardware ends up with irq_count == num_sources, so all
of the runtime checks on mpic->irq_count should just check the value of
mpic->num_sources instead.

When platform hardware does not correctly report the number of IRQs,
which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
flag is used to override the detected value of num_sources with the
manual irq_count parameter.  Since there's no need to manually specify
the number of IRQs except in this case, the extra flag can be eliminated
and the test changed to "irq_count != 0".

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Thadeu Lima de Souza Cascardo
778a785f02 powerpc/pseries/eeh: Fix crash when error happens during device probe
EEH may happen during a PCI driver probe. If the driver is trying to
access some register in a loop, the EEH code will try to print the
driver name. But the driver pointer in struct pci_dev is not set until
probe returns successfully.

Use a function to test if the device and the driver pointer is NULL
before accessing the driver's name.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:39 +11:00
Brian King
444080d13d powerpc/pseries: Fix partition migration hang in stop_topology_update
This fixes a hang that was observed during live partition migration.
Since stop_topology_update must not be called from an interrupt
context, call it earlier in the migration process. The hang observed
can be seen below:

WARNING: at kernel/timer.c:1011
Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop ibmveth sg ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh sd_mod crc_t10dif ibmvfc scsi_transport_fc scsi_tgt scsi_mod dm_snapshot dm_mod
NIP: c0000000000c52d8 LR: c00000000004be28 CTR: 0000000000000000
REGS: c00000005ffd77d0 TRAP: 0700   Not tainted  (3.2.0-git-00001-g07d106d)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48000084  XER: 00000001
CFAR: c00000000004be20
TASK = c00000005ec78860[0] 'swapper/3' THREAD: c00000005ec98000 CPU: 3
GPR00: 0000000000000001 c00000005ffd7a50 c000000000fbbc98 c000000000ec8340
GPR04: 00000000282a0020 0000000000000000 0000000000004000 0000000000000101
GPR08: 0000000000000012 c00000005ffd4000 0000000000000020 c000000000f3ba88
GPR12: 0000000000000000 c000000007f40900 0000000000000001 0000000000000004
GPR16: 0000000000000001 0000000000000000 0000000000000000 c000000001022310
GPR20: 0000000000000001 0000000000000000 0000000000200200 c000000001029e14
GPR24: 0000000000000000 0000000000000001 0000000000000040 c00000003f74bc80
GPR28: c00000003f74bc84 c000000000f38038 c000000000f16b58 c000000000ec8340
NIP [c0000000000c52d8] .del_timer_sync+0x28/0x60
LR [c00000000004be28] .stop_topology_update+0x20/0x38
Call Trace:
[c00000005ffd7a50] [c00000005ec78860] 0xc00000005ec78860 (unreliable)
[c00000005ffd7ad0] [c00000000004be28] .stop_topology_update+0x20/0x38
[c00000005ffd7b40] [c000000000028378] .__rtas_suspend_last_cpu+0x58/0x260
[c00000005ffd7bf0] [c0000000000fa230] .generic_smp_call_function_interrupt+0x160/0x358
[c00000005ffd7cf0] [c000000000036ec8] .smp_ipi_demux+0x88/0x100
[c00000005ffd7d80] [c00000000005c154] .icp_hv_ipi_action+0x5c/0x80
[c00000005ffd7e00] [c00000000012a088] .handle_irq_event_percpu+0x100/0x318
[c00000005ffd7f00] [c00000000012e774] .handle_percpu_irq+0x84/0xd0
[c00000005ffd7f90] [c000000000022ba8] .call_handle_irq+0x1c/0x2c
[c00000005ec9ba20] [c00000000001157c] .do_IRQ+0x22c/0x2a8
[c00000005ec9bae0] [c0000000000054bc] hardware_interrupt_entry+0x18/0x1c
Exception: 501 at .cpu_idle+0x194/0x2f8
    LR = .cpu_idle+0x194/0x2f8
[c00000005ec9bdd0] [c000000000017e58] .cpu_idle+0x188/0x2f8 (unreliable)
[c00000005ec9be90] [c00000000067ec18] .start_secondary+0x3e4/0x524
[c00000005ec9bf90] [c0000000000093e8] .start_secondary_prolog+0x10/0x14
Instruction dump:
ebe1fff8 4e800020 fbe1fff8 7c0802a6 f8010010 7c7f1b78 f821ff81 78290464
80090014 5400019e 7c0000d0 78000fe0 <0b000000> 4800000c 7c210b78 7c421378

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-14 15:01:39 +11:00
Deepthi Dharwar
f7aa554510 powerpc/cpuidle: Make it a bool, not a tristate
As pointed out, asm/system.h has empty inline implementations for
update_smt_snooze_delay and pseries_notify_cpuidle_add_cpu, which are
used when CONFIG_PSERIES_IDLE is undefined. Since those two functions
are used in core power architecture functions (store_smt_snooze_delay
at kernel/sysfs.c and smp_xics_setup_cpu at platforms/pseries/smp.c),

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-25 09:43:06 +11:00
Linus Torvalds
8e63dd6e1c Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix unpaired __trace_hcall_entry and __trace_hcall_exit
  powerpc: Fix RCU idle and hcall tracing
2012-01-14 12:26:23 -08:00
WANG Cong
a3dd332305 kexec: remove KMSG_DUMP_KEXEC
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Li Zhong
ebb7f616ab powerpc: Fix unpaired __trace_hcall_entry and __trace_hcall_exit
Unpaired calling of __trace_hcall_entry and __trace_hcall_exit could
 cause incorrect preempt count. And it might happen as the global
 variable hcall_tracepoint_refcount is checked separately before calling
 them.

 Instead, store the value that was used on entry in the stack frame
 and retreive it from there after the call

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11 12:50:26 +11:00
Anton Blanchard
a5ccfee05a powerpc: Fix RCU idle and hcall tracing
Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit
region. Since pSeries calls H_CEDE in the idle loop, we were violating
this rule.

commit a7b152d534 (powerpc: Tell RCU about idle after hcall tracing)
tried to work around it by delaying the rcu_idle_enter until after we
called the hcall tracepoint, but there are a number of issues with it.

The hcall tracepoint trampoline code is called conditionally when the
tracepoint is enabled. If the tracepoint is not enabled we never call
rcu_idle_enter. The idle_uses_rcu check was also done at compile time
which breaks multiplatform builds.

The simple fix is to avoid tracing H_CEDE and rely on other tracepoints
and the hypervisor dispatch trace log to work out if we called H_CEDE.

This fixes a hang during boot on pSeries.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11 11:54:20 +11:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Linus Torvalds
e4e88f31bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
  powerpc: fix compile error with 85xx/p1010rdb.c
  powerpc: fix compile error with 85xx/p1023_rds.c
  powerpc/fsl: add MSI support for the Freescale hypervisor
  arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
  powerpc/fsl: Add support for Integrated Flash Controller
  powerpc/fsl: update compatiable on fsl 16550 uart nodes
  powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
  powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
  powerpc/fsl: Update defconfigs to enable some standard FSL HW features
  powerpc: Add TBI PHY node to first MDIO bus
  sbc834x: put full compat string in board match check
  powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
  powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
  offb: Fix setting of the pseudo-palette for >8bpp
  offb: Add palette hack for qemu "standard vga" framebuffer
  offb: Fix bug in calculating requested vram size
  powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
  powerpc/44x: Fix build error on currituck platform
  powerpc/boot: Change the load address for the wrapper to fit the kernel
  powerpc/44x: Enable CRASH_DUMP for 440x
  ...

Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.
2012-01-06 17:58:22 -08:00
Li Zhong
e4f387d8db powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen
as following, which could cause incorrect preempt count.

__trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry =>
get_cpu_var => preempt_disable

__trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit =>
put_cpu_var => preempt_enable

where:
A => B and A -> B means A calls B, but
=> means A will call B through function name, and B will definitely be
called.
-> means A will call B through function pointer, so B might not be
called if the function pointer is not set.

So error happens when only one of probe_hcall_entry and probe_hcall_exit
get called during a hcall.

This patch tries to move the preempt count operations from
probe_hcall_entry and probe_hcall_exit to its callers.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: stable@kernel.org [v2.6.32+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-03 12:09:27 +11:00
Kay Sievers
edbaa603eb driver-core: remove sysdev.h usage.
The sysdev.h file should not be needed by any in-kernel code, so remove
the .h file from these random files that seem to still want to include
it.

The sysdev code will be going away soon, so this include needs to be
removed no matter what.

Cc: Jiandong Zheng <jdzheng@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Bryan Huntsman <bryanh@codeaurora.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "Venkatesh Pallipadi
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
2011-12-21 16:26:03 -08:00
Kay Sievers
86ba41d033 power: suspend - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:52 -08:00
Kay Sievers
6c9d290952 power: cmm - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:51 -08:00
Kay Sievers
8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Benjamin Herrenschmidt
43ca5d347a Merge branch 'kexec' into next 2011-12-16 11:09:21 +11:00
Benjamin Herrenschmidt
e6f08d37e6 Merge branch 'cpuidle' into next 2011-12-16 11:09:11 +11:00
Paul E. McKenney
a7b152d534 powerpc: Tell RCU about idle after hcall tracing
The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:39 -08:00
Anton Blanchard
120a52c388 powerpc/nvram: Add spinlock to oops_to_nvram to prevent oops in compression code.
When issuing a system reset we almost always oops in the oops_to_nvram
code because multiple CPUs are using the deflate work area. Add a
spinlock to protect it.

To play it safe I'm using trylock to avoid locking up if the NVRAM
code oopses. This means we might miss multiple CPUs oopsing at exactly
the same time but I think it's best to play it safe for now. Once we
are happy with the reliability we can change it to a full spinlock.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 14:22:54 +11:00
Anton Blanchard
a934904d8a powerpc: Reduce pseries panic timeout from 180s to 10s
We've had a 180 second panic timeout on ppc64 for as long as I
can remember. This patch reduces it to 10 seconds on pseries for a few
reasons:

- Almost all pseries machines have a hypervisor console so panic
  output will be available in a scrollback buffer.

- The 180 seconds impacts our availability, users (other than
  kernel hackers) just want the box to come back around so it
  can continue its work.

- I spend a lot of my life staring at the 180 second panic timeout.
  Many pseries machines take minutes to power cycle, so it's quicker
  to sit through the 180 seconds than it is to power cycle.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 14:02:23 +11:00
Deepthi Dharwar
e8bb3e00cf powerpc/cpuidle: Handle power_save=off
This patch makes pseries_idle_driver not to be registered when
power_save=off kernel boot option is specified. The
cpuidle_disable variable used here is similar to
its usage on x86. If cpuidle_disable is set then
sysfs entries for cpuidle framework are not created
and the required drivers are not loaded.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 13:57:34 +11:00
Deepthi Dharwar
e179816ce6 powerpc/cpuidle: Enable cpuidle and directly call cpuidle_idle_call() for pSeries
This patch enables cpuidle for pSeries and pSeries_idle is
directly called from the idle loop. As a result of pSeries_idle, cpuidle
driver registered with cpuidle subsystem comes into action. On
failure of loading of the driver or cpuidle framework default idle
is executed as part of the function. This patch
also removes the routines pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep as they are now implemented as part of
pseries_idle cpuidle driver.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 13:57:20 +11:00
Deepthi Dharwar
707827f338 powerpc/cpuidle: cpuidle driver for pSeries
This patch implements a back-end cpuidle driver for pSeries
based on pseries_dedicated_idle_loop and pseries_shared_idle_loop
routines.  The driver is built only if CONFIG_CPU_IDLE is set. This
cpuidle driver uses global registration of idle states and
not per-cpu.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 13:56:31 +11:00
Kyle Moffett
be8bec56df powerpc/mpic: Invert the meaning of MPIC_PRIMARY
It turns out that there are only 2 in-tree platforms which use MPICs
which are not "primary":  IBM Cell and PowerMac.  To reduce the
complexity of the typical board setup code, invert the MPIC_PRIMARY bit
into MPIC_SECONDARY.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 13:43:08 +11:00