6414 Commits

Author SHA1 Message Date
Anton Blanchard
de4376c284 powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can
reuse this preload slot by loading in the segment at 0x10000000, where almost
all PowerPC binaries are linked at.

On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), both the 32bit and
64bit context switch rate improves (tested on a 4GHz POWER6):

32bit: 273k/sec -> 283k/sec
64bit: 277k/sec -> 284k/sec

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:26 +10:00
Anton Blanchard
5eb9bac040 powerpc: Rearrange SLB preload code
With the new top down layout it is likely that the pc and stack will be in the
same segment, because the pc is most likely in a library allocated via a top
down mmap. Right now we bail out early if these segments match.

Rearrange the SLB preload code to sanity check all SLB preload addresses
are not in the kernel, then check all addresses for conflicts.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:25 +10:00
Anton Blanchard
30d0b36828 powerpc: Move 64bit VDSO to improve context switch performance
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
is position independent we can remove the hint and let get_unmapped_area pick
an area. This will mean the vdso will be near other mmaps and will share
an SLB entry:

10000000-10001000 r-xp 00000000 08:06 5778459        /root/context_switch_64
10010000-10011000 r--p 00000000 08:06 5778459        /root/context_switch_64
10011000-10012000 rw-p 00001000 08:06 5778459        /root/context_switch_64
fffa92ae000-fffa92b0000 rw-p 00000000 00:00 0
fffa92b0000-fffa9453000 r-xp 00000000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9453000-fffa9462000 ---p 001a3000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9462000-fffa9466000 r--p 001a2000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9466000-fffa947c000 rw-p 001a6000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa947c000-fffa9480000 rw-p 00000000 00:00 0
fffa9480000-fffa94a8000 r-xp 00000000 08:06 4333852  /lib64/ld-2.9.so
fffa94b3000-fffa94b4000 rw-p 00000000 00:00 0

fffa94b4000-fffa94b7000 r-xp 00000000 00:00 0        [vdso] <----- here I am

fffa94b7000-fffa94b8000 r--p 00027000 08:06 4333852  /lib64/ld-2.9.so
fffa94b8000-fffa94bb000 rw-p 00028000 08:06 4333852  /lib64/ld-2.9.so
fffa94bb000-fffa94bc000 rw-p 00000000 00:00 0
fffe4c10000-fffe4c25000 rw-p 00000000 00:00 0        [stack]

On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), our context switch
rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6).

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:24 +10:00
Geoff Thorpe
0d2d3e38f7 powerpc: expose the multi-bit ops that underlie single-bit ops.
The bitops.h functions that operate on a single bit in a bitfield are
implemented by operating on the corresponding word location. In all
cases the inner logic is valid if the mask being applied has more than
one bit set, so this patch exposes those inner operations. Indeed,
set_bits() was already available, but it duplicated code from
set_bit() (rather than making the latter a wrapper) - it was also
missing the PPC405_ERR77() workaround and the "volatile" address
qualifier present in other APIs. This corrects that, and exposes the
other multi-bit equivalents.

One advantage of these multi-bit forms is that they allow word-sized
variables to essentially be their own spinlocks, eg. very useful for
state machines where an atomic "flags" variable can obviate the need
for any additional locking.

Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:23 +10:00
Michael Ellerman
11a6b292c1 powerpc/mpic: Fix MPIC_BROKEN_REGREAD on non broken MPICs
The workaround enabled by CONFIG_MPIC_BROKEN_REGREAD does not work
on non-broken MPICs. The symptom is no interrupts being received.

The fix is twofold. Firstly the code was broken for multiple isus,
we need to index into the shadow array with the src_no, not the idx.
Secondly, we always do the read, but only use the VECPRI_MASK and
VECPRI_ACTIVITY bits from the hardware, the rest of "val" comes
from the shadow.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:22 +10:00
Gerhard Pircher
66dc3304f3 powerpc/amigaone: Convert amigaone_init() to a machine_device_initcall()
This allows to remove the ppc_md.init() hook in the setup code.

Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:21 +10:00
Paul Mackerras
20002ded4d perf_counter: powerpc: Add callchain support
This adds support for tracing callchains for powerpc, both 32-bit
and 64-bit, and both in the kernel and userspace, from PMU interrupt
context.

The first three entries stored for each callchain are the NIP (next
instruction pointer), LR (link register), and the contents of the LR
save area in the second stack frame (the first is ignored because the
ABI convention on powerpc is that functions save their return address
in their caller's stack frame).  Because leaf functions don't have to
save their return address (LR value) and don't have to establish a
stack frame, it's possible for either or both of LR and the second
stack frame's LR save area to have valid return addresses in them.
This is basically impossible to disambiguate without either reading
the code or looking at auxiliary information such as CFI tables.
Since we don't want to do either of those things at interrupt time,
we store both LR and the second stack frame's LR save area.

Once we get past the second stack frame, there is no ambiguity; all
return addresses we get are reliable.

For kernel traces, we check whether they are valid kernel instruction
addresses and store zero instead if they are not (rather than
omitting them, which would make it impossible for userspace to know
which was which).  We also store zero instead of the second stack
frame's LR save area value if it is the same as LR.

For kernel traces, we check for interrupt frames, and for user traces,
we check for signal frames.  In each case, since we're starting a new
trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
knows that the next three entries are NIP, LR and the second stack frame
for the interrupted context.

We read user memory with __get_user_inatomic.  On 64-bit, if this
PMU interrupt occurred while interrupts are soft-disabled, and
there is no MMU hash table entry for the page, we will get an
-EFAULT return from __get_user_inatomic even if there is a valid
Linux PTE for the page, since hash_page isn't reentrant.  Thus we
have code here to read the Linux PTE and access the page via the
kernel linear mapping.  Since 64-bit doesn't use (or need) highmem
there is no need to do kmap_atomic.  On 32-bit, we don't do soft
interrupt disabling, so this complication doesn't occur and there
is no need to fall back to reading the Linux PTE, since hash_page
(or the TLB miss handler) will get called automatically if necessary.

Note that we cannot get PMU interrupts in the interval during
context switch between switch_mm (which switches the user address
space) and switch_to (which actually changes current to the new
process).  On 64-bit this is because interrupts are hard-disabled
in switch_mm and stay hard-disabled until they are soft-enabled
later, after switch_to has returned.  So there is no possibility
of trying to do a user stack trace when the user address space is
not current's address space.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:47 +10:00
Paul Mackerras
9c1e105238 powerpc: Allow perf_counters to access user memory at interrupt time
This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine.  Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor.  This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table.  32-bit processors are already able to access user memory
at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.

On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca.  This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields.  To prevent this, we hard-disable
interrupts in switch_slb.  Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.

This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.

Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.

If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism.  An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:43 +10:00
Paul Mackerras
1660e9d3d0 powerpc/32: Always order writes to halves of 64-bit PTEs
On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
32-bit halves.  On SMP we write the higher-order half and then the
lower-order half, with a write barrier between the two halves, but on
UP there was no particular ordering of the writes to the two halves.

This extends the ordering that we already do on SMP to the UP case as
well.  The reason is that with the perf_counter subsystem potentially
accessing user memory at interrupt time to get stack traces, we have
to be careful not to create an incorrect but apparently valid PTE even
on UP.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:39 +10:00
Tejun Heo
c2a7e81801 powerpc64: convert to dynamic percpu allocator
Now that percpu allows arbitrary embedding of the first chunk,
powerpc64 can easily be converted to dynamic percpu allocator.
Convert it.  powerpc supports several large page sizes.  Cap atom_size
at 1M.  There isn't much to gain by going above that anyway.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-14 15:00:53 +09:00
Tejun Heo
384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
David S. Miller
aa11d958d1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	arch/microblaze/include/asm/socket.h
2009-08-12 17:44:53 -07:00
Rafael J. Wysocki
dcbf77cac6 Merge branch 'master' into for-linus 2009-08-10 23:40:50 +02:00
Linus Torvalds
d00aa6695b Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  perf_counter: Zero dead bytes from ftrace raw samples size alignment
  perf_counter: Subtract the buffer size field from the event record size
  perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
  perf_counter: Correct PERF_SAMPLE_RAW output
  perf tools: callchain: Fix bad rounding of minimum rate
  perf_counter tools: Fix libbfd detection for systems with libz dependency
  perf: "Longum est iter per praecepta, breve et efficax per exempla"
  perf_counter: Fix a race on perf_counter_ctx
  perf_counter: Fix tracepoint sampling to be part of generic sampling
  perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
  perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
  perf tools: callchain: Fix 'perf report' display to be callchain by default
  perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
  perf record: Fix the -A UI for empty or non-existent perf.data
  perf util: Fix do_read() to fail on EOF instead of busy-looping
  perf list: Fix the output to not include tracepoints without an id
  perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
  perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
  perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
  perf report: Fix per task mult-counter stat reporting
  ...
2009-08-10 11:48:51 -07:00
Benjamin Herrenschmidt
b2f2e8fee3 powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
	https://bugzilla.redhat.com/show_bug.cgi?id=514787

We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-10 16:36:38 +10:00
Linus Torvalds
17d11ba149 Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Avoid redelivery of edge interrupt before next edge
  KVM: MMU: limit rmap chain length
  KVM: ia64: fix build failures due to ia64/unsigned long mismatches
  KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
  KVM: fix ack not being delivered when msi present
  KVM: s390: fix wait_queue handling
  KVM: VMX: Fix locking imbalance on emulation failure
  KVM: VMX: Fix locking order in handle_invalid_guest_state
  KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
  KVM: SVM: force new asid on vcpu migration
  KVM: x86: verify MTRR/PAT validity
  KVM: PIT: fix kpit_elapsed division by zero
  KVM: Fix KVM_GET_MSR_INDEX_LIST
2009-08-09 14:58:21 -07:00
Paul Mackerras
f36a1a133a perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.

This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:54:37 +02:00
Benjamin Herrenschmidt
e0d82a0a4e perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
If the current CPU doesn't support performance counters,
cur_cpu_spec->oprofile_cpu_type can be NULL. The current
perf_counter modules don't test for that case and would thus
crash at boot time.

Bug reported by David Woodhouse.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
LKML-Reference: <19066.48028.446975.501454@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 13:55:09 +02:00
Rafael J. Wysocki
c00aafcd49 Merge branch 'master' into for-linus 2009-08-05 23:56:54 +02:00
Jan Engelhardt
0d6038ee76 net: implement a SO_DOMAIN getsockoption
This sockopt goes in line with SO_TYPE and SO_PROTOCOL. It makes it
possible for userspace programs to pass around file descriptors — I
am referring to arguments-to-functions, but it may even work for the
fd passing over UNIX sockets — without needing to also pass the
auxiliary information (PF_INET6/IPPROTO_TCP).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 13:02:57 -07:00
Jan Engelhardt
49c794e946 net: implement a SO_PROTOCOL getsockoption
Similar to SO_TYPE returning the socket type, SO_PROTOCOL allows to
retrieve the protocol used with a given socket.

I am not quite sure why we have that-many copies of socket.h, and why
the values are not the same on all arches either, but for where hex
numbers dominate, I use 0x1029 for SO_PROTOCOL as that seems to be
the next free unused number across a bunch of operating systems, or
so Google results make me want to believe. SO_PROTOCOL for others
just uses the next free Linux number, 38.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05 13:02:56 -07:00
Stephen Rothwell
c428dcc9b9 KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
Eliminates this compiler warning:

arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:1178: error: integer overflow in expression

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05 14:51:33 +03:00
David Woodhouse
6a12235c7d agp: kill phys_to_gart() and gart_to_phys()
There seems to be no reason for these -- they're a 1:1 mapping on all
platforms.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-08-03 09:05:00 +01:00
Kumar Gala
34466c5be4 powerpc: Update defconfigs for embedded 6xx/7xxx, 8xx, 8{3,5,6}xxx
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:34:01 -05:00
Martyn Welch
083e268c8b powerpc/86xx: Update GE Fanuc sbc310 default configuration
General update of defconfig including the following notable changes:
 - Enable Highmem support.
 - Support for PCMCIA based daughter card.

Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:28:08 -05:00
Martyn Welch
f27d4d47dc powerpc/86xx: Update defconfig for GE Fanuc's PPC9A
General update of defconfig including the following notable changes:
 - Enable GPIO access via sysfs on GE Fanuc's PPC9A.
 - Enable Highmem support.
 - Support for PCMCIA based daughter card.

Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:28:05 -05:00
Anton Vorontsov
1333c3d6d3 powerpc/83xx: Fix PCI IO base address on MPC837xE-RDB boards
U-Boot maps PCI IO at 0xe0300000, while current dts files specify
0xe2000000. This leads to the following oops with CONFIG_8139TOO_PIO=y.

8139too Fast Ethernet driver 0.9.28
Machine check in kernel mode.
Caused by (from SRR1=41000): Transfer error ack signal
Oops: Machine check, sig: 7 [#1]
MPC837x RDB
[...]
NIP [00000900] 0x900
LR [c0439df8] rtl8139_init_board+0x238/0x524
Call Trace:
[cf831d90] [c0439dcc] rtl8139_init_board+0x20c/0x524 (unreliable)
[cf831de0] [c043a15c] rtl8139_init_one+0x78/0x65c
[cf831e40] [c0235250] pci_call_probe+0x20/0x30
[...]

This patch fixes the issue by specifying the correct PCI IO base
address.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:18:41 -05:00
Anton Vorontsov
8a0b177f36 powerpc/85xx: Don't scan for TBI PHY addresses on MPC8569E-MDS boards
Sometimes (e.g. when there are no UEMs attached to a board)
fsl_pq_mdio_find_free() fails to find a spare address for a TBI PHY,
this is because get_phy_id() returns bogus 0x0000ffff values
(0xffffffff is expected), and therefore mdio bus probing fails with
the following message:

  fsl-pq_mdio: probe of e0082120.mdio failed with error -16

And obviously ethernet doesn't work after this.

This patch solves the problem by adding tbi-phy node into mdio node,
so that we won't scan for spare addresses, we'll just use a fixed one.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:16:39 -05:00
Anton Vorontsov
c4673f9a32 powerpc/85xx: Fix ethernet link detection on MPC8569E-MDS boards
Linux isn't able to detect link changes on ethernet ports that were
used by U-Boot. This is because U-Boot wrongly clears interrupt
polarity bit (INTPOL, 0x400) in the extended status register (EXT_SR,
0x1b) of Marvell PHYs.

There is no easy way for PHY drivers to know IRQ line polarity (we
could extract it from the device tree and pass it to phydevs, but
that'll be quite a lot of work), so for now just reset the PHYs to
their default states.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:14:18 -05:00
Kumar Gala
5156ddce6c powerpc/mm: Fix SMP issue with MMU context handling code
In switch_mmu_context() if we call steal_context_smp() to get a context
to use we shouldn't fall through and than call steal_context_up().  Doing
so can be problematic in that the 'mm' that steal_context_up() ends up
using will not get marked dirty in the stale_map[] for other CPUs that
might have used that mm.  Thus we could end up with stale TLB entries in
the other CPUs that can cause all kinda of havoc.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-29 23:05:43 -05:00
Rafael J. Wysocki
b4093d6235 Merge branch 'master' into for-linus 2009-07-29 20:28:08 +02:00
FUJITA Tomonori
8ab7ff42c9 powerpc: remove unused swiotlb_phys_to_bus() and swiotlb_bus_to_phys()
phys_to_dma() and dma_to_phys() are used instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:20 +09:00
FUJITA Tomonori
8d4f5339d1 x86, IA64, powerpc: add phys_to_dma() and dma_to_phys()
This adds two functions, phys_to_dma() and dma_to_phys() to x86, IA64
and powerpc. swiotlb uses them. phys_to_dma() converts a physical
address to a dma address. dma_to_phys() does the opposite.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:20 +09:00
FUJITA Tomonori
30945fdf6a powerpc: remove unncesary swiotlb_arch_address_needs_mapping
swiotlb doesn't use swiotlb_arch_address_needs_mapping(); it uses
dma_capalbe(). We can remove unnecessary
swiotlb_arch_address_needs_mapping().

We can remove swiotlb_addr_needs_map() and is_buffer_dma_capable() in
swiotlb_pci_addr_needs_map() too; dma_capable() handles the features
that both provide.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:19 +09:00
FUJITA Tomonori
9a937c91ee powerpc: add dma_capable() to replace is_buffer_dma_capable()
dma_capable() eventually replaces is_buffer_dma_capable(), which tells
if a memory area is dma-capable or not. The problem of
is_buffer_dma_capable() is that it doesn't take a pointer to struct
device so it doesn't work for POWERPC.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:19 +09:00
FUJITA Tomonori
02ca646e73 swiotlb: remove unnecessary swiotlb_bus_to_virt
swiotlb_bus_to_virt is unncessary; we can use swiotlb_bus_to_phys and
phys_to_virt instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:18 +09:00
Benjamin Herrenschmidt
9e1b32caa5 mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

Upcoming paches to support the new 64-bit "BookE" powerpc architecture
will need to have the virtual address corresponding to PTE page when
freeing it, due to the way the HW table walker works.

Basically, the TLB can be loaded with "large" pages that cover the whole
virtual space (well, sort-of, half of it actually) represented by a PTE
page, and which contain an "indirect" bit indicating that this TLB entry
RPN points to an array of PTEs from which the TLB can then create direct
entries. Thus, in order to invalidate those when PTE pages are deleted,
we need the virtual address to pass to tlbilx or tlbivax instructions.

The old trick of sticking it somewhere in the PTE page struct page sucks
too much, the address is almost readily available in all call sites and
almost everybody implemets these as macros, so we may as well add the
argument everywhere. I added it to the pmd and pud variants for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27 12:10:38 -07:00
Magnus Damm
d7aacaddca Driver Core: Add platform device arch data V3
Allow architecture specific data in struct platform_device V3.

With this patch struct pdev_archdata is added to struct
platform_device, similar to struct dev_archdata in found in
struct device. Useful for architecture code that needs to
keep extra data associated with each platform device.

Struct pdev_archdata is different from dev.platform_data, the
convention is that dev.platform_data points to driver-specific
data. It may or may not be required by the driver. The format
of this depends on driver but is the same across architectures.

The structure pdev_archdata is a place for architecture specific
data. This data is handled by architecture specific code (for
example runtime PM), and since it is architecture specific it
should _never_ be touched by device driver code. Exactly like
struct dev_archdata but for platform devices.

[rjw: This change is for power management mostly and that's why it
 goes through the suspend tree.]

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-07-22 00:28:38 +02:00
Andreas Schwab
0115cb544b powerpc: Fix another bug in move of altivec code to vector.S
When moving load_up_altivec to vector.S a typo in a comment caused a
thinko setting the wrong variable.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-15 17:41:46 +10:00
Dave Kleikamp
28477fb1ed powerpc: Fix booke user_disable_single_step()
On booke processors, gdb is seeing spurious SIGTRAPs when setting a
watchpoint.

user_disable_single_step() simply quits when the DAC is non-zero.  It should
be clearing the DBCR0_IC and DBCR0_BT bits from the dbcr0 register and
TIF_SINGLESTEP from the thread flag.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-15 17:41:45 +10:00
Alexey Dobriyan
405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Linus Torvalds
85be928c41 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf report: Add "Fractal" mode output - support callchains with relative overhead rate
  perf_counter tools: callchains: Manage the cumul hits on the fly
  perf report: Change default callchain parameters
  perf report: Use a modifiable string for default callchain options
  perf report: Warn on callchain output request from non-callchain file
  x86: atomic64: Inline atomic64_read() again
  x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
  x86: atomic64: Improve atomic64_xchg()
  x86: atomic64: Export APIs to modules
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
  x86: atomic64: Fix unclean type use in atomic64_xchg()
  x86: atomic64: Make atomic_read() type-safe
  x86: atomic64: Reduce size of functions
  x86: atomic64: Improve atomic64_add_return()
  x86: atomic64: Improve cmpxchg8b()
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
  x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
  perf report: Annotate variable initialization
  ...
2009-07-10 14:25:03 -07:00
Peter Zijlstra
c99e6efe1b sched: INIT_PREEMPT_COUNT
Pull the initial preempt_count value into a single
definition site.

Maintainers for: alpha, ia64 and m68k, please have a look,
your arch code is funny.

The header magic is a bit odd, but similar to the KERNEL_DS
one, CPP waits with expanding these macros until the
INIT_THREAD_INFO macro itself is expanded, which is in
arch/*/kernel/init_task.c where we've already included
sched.h so we're good.

Cc: tony.luck@intel.com
Cc: rth@twiddle.net
Cc: geert@linux-m68k.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 14:24:05 -07:00
Tejun Heo
023bf6f1b8 linker script: unify usage of discard definition
Discarded sections in different archs share some commonality but have
considerable differences.  This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.

This patch makes all linker scripts to move discard definitions to the
end of the linker script and use the common DISCARDS macro.  As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.

ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.

defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390.  Michal Simek tested microblaze.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
2009-07-09 11:27:40 +09:00
Anton Vorontsov
ea96025a26 powerpc: Don't use alloc_bootmem() in init_IRQ() path
This patch fixes various badnesses like this for all interrupt
controllers:

------------[ cut here ]------------
Badness at c04db9dc [verbose debug info unavailable]
NIP: c04db9dc LR: c04db9ac CTR: 00000000
REGS: c053de30 TRAP: 0700   Not tainted  (2.6.31-rc1-00432-ge69b2b5-dirty)
MSR: 00021000 <ME,CE>  CR: 22020084  XER: 00000000
TASK = c0500480[0] 'swapper' THREAD: c053c000
GPR00: 00000001 c053dee0 c0500480 00000000 00000050 00000020 3fffffff 00000000
GPR08: 00000001 c0540000 e0080080 00000000 22000084 64183600 3ff8f800 00000000
GPR16: 841b0240 449a0303 00000000 00000000 00000000 00000000 00000000 c04f5bf4
GPR24: 00000000 00000000 00000000 00000050 00000020 00000000 3fffffff 00000050
NIP [c04db9dc] alloc_arch_preferred_bootmem+0x48/0x74
LR [c04db9ac] alloc_arch_preferred_bootmem+0x18/0x74
Call Trace:
[c053dee0] [c000a5a4] __of_address_to_resource+0x44/0xd0 (unreliable)
[c053def0] [c04dba58] ___alloc_bootmem_nopanic+0x50/0x108
[c053df20] [c04dbb28] ___alloc_bootmem+0x18/0x50
[c053df30] [c04d5de0] qe_ic_init+0x5c/0x1b0
[c053df70] [c04d77b0] mpc85xx_mds_pic_init+0xb8/0x10c
[c053dfb0] [c04cf374] init_IRQ+0x28/0x3c

p.s. commit 85355bb272db31a3f2dd99d547eef794805e1319 ("powerpc: Fix
mpic alloc warning") missed some alloc_bootmem() instances, this is
now fixed.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:25 +10:00
Grant Likely
ad9064d5e2 powerpc: Fix spin_event_timeout() to be robust over context switches
Current implementation of spin_event_timeout can be interrupted by an
IRQ or context switch after testing the condition, but before checking
the timeout.  This can cause the loop to report a timeout when the
condition actually became true in the middle.

This patch adds one final check of the condition upon exit of the loop
if the last test of the condition was still false.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:24 +10:00
Michael Ellerman
30c5af435b powerpc: Use pr_devel() in do_dcache_icache_coherency()
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   2036     368       8    2412     96c arch/powerpc/mm/pgtable.o

size after:
   text    data     bss     dec     hex filename
   1677     248       8    1933     78d arch/powerpc/mm/pgtable.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:24 +10:00
Michael Ellerman
33875f0330 powerpc/cell: Use pr_devel() in axon_msi.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   7083    1616       0    8699    21fb arch/powerpc/../axon_msi.o

size after:
   text    data     bss     dec     hex filename
   5772    1208       0    6980    1b44 arch/powerpc/../axon_msi.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:23 +10:00
Michael Ellerman
29e5fa59e5 powerpc: Use pr_devel() in arch/powerpc/mm/gup.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   3252     384       0    3636     e34 arch/powerpc/mm/gup.o

size after:
   text    data     bss     dec     hex filename
   2576      96       0    2672     a70 arch/powerpc/mm/gup.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:23 +10:00
Michael Ellerman
651e2dd2a1 powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.c
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   3261     416       4    3681     e61 arch/powerpc/mm/slb.o

size after:
   text    data     bss     dec     hex filename
   2861     248       4    3113     c29 arch/powerpc/mm/slb.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:22 +10:00