441 Commits

Author SHA1 Message Date
bob picco
3dee9df548 sparc64: find_node adjustment
We have seen an issue with guest boot into LDOM that causes early boot failures
because of no matching rules for node identitity of the memory. I analyzed this
on my T4 and concluded there might not be a solution. I saw the issue in
mainline too when booting into the control/primary domain - with guests
configured.  Note, this could be a firmware bug on some older machines.

I'll provide a full explanation of the issues below. Should we not find a
matching BEST latency group for a real address (RA) then we will assume node 0.
On the T4-2 here with the information provided I can't see an alternative.

Technically the LDOM shown below should match the MBLOCK to the
favorable latency group. However other factors must be considered too. Were
the memory controllers configured "fine" grained interleave or "coarse"
grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
node?

There has to be at least one Machine Description (MD) "group" and hence one
NUMA node. The group can have one or more latency groups (lg) - more than one
memory controller. The current code chooses the smallest latency as the most
favorable per group. The latency and lg information is in MLGROUP below.
MBLOCK is the base and size of the RAs for the machine as fetched from OBP
/memory "available" property. My machine has one MBLOCK but more would be
possible - with holes?

For a T4-2 the following information has been gathered:
with LDOM guest
MEMBLOCK configuration:
 memory size = 0x27f870000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
 memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
 memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
MBLOCK[0]: base[20000000] size[280000000] offset[0]
(note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
(note: (RA + offset) & mask = val is the formula to detect a match for the
memory controller. should there be no match for find_node node, a return
value of -1 resulted for the node - BAD)

There is one group. It has these forward links
MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
(note: "val" is the best lg's (smallest latency) "match")

no LDOM guest - bare metal
MEMBLOCK configuration:
 memory size = 0xfdf2d0000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
 memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
 memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
MBLOCK[0]: base[20000000] size[fe0000000] offset[0]

there are two groups
group node[16d5]
MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
group node[171d]
MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
(note: for this two "group" bare metal machine, 1/2 memory is in group one's
lg and 1/2 memory is in group two's lg).

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 17:55:09 -07:00
bob picco
4ccb927289 sparc64: sun4v TLB error power off events
We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 17:46:44 -07:00
Christoph Lameter
494fc42170 sparc: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:55 -04:00
David S. Miller
5b6ff9df05 sparc64: Fix up merge thinko.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 19:09:19 -07:00
David S. Miller
e9011d0866 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Conflicts:
	arch/sparc/mm/init_64.c

Conflict was simple non-overlapping additions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 18:57:18 -07:00
David S. Miller
4ca9a23765 sparc64: Guard against flushing openfirmware mappings.
Based almost entirely upon a patch by Christopher Alexander Tobias
Schulze.

In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap
layer") lazy VMAP tlb flushing was added to the vmalloc layer.  This
causes problems on sparc64.

Sparc64 has two VMAP mapped regions and they are not contiguous with
eachother.  First we have the malloc mapping area, then another
unrelated region, then the vmalloc region.

This "another unrelated region" is where the firmware is mapped.

If the lazy TLB flushing logic in the vmalloc code triggers after
we've had both a module unload and a vfree or similar, it will pass an
address range that goes from somewhere inside the malloc region to
somewhere inside the vmalloc region, and thus covering the
openfirmware area entirely.

The sparc64 kernel learns about openfirmware's dynamic mappings in
this region early in the boot, and then services TLB misses in this
area.  But openfirmware has some locked TLB entries which are not
mentioned in those dynamic mappings and we should thus not disturb
them.

These huge lazy TLB flush ranges causes those openfirmware locked TLB
entries to be removed, resulting in all kinds of problems including
hard hangs and crashes during reboot/reset.

Besides causing problems like this, such huge TLB flush ranges are
also incredibly inefficient.  A plea has been made with the author of
the VMAP lazy TLB flushing code, but for now we'll put a safety guard
into our flush_tlb_kernel_range() implementation.

Since the implementation has become non-trivial, stop defining it as a
macro and instead make it a function in a C source file.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04 20:16:00 -07:00
David S. Miller
18f3813252 sparc64: Do not insert non-valid PTEs into the TSB hash table.
The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
only be called when the PTE being installed will be accessible by the user.

This is not true for code paths originating from remove_migration_pte().

There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
miss frramework assumes thatwhen a TSB entry matches we can just load it into
the TLB and return from the TLB miss trap.

So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
and over, never satisfying the miss.

Just exit early from update_mmu_cache() and friends in this situation.

Based upon a report and patch from Christopher Alexander Tobias Schulze.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04 16:34:01 -07:00
bob picco
f6d4fb5cc0 sparc64 - add mem to iomem resource
This patch adds sparc RAM to /proc/iomem. It also identifies the
code, data and bss regions of the kernel.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 21:37:05 -07:00
Linus Torvalds
c4222e4635 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc fixes from David Miller:
 "Sparc sparse fixes from Sam Ravnborg"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
  sparc64: fix sparse warnings in int_64.c
  sparc64: fix sparse warning in ftrace.c
  sparc64: fix sparse warning in kprobes.c
  sparc64: fix sparse warning in kgdb_64.c
  sparc64: fix sparse warnings in compat_audit.c
  sparc64: fix sparse warnings in init_64.c
  sparc64: fix sparse warnings in aes_glue.c
  sparc: fix sparse warnings in smp_32.c + smp_64.c
  sparc64: fix sparse warnings in perf_event.c
  sparc64: fix sparse warnings in kprobes.c
  sparc64: fix sparse warning in tsb.c
  sparc64: clean up compat_sigset_t.seta handling
  sparc64: fix sparse "Should it be static?" warnings in signal32.c
  sparc64: fix sparse warnings in sys_sparc32.c
  sparc64: fix sparse warning in pci.c
  sparc64: fix sparse warnings in smp_64.c
  sparc64: fix sparse warning in prom_64.c
  sparc64: fix sparse warning in btext.c
  sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
  sparc64: fix sparse warning in process_64.c
  ...

Conflicts:
	arch/sparc/include/asm/pgtable_64.h
2014-06-19 07:50:07 -10:00
Naoya Horiguchi
c177c81e09 hugetlb: restrict hugepage_migration_support() to x86_64
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs.  So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.

Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:51 -07:00
Sam Ravnborg
48d372164d sparc64: fix sparse warnings in int_64.c
Fix following warnings:
init_64.c:798:5: warning: symbol 'numa_cpu_lookup_table' was not declared. Should it be static?
init_64.c:799:11: warning: symbol 'numa_cpumask_lookup_table' was not declared. Should it be static?

The warnings were present with an allnoconfig
Fix so the variables are only declared if CONFIG_NEED_MULTIPLE_NODES is defined.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:35 -07:00
Sam Ravnborg
59dec13b27 sparc64: fix sparse warnings in init_64.c
Fix following warnings:
init_64.c:191:10: warning: symbol 'dcpage_flushes' was not declared. Should it be static?
init_64.c:193:10: warning: symbol 'dcpage_flushes_xcall' was not declared. Should it be static?

Add extern declaration to asm/setup.h and drop local declaration in smp_64.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:34 -07:00
Sam Ravnborg
8c7260c0d9 sparc64: fix sparse warning in tsb.c
Fix following warning:
tsb.c:290:5: warning: symbol 'sysctl_tsb_ratio' was not declared. Should it be static?

Add extern declaration in asm/setup.h and remove local declaration
in kernel/sysctl.c

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:32 -07:00
Sam Ravnborg
8df52620e6 sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
Fix following warnings:
kernel/sys_sparc_64.c:643:17: warning: symbol 'sys_kern_features' was not declared. Should it be static?
kernel/unaligned_64.c:297:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static?
kernel/unaligned_64.c:387:5: warning: symbol 'handle_popc' was not declared. Should it be static?
kernel/unaligned_64.c:428:5: warning: symbol 'handle_ldf_stq' was not declared. Should it be static?
kernel/unaligned_64.c:553:6: warning: symbol 'handle_ld_nf' was not declared. Should it be static?
kernel/unaligned_64.c:579:6: warning: symbol 'handle_lddfmna' was not declared. Should it be static?
kernel/unaligned_64.c:643:6: warning: symbol 'handle_stdfmna' was not declared. Should it be static?

Functions that are only used in kernel/ - add prototypes in kernel.h
Functions used outside kernel/ - add prototype in asm/setup.h
Removed local prototypes

One of the local prototypes had wrong signature (return void - not int).

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:30 -07:00
Sam Ravnborg
2e74a74f27 sparc: drop use of extern for prototypes in arch/sparc/*
Drop the remaining uses of extern for prototypes in .h files
in the sparc specific part of the kernel tree.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:29 -07:00
Sam Ravnborg
1918660b90 sparc32: fix sparse warning in io-unit.c
Fix following warning:
io-unit.c:56:13: warning: incorrect type in assignment (different address spaces)

The page table for the io unit resides in __iomem.
Fix up all users of the io unit page table.
Introduce sbus helers for all read/write operations.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:26 -07:00
Sam Ravnborg
f977ea49ae sparc32: fix sparse warning in iommu.c
Fix following warning:
iommu.c:69:21: warning: incorrect type in assignment (different address spaces)

iommu_struct.regs is __iomem - fix up all users.
Introduce sbus operations for all read/write operations.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:26 -07:00
David S. Miller
b18eb2d779 sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
Access to the TSB hash tables during TLB misses requires that there be
an atomic 128-bit quad load available so that we fetch a matching TAG
and DATA field at the same time.

On cpus prior to UltraSPARC-III only virtual address based quad loads
are available.  UltraSPARC-III and later provide physical address
based variants which are easier to use.

When we only have virtual address based quad loads available this
means that we have to lock the TSB into the TLB at a fixed virtual
address on each cpu when it runs that process.  We can't just access
the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
take a recursive TLB miss inside of the TLB miss handler without
risking running out of hardware trap levels (some trap combinations
can be deep, such as those generated by register window spill and fill
traps).

Without huge pages it's working perfectly fine, but when the huge TSB
got added another chunk of fixed virtual address space was not
allocated for this second TSB mapping.

So we were mapping both the 8K and 4MB TSBs to the same exact virtual
address, causing multiple TLB matches which gives undefined behavior.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 14:59:07 -07:00
David S. Miller
e5c460f46a sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
This was found using Dave Jone's trinity tool.

When a user process which is 32-bit performs a load or a store, the
cpu chops off the top 32-bits of the effective address before
translating it.

This is because we run 32-bit tasks with the PSTATE_AM (address
masking) bit set.

We can't run the kernel with that bit set, so when the kernel accesses
userspace no address masking occurs.

Since a 32-bit process will have no mappings in that region we will
properly fault, so we don't try to handle this using access_ok(),
which can safely just be a NOP on sparc64.

Real faults from 32-bit processes should never generate such addresses
so a bug check was added long ago, and it barks in the logs if this
happens.

But it also barks when a kernel user access causes this condition, and
that _can_ happen.  For example, if a pointer passed into a system call
is "0xfffffffc" and the kernel access 4 bytes offset from that pointer.

Just handle such faults normally via the exception entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-06 21:27:37 -07:00
David S. Miller
0eef331a3d sparc64: Use 'ILOG2_4MB' instead of constant '22'.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:52:50 -07:00
David S. Miller
70ffc6ebae sparc64: Fix top-level fault handling bugs.
Make get_user_insn() able to cope with huge PMDs.

Next, make do_fault_siginfo() more robust when get_user_insn() can't
actually fetch the instruction.  In particular, use the MMU announced
fault address when that happens, instead of calling
compute_effective_address() and computing garbage.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:41:19 -07:00
David S. Miller
04df419de3 sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to
decide if it needs to bail and return 0.

pmd_large() should therefore just check _PAGE_PMD_HUGE.

Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we
just need to add a valid bit check.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:32:37 -07:00
David S. Miller
51e5ef1bb7 sparc64: Fix huge PMD invalidation.
On sparc64 "present" and "valid" are seperate PTE bits, this allows us to
naturally distinguish between the user explicitly asking for PROT_NONE
with mprotect() and other situations.

However we weren't handling this properly in the huge PMD paths.

First of all, the page table walker in the TSB miss path only checks
for _PAGE_PMD_HUGE.  So the generic pmdp_invalidate() would clear
_PAGE_PRESENT but the TLB miss paths would still load it into the TLB
as a valid huge PMD.

Fix this by clearing the valid bit in pmdp_invalidate(), and also
checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez"
since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:31:52 -07:00
David S. Miller
5b1e94fa43 sparc64: Fix executable bit testing in set_pmd_at() paths.
This code was mistakenly using the exec bit from the PMD in all
cases, even when the PMD isn't a huge PMD.

If it's not a huge PMD, test the exec bit in the individual ptes down
in tlb_batch_pmd_scan().

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:30:36 -07:00
Sam Ravnborg
9edfae3f69 sparc32: fix sparse warnings in unaligned_32.c
Fix following warnings:
unaligned_32.c:146:15: warning: symbol 'safe_compute_effective_address' was not declared. Should it be static?
unaligned_32.c:235:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static?
unaligned_32.c:319:17: warning: symbol 'user_unaligned_trap' was not declared. Should it be static?

Add proper declarations in kernel.h + setup.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:26 -04:00
Sam Ravnborg
8885ec7ca9 sparc32: fix sparse warning in devices.c
Fix following warning:
devices.c:114:13: warning: symbol 'device_scan' was not declared. Should it be static?

Add prototype to asm/setup.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:26 -04:00
Sam Ravnborg
d191723fee sparc32: fix sparse warnings in setup_32.c
Fix following warnings:
setup_32.c:106:15: warning: symbol 'cmdline_memory_size' was not declared. Should it be static?
setup_32.c:270:16: warning: symbol 'fake_swapper_regs' was not declared. Should it be static?
setup_32.c:368:55: warning: Using plain integer as NULL pointer

Add missing declaration of cmdline_memory_size and remove the local one in init_32.c
fake_swapper_regs was only used locally - so defined static.
When replacing 0 with NULL also add a few spaces around operators

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
a2b0aa9463 sparc32: fix sparse "Should it be static?" in mm/
Fix following warnings:
srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static?
iommu.c:430:13: warning: symbol 'ld_mmu_iommu' was not declared. Should it be static?
leon_mm.c:21:5: warning: symbol 'srmmu_swprobe_trace' was not declared. Should it be static?

Add proper prototypes or define static to fix them.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
e8c29c839b sparc32: fix sparse warnings in srmmu.c
Fix following warnings:
srmmu.c:78:5: warning: symbol 'flush_page_for_dma_global' was not declared. Should it be static?
srmmu.c:85:5: warning: symbol 'viking_mxcc_present' was not declared. Should it be static?
srmmu.c:103:6: warning: symbol 'srmmu_nocache_bitmap' was not declared. Should it be static?
srmmu.c:176:24: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static?

Add proper prototypes in mm_32.h and drop local prototype in init_32.c
Replace 0 with NULL

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
4c9660f796 sparc32: fix sparse warning in init_32.c
Fix following warning:
init_32.c:112:22: warning: symbol 'bootmem_init' was not declared. Should it be static?

Fix by adding a proper prototype in pgtable_32.h and drop
the local prototype in srmmu.c

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Sam Ravnborg
e1b2f13488 sparc32: fix sparse warning in fault_32.c
Fix following warning:
fault_32.c:38:24: error: symbol 'unhandled_fault' redeclared with different type - different modifiers

When this warning was fixed several new warnings popped up - fix them too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Sam Ravnborg
ddb7417ea9 sparc32: rename mm/srmmu.h to mm/mm_32.h
This file will be used for more than just srmmu stuff, so the old name was misleading.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Doug Wilson
151b628f10 sparc64:tsb.c:use array size macro rather than number
This is a small patch which uses ARRAY_SIZE macro
    rather than a number to make code readability better.

Signed-off-by: Doug Wilson <doug.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17 15:55:49 -04:00
Paul Gortmaker
a56b072fa3 sparc32: make copy_to/from_user_page() usable from modular code
While copy_to/from_user_page() users are uncommon, there is one in
drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads
to the following:

ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!

during routine allmodconfig build coverage.  The reason this happens
is as follows:

In arch/sparc/include/asm/cacheflush_32.h we have:

 #define flush_cache_page(vma,addr,pfn) \
        sparc32_cachetlb_ops->cache_page(vma, addr)

 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
        do {                                                    \
                flush_cache_page(vma, vaddr, page_to_pfn(page));\
                memcpy(dst, src, len);                          \
        } while (0)
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
        do {                                                    \
                flush_cache_page(vma, vaddr, page_to_pfn(page));\
                memcpy(dst, src, len);                          \
        } while (0)

However, sparc32_cachetlb_ops isn't exported and hence the error.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 19:49:48 -05:00
Paul Gortmaker
8b2abcbc5e sparc: delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28 23:38:23 -08:00
Tang Chen
e7e8de5918 memblock: make memblock_set_node() support different memblock_type
[sfr@canb.auug.org.au: fix powerpc build]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liu Jiang <jiang.liu@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:44 -08:00
Stephen Rothwell
6a328f3fe0 sparc64: merge fix
After merging the final tree, today's linux-next build (sparc64 defconfig)
failed like this:

arch/sparc/mm/init_64.c: In function 'pte_alloc_one':
arch/sparc/mm/init_64.c:2568:9: error: unused variable 'pte' [-Werror=unused-variable]

Caused by the merge between commit 37b3a8ff3e08 ("sparc64: Move from 4MB
to 8MB huge pages") and commit 1ae9ae5f7df7 ("sparc: handle
pgtable_page_ctor() fail") (I had the following merge fix in linux-next,
but it didn't seem to propagate upstream - may have forgotten to point it
out :-().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:08:52 -08:00
Linus Torvalds
1b2722752f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc update from David Miller:

 1) Implement support for up to 47-bit physical addresses on sparc64.

 2) Support HAVE_CONTEXT_TRACKING on sparc64, from Kirill Tkhai.

 3) Fix Simba bridge window calculations, from Kjetil Oftedal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  sparc64: Implement HAVE_CONTEXT_TRACKING
  sparc64: Add self-IPI support for smp_send_reschedule()
  sparc: PCI: Fix incorrect address calculation of PCI Bridge windows on Simba-bridges
  sparc64: Encode huge PMDs using PTE encoding.
  sparc64: Move to 64-bit PGDs and PMDs.
  sparc64: Move from 4MB to 8MB huge pages.
  sparc64: Make PAGE_OFFSET variable.
  sparc64: Fix inconsistent max-physical-address defines.
  sparc64: Document the shift counts used to validate linear kernel addresses.
  sparc64: Define PAGE_OFFSET in terms of physical address bits.
  sparc64: Use PAGE_OFFSET instead of a magic constant.
  sparc64: Clean up 64-bit mmap exclusion defines.
2013-11-15 14:16:30 +09:00
Kirill A. Shutemov
1ae9ae5f7d sparc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:19 +09:00
Kirill A. Shutemov
c389a250ab mm, thp: do not access mm->pmd_huge_pte directly
Currently mm->pmd_huge_pte protected by page table lock.  It will not
work with split lock.  We have to have per-pmd pmd_huge_pte for proper
access serialization.

For now, let's just introduce wrapper to access mm->pmd_huge_pte.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:14 +09:00
Kirill Tkhai
812cb83a56 sparc64: Implement HAVE_CONTEXT_TRACKING
Mark the places when the system are in user or are in kernel.
This is used to make full dynticks system (tickless) --
CONFIG_NO_HZ_FULL dependence.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 14:57:21 -08:00
David S. Miller
a7b9403f0e sparc64: Encode huge PMDs using PTE encoding.
Now that we have 64-bits for PMDs we can stop using special encodings
for the huge PMD values, and just put real PTEs in there.

We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
huge ones.  It is the same for both 4U and 4V PTE layouts.

We also use _PAGE_SPECIAL to indicate the splitting state, since a
huge PMD cannot also be special.

All of the PMD --> PTE translation code disappears, and most of the
huge PMD bit modifications and tests just degenerate into the PTE
operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
trivial.

As a side effect, normal PMDs don't shift the physical address around.
This also speeds up the page table walks in the TLB miss paths since
they don't have to do the shifts any more.

Another non-trivial aspect is that pte_modify() has to be changed
to preserve the _PAGE_PMD_HUGE bits as well as the page size field
of the pte.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-13 12:33:08 -08:00
David S. Miller
2b77933c28 sparc64: Move to 64-bit PGDs and PMDs.
To make the page tables compact, we were using 32-bit PGDs and PMDs.
We only had to support <= 43 bits of physical addresses so this was
quite feasible.

In order to support larger physical addresses we have to move to
64-bit PGDs and PMDs.

Most of the changes are straight-forward:

1) {pgd,pmd}_t --> unsigned long

2) Anything that tries to use plain "unsigned int" types with pgd/pmd
   values needs to be adjusted.  In particular things like "0U" become
   "0UL".

3) {PGDIR,PMD}_BITS decrease by one.

4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
   and adjust the low bit masks to clear out the low 3 bits instead of
   just the low 2 bits during pgd/pmd address formation.

Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
swapper_{pg_dir,low_pmd_dir} arrays.

This patch does not try to take advantage of having 64-bits in the
PMDs to simplify the hugepage code, that will come in a subsequent
change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12 15:22:35 -08:00
David S. Miller
37b3a8ff3e sparc64: Move from 4MB to 8MB huge pages.
The impetus for this is that we would like to move to 64-bit PMDs and
PGDs, but that would result in only supporting a 42-bit address space
with the current page table layout.  It'd be nice to support at least
43-bits.

The reason we'd end up with only 42-bits after making PMDs and PGDs
64-bit is that we only use half-page sized PTE tables in order to make
PMDs line up to 4MB, the hardware huge page size we use.

So what we do here is we make huge pages 8MB, and fabricate them using
4MB hw TLB entries.

Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
places that really need to operate on hardware 4MB pages.

Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.

This makes the pgtable cache completely unused, so remove the code
managing it and the state used in mm_context_t.  Now we have less
spinlocks taken in the page table allocation path.

The technique we use to fabricate the 8MB pages is to transfer bit 22
from the missing virtual address into the PTEs physical address field.
That takes care of the transparent huge pages case.

For hugetlb, we fill things in at the PTE level and that code already
puts the sub huge page physical bits into the PTEs, based upon the
offset, so there is nothing special we need to do.  It all just works
out.

So, a small amount of complexity in the THP case, but this code is
about to get much simpler when we move the 64-bit PMDs as we can move
away from the fancy 32-bit huge PMD encoding and just put a real PTE
value in there.

With bug fixes and help from Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12 15:22:34 -08:00
David S. Miller
b2d4383480 sparc64: Make PAGE_OFFSET variable.
Choose PAGE_OFFSET dynamically based upon cpu type.

Original UltraSPARC-I (spitfire) chips only supported a 44-bit
virtual address space.

Newer chips (T4 and later) support 52-bit virtual addresses
and up to 47-bits of physical memory space.

Therefore we have to adjust PAGE_SIZE dynamically based upon
the capabilities of the chip.

Note that this change alone does not allow us to support > 43-bit
physical memory, to do that we need to re-arrange our page table
support.  The current encodings of the pmd_t and pgd_t pointers
restricts us to "32 + 11" == 43 bits.

This change can waste quite a bit of memory for the various tables.
In particular, a future change should work to size and allocate
kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
This isn't easy as we really cannot take a TLB miss when accessing
kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:34 -08:00
David S. Miller
bb7b435388 sparc64: Document the shift counts used to validate linear kernel addresses.
This way we can see exactly what they are derived from, and in particular
how they would change if we were to use a different PAGE_OFFSET value.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:34 -08:00
David S. Miller
922631b988 sparc64: Use PAGE_OFFSET instead of a magic constant.
This pertains to all of the computations of the kernel fast
TLB miss xor values.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:33 -08:00
David S. Miller
c920745e69 sparc64: Clean up 64-bit mmap exclusion defines.
Older UltraSPARC chips had an address space hole due to the MMU only
supporting 44-bit virtual addresses.

The top end of this hole also has the same value as the current
definition of PAGE_OFFSET, so this can be confusing.

Consolidate the defines for the userspace mmap exclusion range into
page_64.h and use them in sys_sparc_64.c and hugetlbpage.c

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:33 -08:00
Johannes Weiner
759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Naoya Horiguchi
83467efbdb mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:49 -07:00