108936 Commits

Author SHA1 Message Date
Vladimir Murzin
d30eae4733 arm: add support for memtest
Add support for memtest command line option.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:06 -07:00
Vladimir Murzin
36dd9086cb arm64: add support for memtest
Add support for memtest command line option.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:06 -07:00
Vladimir Murzin
4a20799d11 mm: move memtest under mm
Memtest is a simple feature which fills the memory with a given set of
patterns and validates memory contents, if bad memory regions is detected
it reserves them via memblock API.  Since memblock API is widely used by
other architectures this feature can be enabled outside of x86 world.

This patch set promotes memtest to live under generic mm umbrella and
enables memtest feature for arm/arm64.

It was reported that this patch set was useful for tracking down an issue
with some errant DMA on an arm64 platform.

This patch (of 6):

There is nothing platform dependent in the core memtest code, so other
platforms might benefit from this feature too.

[linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:06 -07:00
Kees Cook
204db6ed17 mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
The arch_randomize_brk() function is used on several architectures,
even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
the architectures that support it, while still handling CONFIG_COMPAT_BRK.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
d1fd836dcf mm: split ET_DYN ASLR from mmap ASLR
This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
powerpc, and x86.  The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of shared library offset as
well (mmap), and vice versa.  Further details and a PoC of this attack
is available here:

  http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE linked executable (ET_DYN) has its own ASLR
region:

  $ ./show_mmaps_pie
  54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
  54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
  54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
  7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb25000-7f75beb2a000 rw-p  ...
  7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed45000-7f75bed46000 rw-p  ...
  7f75bed46000-7f75bed47000 r-xp  ...
  7f75bed47000-7f75bed4c000 rw-p  ...
  7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed4e000-7f75bed4f000 rw-p  ...
  7fffb3741000-7fffb3762000 rw-p  ...  [stack]
  7fffb377b000-7fffb377d000 r--p  ...  [vvar]
  7fffb377d000-7fffb377f000 r-xp  ...  [vdso]

The change is to add a call the newly created arch_mmap_rnd() into the
ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
as was already done on s390.  Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
which is no longer needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
c6f5b001e6 s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
In preparation for moving ET_DYN randomization into the ELF loader (which
requires a static ELF_ET_DYN_BASE), this redefines s390's existing ET_DYN
randomization in a call to arch_mmap_rnd(). This refactoring results in
the same ET_DYN randomization on s390.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
2b68f6caea mm: expose arch_mmap_rnd when available
When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it
(which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
already supports a separated ET_DYN ASLR from mmap ASLR without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
8e89a356fe s390: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86, and extracts the
checking of PF_RANDOMIZE.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
ed6322746a powerpc: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86.

(Can mmap ASLR be safely enabled in the legacy mmap case here?  Other
archs use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".)

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
1f0569df0b mips: extract logic for mmap_rnd()
In preparation for splitting out ET_DYN ASLR, extract the mmap ASLR
selection into a separate function.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
dd04cff1dc arm64: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86.  This additionally
enables mmap ASLR on legacy mmap layouts, which appeared to be missing
on arm64, and was already supported on arm.  Additionally removes a
copy/pasted declaration of an unused function.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
82168140bc x86: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm, and extracts the checking of
PF_RANDOMIZE.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
fbbc400f39 arm: factor out mmap ASLR into mmap_rnd
To address the "offset2lib" ASLR weakness[1], this separates ET_DYN ASLR
from mmap ASLR, as already done on s390.  The architectures that are
already randomizing mmap (arm, arm64, mips, powerpc, s390, and x86), have
their various forms of arch_mmap_rnd() made available via the new
CONFIG_ARCH_HAS_ELF_RANDOMIZE.  For these architectures,
arch_randomize_brk() is collapsed as well.

This is an alternative to the solutions in:
https://lkml.org/lkml/2015/2/23/442

I've been able to test x86 and arm, and the buildbot (so far) seems happy
with building the rest.

[1] http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

This patch (of 10):

In preparation for splitting out ET_DYN ASLR, this moves the ASLR
calculations for mmap on ARM into a separate routine, similar to x86.
This also removes the redundant check of personality (PF_RANDOMIZE is
already set before calling arch_pick_mmap_layout).

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Toshi Kani
6b6378355b x86, mm: support huge KVA mappings on x86
Implement huge KVA mapping interfaces on x86.

On x86, MTRRs can override PAT memory types with a 4KB granularity.  When
using a huge page, MTRRs can override the memory type of the huge page,
which may lead a performance penalty.  The processor can also behave in an
undefined manner if a huge page is mapped to a memory range that MTRRs
have mapped with multiple different memory types.  Therefore, the mapping
code falls back to use a smaller page size toward 4KB when a mapping range
is covered by non-WB type of MTRRs.  The WB type of MTRRs has no affect on
the PAT memory types.

pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see if a
given range is covered by MTRRs.  MTRR_TYPE_WRBACK indicates that the
range is either covered by WB or not covered and the MTRR default value is
set to WB.  0xFF indicates that MTRRs are disabled.

HAVE_ARCH_HUGE_VMAP is selected when X86_64 or X86_32 with X86_PAE is set.
 X86_32 without X86_PAE is not supported since such config can unlikey be
benefited from this feature, and there was an issue found in testing.

[fengguang.wu@intel.com: ioremap_pud_capable can be static]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:04 -07:00
Toshi Kani
5d72b4fba4 x86, mm: support huge I/O mapping capability I/F
Implement huge I/O mapping capability interfaces for ioremap() on x86.

IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT on
x86/32, which overrides the default value defined in <linux/vmalloc.h>.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:04 -07:00
Toshi Kani
0ddab1d2ed lib/ioremap.c: add huge I/O map capability interfaces
Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
I/O mappings with pud/pmd are enabled on the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

A new kernel option "nohugeiomap" is also added, so that user can disable
the huge I/O map capabilities when necessary.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:04 -07:00
Michal Hocko
f91e8d6d76 sparc: clarify __GFP_NOFAIL allocation
Commit 920c3ed74134 ("[SPARC64]: Add basic infrastructure for MD
add/remove notification") has added __GFP_NOFAIL for the allocation
request but it hasn't mentioned why is this strict requirement really
needed.  The code was handling an allocation failure and propagated it
properly up the callchain so it is not clear why it is needed.

Dave has clarified the intention when I tried to remove the flag as not
being necessary:

: It is a serious failure.
:
: If we miss an MDESC update due to this allocation failure, the update
: is not an event which gets retransmitted so we will lose the updated
: machine description forever.
:
: We really need this allocation to succeed.

So add a comment to clarify the nofail flag and get rid of the failure
check because __GFP_NOFAIL allocation doesn't fail.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vipul Pandya <vipul@chelsio.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:04 -07:00
Kirill A. Shutemov
235a8f0286 mm: define default PGTABLE_LEVELS to two
By this time all architectures which support more than two page table
levels should be covered.  This patch add default definiton of
PGTABLE_LEVELS equal 2.

We also add assert to detect inconsistence between CONFIG_PGTABLE_LEVELS
and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
9823336833 x86: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
6b8ce2a1a4 um: expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
909d45e628 tile: expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
81a2936c1a sparc: expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
69543d6399 sh: expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: linux-sh@vger.kernel.org
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
c81956c9cd s390: expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
table levels folded.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
06ef42a16f powerpc: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
f24ffde432 parisc: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
a728ab5270 mips: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
980d5b7387 m68k: mark PMD folded and expose number of page table levels
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
table levels folded.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
4d66bcc7cf ia64: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

We need to define PGTABLE_LEVELS before sourcing init/Kconfig:
arch/Kconfig will define default value and it's sourced from init/Kconfig.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Kirill A. Shutemov
1bcad26e9d arm: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:01 -07:00
Kirill A. Shutemov
9f25e6ad58 arm64: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

ARM64_PGTABLE_LEVELS is renamed to PGTABLE_LEVELS and defined before
sourcing init/Kconfig: arch/Kconfig will define default value and it's
sourced from init/Kconfig.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:01 -07:00
Kirill A. Shutemov
f5d4547ad2 alpha: expose number of page table levels on Kconfig level
I've implemented accounting for pmd page tables as we have for pte (see
mm->nr_ptes).  It's requires a new counter in mm_struct: mm->nr_pmds.

But the feature doesn't make any sense if an architecture has PMD level
folded and it would be nice get rid of the counter in this case.

The problem is that we cannot use __PAGETABLE_PMD_FOLDED in
<linux/mm_types.h> due to circular dependencies:

<linux/mm_types> -> <asm/pgtable.h> -> <linux/mm_types.h>

In most cases <asm/pgtable.h> wants <linux/mm_types.h> to get definition
of struct page and struct vm_area_struct.  I've tried to split mm_struct
into separate header file to be able to user <asm/pgtable.h> there.

But it doesn't fly on some architectures, like ARM: it wants mm_struct
<asm/pgtable.h> to implement tlb flushing.  I don't see how to fix it
without massive de-inlining or coverting a lot for inline functions to
macros.

This is other approach: expose number of page tables in use via Kconfig
and use it in <linux/mm_types.h> instead of __PAGETABLE_PMD_FOLDED from
<asm/pgtable.h>.

This patch (of 19):

We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:01 -07:00
Ulrich Obergfell
692297d8f9 watchdog: introduce the hardlockup_detector_disable() function
Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false).

Remove the watchdog_hardlockup_detector_is_enabled() and the
watchdog_enable_hardlockup_detector() function which are no longer needed.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:59 -07:00
David Rientjes
1cf370c611 arch/sh/kernel/dwarf.c: use mempool_create_slab_pool()
Mempools created for slab caches should use mempool_create_slab_pool().

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:56 -07:00
David Rientjes
7b4b425897 arch/sh/kernel/dwarf.c: destroy mempools on cleanup
dwarf_reg_pool and dwarf_frame_pool are not properly destroyed when
cleaning up the dwarf unwinder.  Destroy them with mempool_destroy().

Also mark dwarf_unwinder_cleanup() as __init.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:56 -07:00
Linus Torvalds
ca2ec32658 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:
 "Part one:

   - struct filename-related cleanups

   - saner iov_iter_init() replacements (and switching the syscalls to
     use of those)

   - ntfs switch to ->write_iter() (Anton)

   - aio cleanups and splitting iocb into common and async parts
     (Christoph)

   - assorted fixes (me, bfields, Andrew Elble)

  There's a lot more, including the completion of switchover to
  ->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
  race fixes, etc, but that goes after #for-davem merge.  David has
  pulled it, and once it's in I'll send the next vfs pull request"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
  sg_start_req(): use import_iovec()
  sg_start_req(): make sure that there's not too many elements in iovec
  blk_rq_map_user(): use import_single_range()
  sg_io(): use import_iovec()
  process_vm_access: switch to {compat_,}import_iovec()
  switch keyctl_instantiate_key_common() to iov_iter
  switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
  aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
  vmsplice_to_user(): switch to import_iovec()
  kill aio_setup_single_vector()
  aio: simplify arguments of aio_setup_..._rw()
  aio: lift iov_iter_init() into aio_setup_..._rw()
  lift iov_iter into {compat_,}do_readv_writev()
  NFS: fix BUG() crash in notify_change() with patch to chown_common()
  dcache: return -ESTALE not -EBUSY on distributed fs race
  NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
  VFS: Add iov_iter_fault_in_multipages_readable()
  drop bogus check in file_open_root()
  switch security_inode_getattr() to struct path *
  constify tomoyo_realpath_from_path()
  ...
2015-04-14 15:31:03 -07:00
Linus Torvalds
6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Russell King
4b2f883847 Merge branch 'devel-stable' into for-next 2015-04-14 22:28:32 +01:00
Russell King
c848791f03 Merge branches 'misc', 'vdso' and 'fixes' into for-next
Conflicts:
	arch/arm/mm/proc-macros.S
2015-04-14 22:28:25 +01:00
Russell King
79403cda37 ARM: update errata 430973 documentation to cover Cortex A8 r1p*
This errata covers all r1 variants of Cortex A8, it's not limited to
just r1p0..r1p2.  Update the documentation to reflect this.  The code
already applies the workaround to all r1p* A8 CPUs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:07 +01:00
Russell King
57ca654bef ARM: ensure delay timer has sufficient accuracy for delays
We have recently had an example of someone wanting to use a 90kHz timer
for the software delay loop.

udelay() needs to have at least microsecond resolution to allow drivers
access to a delay mechanism with a reasonable chance of delaying the
period they requested within at least a 50% marging of error, especially
for small delays.

Discussion about the udelay() accuracy can be found at:
	https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:07 +01:00
Russell King
37463be865 ARM: switch to use the generic show_mem() implementation
Switch ARM to use the generic show_mem() implementation, which displays
the statistics from the mm zone rather than walking the page arrays.

Acked-by: Mel Gorman <mgorman <mgorman@suse.de>
Tested-by: Gregory Fong <gregory.0xf0@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:06 +01:00
Russell King
a6d7467898 ARM: proc-v7: avoid errata 430973 workaround for non-Cortex A8 CPUs
Avoid the errata 430973 workaround for non-Cortex A8 CPUs.  Having this
workaround enabled introduces an additional branch target buffer flush
into the context switching path, something we wish to avoid.  To allow
this errata to be enabled in multiplatform kernels while reducing its
impact, rearrange the Cortex-A8 CPU support to avoid impacting on other
Version 7 CPUs.

Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:06 +01:00
Russell King
e5a5de4447 ARM: enable ARM errata 643719 workaround by default
The effects of not having ARM errata 643719 enabled on affected CPUs
can be very confusing and hard to debug.  Rather than leave this to
chance, enable this workaround by default.  Now that we have rearranged
the code, it should have a low impact on the majority of CPUs.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King
aaf4b5d92c ARM: cache-v7: optimise test for Cortex A9 r0pX devices
Eliminate one unnecessary instruction from this test by pre-shifting
the Cortex A9 ID - we can shift the actual ID in the teq instruction
thereby losing the pX bit of the ID at no cost.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King
d3cd451dfb ARM: cache-v7: optimise branches in v7_flush_cache_louis
Optimise the branches such that for the majority of unaffected devices,
we avoid needing to execute the errata work-around code path by
branching to start_flush_levels early.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King
cd8b24d9e8 ARM: cache-v7: consolidate initialisation of cache level index
Both v7_flush_cache_louis and v7_flush_dcache_all both begin the
flush_levels loop with r10 initialised to zero.  In each case, this
is done immediately prior to entering the loop.  Branch to this
instruction in v7_flush_dcache_all from v7_flush_cache_louis and
eliminate the unnecessary initialisation in v7_flush_cache_louis.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King
47b8484ea6 ARM: cache-v7: shift CLIDR to extract appropriate field before masking
Rather than have code which masks and then shifts, such as:

	mrc     p15, 1, r0, c0, c0, 1
ALT_SMP(ands	r3, r0, #7 << 21)
ALT_UP( ands	r3, r0, #7 << 27)
ALT_SMP(mov	r3, r3, lsr #20)
ALT_UP(	mov	r3, r3, lsr #26)

re-arrange this as a shift and then mask.  The masking is the same for
each field which we want to extract, so this allows the mask to be
shared amongst code paths:

	mrc     p15, 1, r0, c0, c0, 1
ALT_SMP(mov	r3, r0, lsr #20)
ALT_UP(	mov	r3, r0, lsr #26)
	ands	r3, r3, #7 << 1

Use this method for the LoUIS, LoUU and LoC fields.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00
Russell King
5aca370826 ARM: cache-v7: use movw/movt instructions
We always build cache-v7.S for ARMv7, so we can use the ARMv7 16-bit
move instructions to load large constants, rather than using constants
in a literal pool.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00
Russell King
89c6bc5884 ARM: allow 16-bit instructions in ALT_UP()
Allow ALT_UP() to cope with a 16-bit Thumb instruction by automatically
inserting a following nop instruction.  This allows us to care less
about getting the assembler to emit a 32-bit thumb instruction.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00