141521 Commits

Author SHA1 Message Date
Ingo Molnar
0fd2e9c53d Merge commit 'upstream-x86-entry' into WIP.x86/mm
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 12:58:53 +01:00
Matthew Wilcox
5f0e3fe6b1 x86/build: Make isoimage work on Debian
Debian does not ship a 'mkisofs' symlink to genisoimage.  All modern
distros ship genisoimage, so just use that directly.  That requires
renaming the 'genisoimage' function.  Also neaten up the 'for' loop
while I'm in here.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-16 16:23:31 +01:00
Linus Torvalds
f6f3732162 Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths"
This reverts commits 5c9d2d5c269c, c7da82b894e9, and e7fe7b5cae90.

We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.

Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was.  Dave
Hansen says:

 "So, the underlying bug here is that we now a get_user_pages_remote()
  and then go ahead and do the p*_access_permitted() checks against the
  current PKRU. This was introduced recently with the addition of the
  new p??_access_permitted() calls.

  We have checks in the VMA path for the "remote" gups and we avoid
  consulting PKRU for them. This got missed in the pkeys selftests
  because I did a ptrace read, but not a *write*. I also didn't
  explicitly test it against something where a COW needed to be done"

It's also not entirely clear that it makes sense to check the protection
key bits at this level at all.  But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.

We'll see.

Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back.  Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76f7 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-15 18:53:22 -08:00
Linus Torvalds
7a3c296ae0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.

 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
    Brueckner.

 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
    Berg.

 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.

 5) Don't advertise gigabit in sh_eth when not available, from Thomas
    Petazzoni.

 6) Check network namespace when delivering to netlink taps, from Kevin
    Cernekee.

 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.

 8) Use correct address in TCP md5 lookups when replying to an incoming
    segment, from Christoph Paasch.

 9) Add schedule points to BPF map alloc/free, from Eric Dumazet.

10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
    from Eric Dumazet.

11) Fix SKB leak in tipc, from Jon Maloy.

12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.

13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn.

14) Add some new qmi_wwan device IDs, from Daniele Palmas.

15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: qcom/emac: Reduce timeout for mdio read/write
  net: sched: fix static key imbalance in case of ingress/clsact_init error
  net: sched: fix clsact init error path
  ip_gre: fix wrong return value of erspan_rcv
  net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
  pkt_sched: Remove TC_RED_OFFLOADED from uapi
  net: sched: Move to new offload indication in RED
  net: sched: Add TCA_HW_OFFLOAD
  net: aquantia: Increment driver version
  net: aquantia: Fix typo in ethtool statistics names
  net: aquantia: Update hw counters on hw init
  net: aquantia: Improve link state and statistics check interval callback
  net: aquantia: Fill in multicast counter in ndev stats from hardware
  net: aquantia: Fill ndev stat couters from hardware
  net: aquantia: Extend stat counters to 64bit values
  net: aquantia: Fix hardware DMA stream overload on large MRRS
  net: aquantia: Fix actual speed capabilities reporting
  sock: free skb in skb_complete_tx_timestamp on error
  s390/qeth: update takeover IPs after configuration change
  s390/qeth: lock IP table while applying takeover changes
  ...
2017-12-15 13:08:37 -08:00
Linus Torvalds
06f976ecc7 arm64 fixes:
- Fix FPSIMD context switch regression introduced in -rc2
 
 - Fix ABI break with SVE CPUID register reporting
 
 - Fix use of uninitialised variable
 
 - Fixes to hardware access/dirty management and sanity checking
 
 - CPU erratum workaround for Falkor CPUs
 
 - Fix reporting of writeable+executable mappings
 
 - Fix signal reporting for RAS errors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJaNAZcAAoJELescNyEwWM0brIH/i69foOwEb5CFE8B6Bwh1yMR
 WtiNMiuLeaOoOmAzTLz5ZMi0W087cth+ycgiXuvnMQtzLIAFXK0gWCZ+CLBHgsiz
 Q6ba7Li0JbFuSqOyxjxcLMeDRFsD6eVZuoVhBeVi+bjz6Ni44nXF4+TXhep82+Ws
 xMfK5S8qjhAwFqFuOlL6Goq1zg5lGVJQjpBHkipiWRpmU8AdY16dsajsvMvbZl0A
 4LhIntEo5qx1+6un+8w3xoMt5uzb0BeVseTKCEghDgZ2wE351pwQEEQUam0KVhv4
 6b803ccpiBbl3oo4yAbgvXigTW6HBjyKA9e/Xy9SG9gpSFZdUNhBcGoCOnaIF/A=
 =kjAU
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "There are some significant fixes in here for FP state corruption,
  hardware access/dirty PTE corruption and an erratum workaround for the
  Falkor CPU.

  I'm hoping that things finally settle down now, but never say never...

  Summary:

   - Fix FPSIMD context switch regression introduced in -rc2

   - Fix ABI break with SVE CPUID register reporting

   - Fix use of uninitialised variable

   - Fixes to hardware access/dirty management and sanity checking

   - CPU erratum workaround for Falkor CPUs

   - Fix reporting of writeable+executable mappings

   - Fix signal reporting for RAS errors"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fpsimd: Fix copying of FP state from signal frame into task struct
  arm64/sve: Report SVE to userspace via CPUID only if supported
  arm64: fix CONFIG_DEBUG_WX address reporting
  arm64: fault: avoid send SIGBUS two times
  arm64: hw_breakpoint: Use linux/uaccess.h instead of asm/uaccess.h
  arm64: Add software workaround for Falkor erratum 1041
  arm64: Define cputype macros for Falkor CPU
  arm64: mm: Fix false positives in set_pte_at access/dirty race detection
  arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
  arm64: Initialise high_memory global variable earlier
2017-12-15 12:44:49 -08:00
Linus Torvalds
e53000b1ed Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix the s2ram regression related to confusion around segment
     register restoration, plus related cleanups that make the code more
     robust

   - a guess-unwinder Kconfig dependency fix

   - an isoimage build target fix for certain tool chain combinations

   - instruction decoder opcode map fixes+updates, and the syncing of
     the kernel decoder headers to the objtool headers

   - a kmmio tracing fix

   - two 5-level paging related fixes

   - a topology enumeration fix on certain SMP systems"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
  x86/decoder: Fix and update the opcodes map
  x86/power: Make restore_processor_context() sane
  x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
  x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
  x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
  x86/build: Don't verify mtools configuration file for isoimage
  x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
  x86/boot/compressed/64: Print error if 5-level paging is not supported
  x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
2017-12-15 12:14:33 -08:00
Linus Torvalds
bde6b37e49 xen: fixes for 4.15-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJaM17CAAoJELDendYovxMvZs0H/3YIAN3wXEqcaZuAoMKdTU3C
 gSipBUJKwcwo/1qJCtE5CzQnl7U6qC4BPnwNfed1AJUULFBYtpEMw6KFMhK2KELn
 ADWoJzVbBzouZnj0GHuky4dQnsC5lPL3RjlWqKkzlEA3KebrxLsO7inw5Pru9zPZ
 y1Isl/nLxV04f8SYITjgThCCdo9pnQOdx275YAcbQprqsOK45j+4d1Mn0fYb1Rjk
 7Fk4i9UAZlY/VYycpgyC/m3PKFdawl0uqHXwu4yTKXQJGZ3BwED7ifTvHknaSOhA
 DVxktnEOTg5IrrqDrr4EdZIgXQcqQF19oIOtEWCKCLvW8j/sBlSAQYkgtn5OpxY=
 =gqWy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two minor fixes for running as Xen dom0:

   - when built as 32 bit kernel on large machines the Xen LAPIC
     emulation should report a rather modern LAPIC in order to support
     enough APIC-Ids

   - The Xen LAPIC emulation is needed for dom0 only, so build it only
     for kernels supporting to run as Xen dom0"

* tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: XEN_ACPI_PROCESSOR is Dom0-only
  x86/Xen: don't report ancient LAPIC version
2017-12-15 11:32:09 -08:00
Daniel Borkmann
07aee94394 bpf, sparc: fix usage of wrong reg for load_skb_regs after call
When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.

Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-15 09:19:35 -08:00
Daniel Borkmann
87338c8e2c bpf, ppc64: do not reload skb pointers in non-skb context
The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the helper
would enforce a reload such as in case of XDP. Here, we do
have a struct xdp_buff instead of struct sk_buff as context,
thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB.

Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-15 09:19:35 -08:00
Daniel Borkmann
6d59b7dbf7 bpf, s390x: do not reload skb pointers in non-skb context
The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the
BPF helper would enforce a reload such as in case of XDP.
Here, we do have a struct xdp_buff instead of struct sk_buff
as context, thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB only. Tested on s390x.

Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-15 09:19:35 -08:00
Will Deacon
a454483137 arm64: fpsimd: Fix copying of FP state from signal frame into task struct
Commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals") fixed an issue reported in our FPSIMD signal
restore code but inadvertently introduced another issue which tends to
manifest as random SEGVs in userspace.

The problem is that when we copy the struct fpsimd_state from the kernel
stack (populated from the signal frame) into the struct held in the
current thread_struct, we blindly copy uninitialised stack into the
"cpu" field, which means that context-switching of the FP registers is
no longer reliable.

This patch fixes the problem by copying only the user_fpsimd member of
struct fpsimd_state. We should really rework the function prototypes
to take struct user_fpsimd_state * instead, but let's just get this
fixed for now.

Cc: Dave Martin <Dave.Martin@arm.com>
Fixes: 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD state after signals")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-15 16:12:35 +00:00
Andy Lutomirski
c739f930be x86/espfix/64: Fix espfix double-fault handling on 5-level systems
Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
was wrong, and it resulted in panics due to unhandled double faults.
Use P4D_SHIFT instead, which is correct on 4-level and 5-level
machines.

This fixes a panic when running x86 selftests on 5-level machines.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging")
Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 16:58:53 +01:00
Martin Schwidefsky
9f37e79754 s390: fix preemption race in disable_sacf_uaccess
With CONFIG_PREEMPT=y there is a possible race in disable_sacf_uaccess.

The new set_fs value needs to be stored the the task structure first,
the control register update needs to be second. Otherwise a preemptive
schedule may interrupt the code right after the control register update
has been done and the next time the task is scheduled we get an incorrect
value in the control register due to the old set_fs setting.

Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-12-15 15:05:21 +01:00
Randy Dunlap
f5b5fab178 x86/decoder: Fix and update the opcodes map
Update x86-opcode-map.txt based on the October 2017 Intel SDM publication.
Fix INVPID to INVVPID.
Add UD0 and UD1 instruction opcodes.

Also sync the objtool and perf tooling copies of this file.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 13:45:20 +01:00
Andy Lutomirski
7ee18d6779 x86/power: Make restore_processor_context() sane
My previous attempt to fix a couple of bugs in __restore_processor_context():

  5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")

... introduced yet another bug, breaking suspend-resume.

Rather than trying to come up with a minimal fix, let's try to clean it up
for real.  This patch fixes quite a few things:

 - The old code saved a nonsensical subset of segment registers.
   The only registers that need to be saved are those that contain
   userspace state or those that can't be trivially restored without
   percpu access working.  (On x86_32, we can restore percpu access
   by writing __KERNEL_PERCPU to %fs.  On x86_64, it's easier to
   save and restore the kernel's GSBASE.)  With this patch, we
   restore hardcoded values to the kernel state where applicable and
   explicitly restore the user state after fixing all the descriptor
   tables.

 - We used to use an unholy mix of inline asm and C helpers for
   segment register access.  Let's get rid of the inline asm.

This fixes the reported s2ram hangs and make the code all around
more logical.

Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Fixes: 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
Link: http://lkml.kernel.org/r/398ee68e5c0f766425a7b746becfc810840770ff.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 12:21:38 +01:00
Andy Lutomirski
896c80bef4 x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
x86_64 restores system call MSRs in fix_processor_context(), and
x86_32 restored them along with segment registers.  The 64-bit
variant makes more sense, so move the 32-bit code to match the
64-bit code.

No side effects are expected to runtime behavior.

Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/65158f8d7ee64dd6bbc6c1c83b3b34aaa854e3ae.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 12:18:29 +01:00
Andy Lutomirski
090edbe23f x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
x86_64's saved_context nonsensically used separate idt_limit and
idt_base fields and then cast &idt_limit to struct desc_ptr *.

This was correct (with -fno-strict-aliasing), but it's confusing,
served no purpose, and required #ifdeffery. Simplify this by
using struct desc_ptr directly.

No change in functionality.

Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/967909ce38d341b01d45eff53e278e2728a3a93a.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15 12:18:29 +01:00
Lan Tianyu
f298103359 KVM/x86: Check input paging mode when cs.l is set
Reported by syzkaller:
    WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 x86_emulate_insn+0x557/0x15f0 [kvm]
    Modules linked in: kvm_intel kvm [last unloaded: kvm]
    CPU: 0 PID: 27962 Comm: syz-executor Tainted: G    B   W        4.15.0-rc2-next-20171208+ #32
    Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.01.03.0006.040720161253 04/07/2016
    RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm]
    RSP: 0018:ffff8807234476d0 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff88072d0237a0 RCX: ffffffffa0065c4d
    RDX: 1ffff100e5a046f9 RSI: 0000000000000003 RDI: ffff88072d0237c8
    RBP: ffff880723447728 R08: ffff88072d020000 R09: ffffffffa008d240
    R10: 0000000000000002 R11: ffffed00e7d87db3 R12: ffff88072d0237c8
    R13: ffff88072d023870 R14: ffff88072d0238c2 R15: ffffffffa008d080
    FS:  00007f8a68666700(0000) GS:ffff880802200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000002009506c CR3: 000000071fec4005 CR4: 00000000003626f0
    Call Trace:
     x86_emulate_instruction+0x3bc/0xb70 [kvm]
     ? reexecute_instruction.part.162+0x130/0x130 [kvm]
     vmx_handle_exit+0x46d/0x14f0 [kvm_intel]
     ? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm]
     ? handle_vmfunc+0x2f0/0x2f0 [kvm_intel]
     ? wait_lapic_expire+0x25/0x270 [kvm]
     vcpu_enter_guest+0x720/0x1ef0 [kvm]
     ...

When CS.L is set, vcpu should run in the 64 bit paging mode.
Current kvm set_sregs function doesn't have such check when
userspace inputs sreg values. This will lead unexpected behavior.
This patch is to add checks for CS.L, EFER.LME, EFER.LMA and
CR4.PAE when get SREG inputs from userspace in order to avoid
unexpected behavior.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Tianyu Lan <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-15 10:01:46 +01:00
Linus Torvalds
c4f988ee51 pci-v4.15-fixes-1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaMwhcAAoJEFmIoMA60/r88ygP/3fuGrYrolx1yOM/XKjiG1Za
 B+FF5IK0xcV4Xrdg4K9TrqCkVBlSmTkF2wvD1raFQhvPvzNnTMI0JvI5iwUogSlA
 EI8rKvKRT0QRtT27URhGWCZOrPMgylE7Vtq7ZGLin2o19A4JQw3TC8mz9QirSbii
 C5H+WZ6iAQ3ISjOuiFUREqQDCE8FzbYQC69uh/xMRshZFG1WPsX96ZRSvVfPQUeC
 XwvKqiO4s9I7zyxhS/WSRnnEHO7LNSRRiq8kzP0VKKMAfTo8gLYPoeIPakvXSsQZ
 j7N1ycWub9DSn7wjh9Zd922wp1EYfxkCW6jilVOd+BlmOxLjyPPGgahbNIrIojwa
 qiTY5SNdty3h27IT3Hep4yDBoqhXXPORXgh7QRlzizu8Te2SjHY7n9qi1S8Uobn+
 s98PoNYU+oelivi9MtKLKcqCaNL0RE8aCQfuA3TzCU/dzR4Trx8obV50f67vsL6e
 eK67S/KpOVUjHQmGec8iqnrJfHPZkxgaRTOVVYCXKidtJxTf/LyjDKYh8A2Q27A5
 wXdRaSxLw25r0alAWzUEIVj/iUPd4nYKY9RTeRPWiXU+wsEUcBO+js6HE1xZAAfk
 ULDqZKnkKsgW9aWlQU6DzzLsCjxPTRcC8SfK79K47Otua2laABzL47mzo2sSKKFs
 mNq3i4OiNFq8XCsN18lo
 =o3Rn
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - add a pci_get_domain_bus_and_slot() stub for the CONFIG_PCI=n case to
   avoid build breakage in the v4.16 merge window if a
   pci_get_bus_and_slot() -> pci_get_domain_bus_and_slot() patch gets
   merged before the PCI tree (Randy Dunlap)

 - fix an AMD boot regression in the 64bit BAR support added in v4.15
   (Christian König)

 - fix an R-Car use-after-free that causes a crash if no PCIe card is
   present (Geert Uytterhoeven)

* tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: rcar: Fix use-after-free in probe error path
  x86/PCI: Only enable a 64bit BAR on single-socket AMD Family 15h
  x86/PCI: Fix infinite loop in search for 64bit BAR placement
  PCI: Add pci_get_domain_bus_and_slot() stub
2017-12-14 17:02:39 -08:00
Thiago Rafael Becker
bdcf0a423e kernel: make groups_sort calling a responsibility group_info allocators
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:49 -08:00
Dave Martin
3fab39997a arm64/sve: Report SVE to userspace via CPUID only if supported
Currently, the SVE field in ID_AA64PFR0_EL1 is visible
unconditionally to userspace via the CPU ID register emulation,
irrespective of the kernel config.  This means that if a kernel
configured with CONFIG_ARM64_SVE=n is run on SVE-capable hardware,
userspace will see SVE reported as present in the ID regs even
though the kernel forbids execution of SVE instructions.

This patch makes the exposure of the SVE field in ID_AA64PFR0_EL1
conditional on CONFIG_ARM64_SVE=y.

Since future architecture features are likely to encounter a
similar requirement, this patch adds a suitable helper macros for
use when declaring config-conditional ID register fields.

Fixes: 43994d824e84 ("arm64/sve: Detect SVE and activate runtime support")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14 15:14:30 +00:00
Mark Rutland
1d08a044cf arm64: fix CONFIG_DEBUG_WX address reporting
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather
than VA_START) for the init_mm. This means that any reported W&X
addresses are offset by VA_START, which is clearly wrong and can make
them appear like userspace addresses.

Fix this by telling the ptdump code that we're walking init_mm starting
at VA_START. We don't need to update the addr_markers, since these are
still valid bounds regardless.

Cc: <stable@vger.kernel.org>
Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14 10:18:23 +00:00
Peter Xu
5663d8f9bb kvm: x86: fix WARN due to uninitialized guest FPU state
------------[ cut here ]------------
 Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
 WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
 CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G    B      OE    4.15.0-rc2+ #10
 RIP: 0010:ex_handler_fprestore+0x88/0x90
 Call Trace:
  fixup_exception+0x4e/0x60
  do_general_protection+0xff/0x270
  general_protection+0x22/0x30
 RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
 RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
  kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
  kvm_apic_accept_events+0x1c0/0x240 [kvm]
  kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
  kvm_vcpu_ioctl+0x479/0x880 [kvm]
  do_vfs_ioctl+0x142/0x9a0
  SyS_ioctl+0x74/0x80
  do_syscall_64+0x15f/0x600

where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.

Cc: stable@vger.kernel.org
Fixes: f775b13eedee2f7f3c6fdd4e90fb79090ce5d339
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:35 +01:00
Wanpeng Li
d73235d17b KVM: X86: Fix load RFLAGS w/o the fixed bit
*** Guest State ***
 CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
 CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
 CR3 = 0x00000000fffbc000
 RSP = 0x0000000000000000  RIP = 0x0000000000000000
 RFLAGS=0x00000000         DR7 = 0x0000000000000400
        ^^^^^^^^^^

The failed vmentry is triggered by the following testcase when ept=Y:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	struct kvm_regs regs = {
    		.rflags = 0,
    	};
    	ioctl(r[4], KVM_SET_REGS, &regs);
    	ioctl(r[4], KVM_RUN, 0);
    }

X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1
of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails.
This patch fixes it by oring X86_EFLAGS_FIXED during ioctl.

Cc: stable@vger.kernel.org
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Quan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:26 +01:00
Wanpeng Li
ed52870f46 KVM: MMU: Fix infinite loop when there is no available mmu page
The below test case can cause infinite loop in kvm when ept=0.

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	ioctl(r[4], KVM_RUN, 0);
    }

It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate root page table which can result
in beblow infinite loop:

    vcpu_run() {
    	for (;;) {
	    	r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
	    	if (r <= 0)
	    		break;
	    	if (need_resched())
	    		cond_resched();
      }
    }

This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53cf0f (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:14 +01:00
Linus Torvalds
4e746cf4f7 RISC-V Fixes for 4.15-rc4
This pull request contains three small fixes that I'm hoping to get into
 4.15-rc4:
 
 * A fix to a typo in sys_riscv_flush_icache.  This only effects error
   handling, but I think it's a small and obvious enough change that it's
   sane outside the merge window.
 * The addition of smp_mb__after_spinlock(), which was recently removed
   due to an incorrect comment.  This is largly a comment change (as
   there's a big one now), and while it's necessary for complience with
   the RISC-V memory model the lack of this fence shouldn't manifest as a
   bug on current implementations.  Nonetheless, it still seems saner to
   have the fence in 4.15.
 * The removal of some of the HVC_RISCV_SBI driver that snuck into the
   arch port.  This is compile-time dead code in 4.15 (as the driver
   isn't in yet), and during the review process we found a better way to
   implement early printk on RISC-V.  While this change doesn't do
   anything, it will make staging our HVC driver easier: without this
   change the HVC driver we hope to upstream won't build on 4.15 (because
   the 4.15 arch code would reference a function that no longer exists).
 
 Additionally, I'm instituting a bit of a process change so I don't
 submit things too quickly again:
 
 * All the patches I submit during an RC will be from the week before, so
   everyone has gotten a chance to see them and they've made it through
   our autobuilders and integration trees.
 * I'll cherry-pick (single patches) or merge (if it's a patch set) on
   top of the new RC on Monday morning.
 * I'll sign the tag on Monday morning, to let the autobuilders pick up
   exactly what I'm submitting.
 * Assuming nothing goes wrong, I'll mail the pull request out on
   Wednesday.  If something goes wrong, I'll wait at least a day after
   re-spinning the tag to let the autobuilders pick things up.
 
 Hopefully this will avoid any headaches in the future, barring any
 emergency fixes.
 
 I don't think this is the last patch set we'll want for 4.15: I think
 I'll want to remove some of the first-level irqchip driver that snuck in
 as well, which will look a lot like the HVC patch here.  This is pending
 some asm-generic cleanup I'm doing that I haven't quite gotten clean
 enough to send out yet, though, but hopefully it'll be ready by next
 week (and still OK for that late).
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlou0sQTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQebsD/4gPSr0adXxl3hXhrHWQIcigWM1QNgp
 EjNnsHiT1zLs9SNK+CBLQXEjXp/GwqfYDiE6yzaJEhoBm5rHB9c72IThCJkt6fRR
 5fuybYXxBXaXVxq0CbaItNzTdqm99iqSCD2p3mh7Dmw/X23EglkYdPqQlUIxWTtw
 BmlCaH8tn8PwGIgVATwJe8fWEe8tep8uPUW6NAMznLVCKKaOd56fkP+FVA1+M7u4
 7kaoG6B8tg3mEbHdsfvo4VSII7HcHsaTa4tOaD7RPjEDKdxFQbzY8HUP38BunqOu
 v9bsV0BXYvG5dHF0eWtQAfB+y5V7n1fipcfKtZl2HagN9hbKruwQMPZTO+xPT7Am
 F/VyYxoirzfY5quA0ancS+DV54LfcHC5MH9Xvhacvg7yY3zYTvhsZfEaND0GAMLT
 sqO3Mi24+yw5Cs24uSJEj6qkxxuHMZUePpzWJWET2vduXe+pqOYF8jQn13SqHyxs
 QoIaw18RqcBPtX6vABFoMKP6WHPk8No3iki5DrSVSQ0jN9r3HSBxvYCIdG89P6A+
 E0a65H91GhY/kVNJAE4YC8MF3e6D8/+KX9+h/q1wyueJV2tTE0cd0ZiQY7hN36HY
 /QIf8zH4tmYhHAJFNLQnK1sg1Ipino6ABHNv76MhZMxguYD4FAE0nAHsfv1AAa97
 Webx5zolwHEz8A==
 =FRZX
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains three small fixes:

   - A fix to a typo in sys_riscv_flush_icache. This only effects error
     handling, but I think it's a small and obvious enough change that
     it's sane outside the merge window.

   - The addition of smp_mb__after_spinlock(), which was recently
     removed due to an incorrect comment. This is largly a comment
     change (as there's a big one now), and while it's necessary for
     complience with the RISC-V memory model the lack of this fence
     shouldn't manifest as a bug on current implementations.
     Nonetheless, it still seems saner to have the fence in 4.15.

   - The removal of some of the HVC_RISCV_SBI driver that snuck into the
     arch port. This is compile-time dead code in 4.15 (as the driver
     isn't in yet), and during the review process we found a better way
     to implement early printk on RISC-V. While this change doesn't do
     anything, it will make staging our HVC driver easier: without this
     change the HVC driver we hope to upstream won't build on 4.15
     (because the 4.15 arch code would reference a function that no
     longer exists).

  I don't think this is the last patch set we'll want for 4.15: I think
  I'll want to remove some of the first-level irqchip driver that snuck
  in as well, which will look a lot like the HVC patch here. This is
  pending some asm-generic cleanup I'm doing that I haven't quite gotten
  clean enough to send out yet, though, but hopefully it'll be ready by
  next week (and still OK for that late)"

 * tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
  RISC-V: Remove unused CONFIG_HVC_RISCV_SBI code
  RISC-V: Resurrect smp_mb__after_spinlock()
  RISC-V: Logical vs Bitwise typo
2017-12-13 20:13:05 -08:00
David S. Miller
8c8f67a46f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-13

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Addition of explicit scheduling points to map alloc/free
   in order to avoid having to hold the CPU for too long,
   from Eric.

2) Fixing of a corruption in overlapping perf_event_output
   calls from different BPF prog types on the same CPU out
   of different contexts, from Daniel.

3) Fallout fixes for recent correction of broken uapi for
   BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
   that needed to be pulled in from asm-generic and for
   BPF selftests the asm-generic include did not work,
   so similar asm include scheme was adapted for that
   problematic header that perf is having with other
   header files under tools, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 17:30:04 -05:00
Russell King
cd8165c3d5 ARM: dts: vf610-zii-dev: use XAUI for DSA link ports
Use XAUI rather than XGMII for DSA link ports, as this is the interface
mode that the switches actually use. XAUI is the 4 lane bus with clock
per direction, whereas XGMII is a 32 bit bus with clock.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 14:59:15 -05:00
Dongjiu Geng
faa75e147b arm64: fault: avoid send SIGBUS two times
do_sea() calls arm64_notify_die() which will always signal
user-space. It also returns whether APEI claimed the external
abort as a RAS notification. If it returns failure do_mem_abort()
will signal user-space too.

do_mem_abort() wants to know if we handled the error, we always
call arm64_notify_die() so can always return success.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-13 09:58:13 +00:00
Sebastian Ott
a5f1005517 s390/pci: handle insufficient resources during dma tlb flush
In a virtualized setup lazy flushing can lead to the hypervisor
running out of resources when lots of guest pages need to be
pinned. In this situation simply trigger a global flush to give
the hypervisor a chance to free some of these resources.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-12-13 10:51:33 +01:00
Anju T Sudhakar
110df8bd3e powerpc/perf: Fix kfree memory allocated for nest pmus
imc_common_cpuhp_mem_free() is the common function for all
IMC (In-memory Collection counters) domains to unregister cpuhotplug
callback and free memory. Since kfree of memory allocated for
nest-imc (per_nest_pmu_arr) is in the common code, all
domains (core/nest/thread) can do the kfree in the failure case.

This could potentially create a call trace as shown below, where
core(/thread/nest) imc pmu initialization fails and in the failure
path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr),
which is allocated by successfully registered nest units.

The call trace is generated in a scenario where core-imc
initialization is made to fail and a cpuhotplug is performed in a p9
system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access
per_nest_pmu_arr, which is already freed by core-imc.

  NIP [c000000000cb6a94] mutex_lock+0x34/0x90
  LR [c000000000cb6a88] mutex_lock+0x28/0x90
  Call Trace:
    mutex_lock+0x28/0x90 (unreliable)
    perf_pmu_migrate_context+0x90/0x3a0
    ppc_nest_imc_cpu_offline+0x190/0x1f0
    cpuhp_invoke_callback+0x160/0x820
    cpuhp_thread_fun+0x1bc/0x270
    smpboot_thread_fn+0x250/0x290
    kthread+0x1a8/0x1b0
    ret_from_kernel_thread+0x5c/0x74

To address this scenario do the kfree(per_nest_pmu_arr) only in case
of nest-imc initialization failure, and when there is no other nest
units registered.

Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13 20:51:22 +11:00
Anju T Sudhakar
ad2b6e0102 powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
Oops is observed during boot:

  Faulting instruction address: 0xc000000000248340
  cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]
      pc: c000000000248340: event_function_call+0x50/0x1f0
      lr: c00000000024878c: perf_remove_from_context+0x3c/0x100
      sp: c000000ff66fbad0
     msr: 9000000000009033
     dar: 7d20e2a6f92d03c0
    pid = 14, comm = cpuhp/0

While registering the cpuhotplug callbacks for nest-imc, if we fail in
the cpuhotplug online path for any random node in a multi node
system (because the opal call to stop nest-imc counters fails for that
node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who
successfully returned from cpuhotplug online path.

This call trace is generated since in the ppc_nest_imc_cpu_offline()
path we are trying to migrate the event context, when nest-imc
counters are not even initialized.

Patch to add a check to ensure that nest-imc is registered before
migrating the event context.

Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13 20:36:53 +11:00
Ravi Bangoria
f41d84dddc powerpc/perf: Dereference BHRB entries safely
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:

    Unable to handle kernel paging request for data at address 0xd000000019c41764
    Faulting instruction address: 0xc000000000084a14
    NIP [c000000000084a14] branch_target+0x4/0x70
    LR [c0000000000eb828] record_and_restart+0x568/0x5c0
    Call Trace:
    [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
    [c0000000000ec378] perf_event_interrupt+0x298/0x460
    [c000000000027964] performance_monitor_exception+0x54/0x70
    [c000000000009ba4] performance_monitor_common+0x114/0x120

Fix it by deferefencing the addresses safely.

Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Cc: stable@vger.kernel.org # v3.10+
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13 20:29:20 +11:00
Maciej W. Rozycki
c8c5a3a24d MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
Complement commit c23b3d1a5311 ("MIPS: ptrace: Change GP regset to use
correct core dump register layout") and also reject outsized
PTRACE_SETREGSET requests to the NT_PRFPREG regset, like with the
NT_PRSTATUS regset.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: c23b3d1a5311 ("MIPS: ptrace: Change GP regset to use correct core dump register layout")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v3.17+
Patchwork: https://patchwork.linux-mips.org/patch/17930/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:14:12 +01:00
Maciej W. Rozycki
006501e039 MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
Complement commit d614fd58a283 ("mips/ptrace: Preserve previous
registers for short regset write") and like with the PTRACE_GETREGSET
ptrace(2) request also apply a BUILD_BUG_ON check for the size of the
`elf_fpreg_t' type in the PTRACE_SETREGSET request handler.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v4.11+
Patchwork: https://patchwork.linux-mips.org/patch/17929/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:13:42 +01:00
Maciej W. Rozycki
be07a6a118 MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
Fix a commit 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for
FP regset") public API regression, then activated by commit 1db1af84d6df
("MIPS: Basic MSA context switching support"), that caused the FCSR
register not to be read or written for CONFIG_CPU_HAS_MSA kernel
configurations (regardless of actual presence or absence of the MSA
feature in a given processor) with ptrace(2) PTRACE_GETREGSET and
PTRACE_SETREGSET requests nor recorded in core dumps.

This is because with !CONFIG_CPU_HAS_MSA configurations the whole of
`elf_fpregset_t' array is bulk-copied as it is, which includes the FCSR
in one half of the last, 33rd slot, whereas with CONFIG_CPU_HAS_MSA
configurations array elements are copied individually, and then only the
leading 32 FGR slots while the remaining slot is ignored.

Correct the code then such that only FGR slots are copied in the
respective !MSA and MSA helpers an then the FCSR slot is handled
separately in common code.  Use `ptrace_setfcr31' to update the FCSR
too, so that the read-only mask is respected.

Retrieving a correct value of FCSR is important in debugging not only
for the human to be able to get the right interpretation of the
situation, but for correct operation of GDB as well.  This is because
the condition code bits in FSCR are used by GDB to determine the
location to place a breakpoint at when single-stepping through an FPU
branch instruction.  If such a breakpoint is placed incorrectly (i.e.
with the condition reversed), then it will be missed, likely causing the
debuggee to run away from the control of GDB and consequently breaking
the process of investigation.

Fortunately GDB continues using the older PTRACE_GETFPREGS ptrace(2)
request which is unaffected, so the regression only really hits with
post-mortem debug sessions using a core dump file, in which case
execution, and consequently single-stepping through branches is not
possible.  Of course core files created by buggy kernels out there will
have the value of FCSR recorded clobbered, but such core files cannot be
corrected and the person using them simply will have to be aware that
the value of FCSR retrieved is not reliable.

Which also means we can likely get away without defining a replacement
API which would ensure a correct value of FSCR to be retrieved, or none
at all.

This is based on previous work by Alex Smith, extensively rewritten.

Signed-off-by: Alex Smith <alex@alex-smith.me.uk>
Signed-off-by: James Hogan <james.hogan@mips.com>
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v3.15+
Patchwork: https://patchwork.linux-mips.org/patch/17928/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:13:12 +01:00
Maciej W. Rozycki
80b3ffce01 MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
Update commit d614fd58a283 ("mips/ptrace: Preserve previous registers
for short regset write") bug and consistently consume all data supplied
to `fpr_set_msa' with the ptrace(2) PTRACE_SETREGSET request, such that
a zero data buffer counter is returned where insufficient data has been
given to fill a whole number of FP general registers.

In reality this is not going to happen, as the caller is supposed to
only supply data covering a whole number of registers and it is verified
in `ptrace_regset' and again asserted in `fpr_set', however structuring
code such that the presence of trailing partial FP general register data
causes `fpr_set_msa' to return with a non-zero data buffer counter makes
it appear that this trailing data will be used if there are subsequent
writes made to FP registers, which is going to be the case with the FCSR
once the missing write to that register has been fixed.

Fixes: d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write")
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v4.11+
Patchwork: https://patchwork.linux-mips.org/patch/17927/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:12:23 +01:00
Maciej W. Rozycki
dc24d0edf3 MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
Complement commit d614fd58a283 ("mips/ptrace: Preserve previous
registers for short regset write") and ensure that no partial register
write attempt is made with PTRACE_SETREGSET, as we do not preinitialize
any temporaries used to hold incoming register data and consequently
random data could be written.

It is the responsibility of the caller, such as `ptrace_regset', to
arrange for writes to span whole registers only, so here we only assert
that it has indeed happened.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v3.15+
Patchwork: https://patchwork.linux-mips.org/patch/17926/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:11:53 +01:00
Maciej W. Rozycki
a03fe72572 MIPS: Factor out NT_PRFPREG regset access helpers
In preparation to fix a commit 72b22bbad1e7 ("MIPS: Don't assume 64-bit
FP registers for FP regset") FCSR access regression factor out
NT_PRFPREG regset access helpers for the non-MSA and the MSA variants
respectively, to avoid having to deal with excessive indentation in the
actual fix.

No functional change, however use `target->thread.fpu.fpr[0]' rather
than `target->thread.fpu.fpr[i]' for FGR holding type size determination
as there's no `i' variable to refer to anymore, and for the factored out
`i' variable declaration use `unsigned int' rather than `unsigned' as
its type, following the common style.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v3.15+
Patchwork: https://patchwork.linux-mips.org/patch/17925/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 19:11:26 +01:00
Daniel Borkmann
a23f06f06d bpf: fix build issues on um due to mising bpf_perf_event.h
Since c895f6f703ad ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:

  [...]
    CC      init/main.o
  In file included from ../include/linux/perf_event.h:18:0,
                   from ../include/linux/trace_events.h:10,
                   from ../include/trace/syscall.h:7,
                   from ../include/linux/syscalls.h:82,
                   from ../init/main.c:20:
  ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
  asm/bpf_perf_event.h: No such file or directory #include
  <asm/bpf_perf_event.h>
  [...]

Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.

Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 09:51:12 -08:00
James Hogan
17278a91e0 MIPS: CPS: Fix r1 .set mt assembler warning
MIPS CPS has a build warning on kernels configured for MIPS32R1 or
MIPS64R1, due to the use of .set mt without a prior .set mips{32,64}r2:

arch/mips/kernel/cps-vec.S Assembler messages:
arch/mips/kernel/cps-vec.S:238: Warning: the `mt' extension requires MIPS32 revision 2 or greater

Add .set MIPS_ISA_LEVEL_RAW before .set mt to silence the warning.

Fixes: 245a7868d2f2 ("MIPS: smp-cps: rework core/VPE initialisation")
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17699/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-12-12 17:19:56 +01:00
Jan Beulich
0f3922a9b9 x86/Xen: don't report ancient LAPIC version
Unconditionally reporting a value seen on the P4 or older invokes
functionality like io_apic_get_unique_id() on 32-bit builds, resulting
in a panic() with sufficiently many CPUs and/or IO-APICs. Doing what
that function does would be the hypervisor's responsibility anyway, so
makes no sense to be used when running on Xen. Uniformly report a more
modern version; this shouldn't matter much as both LAPIC and IO-APIC are
being managed entirely / mostly by the hypervisor.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-12-12 09:39:17 -05:00
Will Deacon
0e17cada2a arm64: hw_breakpoint: Use linux/uaccess.h instead of asm/uaccess.h
The only inclusion of asm/uaccess.h should be by linux/uaccess.h. All
other headers should use the latter.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-12 11:53:26 +00:00
Shanker Donthineni
932b50c7c1 arm64: Add software workaround for Falkor erratum 1041
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
    translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
    to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-12 11:45:19 +00:00
Shanker Donthineni
c622cc013c arm64: Define cputype macros for Falkor CPU
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800, got fixed in v2
chip with part number 0xC00, and would be used the same value for
future revisions.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-12 11:45:19 +00:00
Will Deacon
86c9e8126e arm64: mm: Fix false positives in set_pte_at access/dirty race detection
Jiankang reports that our race detection in set_pte_at is firing when
copying the page tables in dup_mmap as a result of a fork(). In this
situation, the page table isn't actually live and so there is no way
that we can race with a concurrent update from the hardware page table
walker.

This patch reworks the race detection so that we require either the
mm to match the current active_mm (i.e. currently installed in our TTBR0)
or the mm_users count to be greater than 1, implying that the page table
could be live in another CPU. The mm_users check might still be racy,
but we'll avoid false positives and it's not realistic to validate that
all the necessary locks are held as part of this assertion.

Cc: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Jiankang Chen <chenjiankang1@huawei.com>
Tested-by: Jiankang Chen <chenjiankang1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-12 11:42:24 +00:00
Linus Torvalds
916b20e02e Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This push fixes the following issues:

   - buffer overread in RSA

   - potential use after free in algif_aead.

   - error path null pointer dereference in af_alg

   - forbid combinations such as hmac(hmac(sha3)) which may crash

   - crash in salsa20 due to incorrect API usage"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: salsa20 - fix blkcipher_walk API usage
  crypto: hmac - require that the underlying hash algorithm is unkeyed
  crypto: af_alg - fix NULL pointer dereference in
  crypto: algif_aead - fix reference counting of null skcipher
  crypto: rsa - fix buffer overread when stripping leading zeroes
2017-12-11 16:32:45 -08:00
Andrey Ryabinin
0a373d4fc2 x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
Stackdepot doesn't work well with CONFIG_UNWINDER_GUESS=y.
The 'guess' unwinder generate awfully large and inaccurate stacktraces,
thus stackdepot can't deduplicate stacktraces because they all look like
unique. Eventually stackdepot reaches its capacity limit:

  WARNING: CPU: 0 PID: 545 at lib/stackdepot.c:119 depot_save_stack+0x28e/0x550
  Call Trace:
   ? kasan_kmalloc+0x144/0x160
   ? depot_save_stack+0x1f5/0x550
   ? do_raw_spin_unlock+0xda/0xf0
   ? preempt_count_sub+0x13/0xc0

  <...90 lines...>

   ? do_raw_spin_unlock+0xda/0xf0

Add a STACKDEPOT=n dependency to UNWINDER_GUESS to avoid the problem.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171130123554.4330-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11 19:07:07 +01:00
Changbin Du
f79ce87fa4 x86/build: Don't verify mtools configuration file for isoimage
If mtools.conf is not generated before, 'make isoimage' could complain:

  Kernel: arch/x86/boot/bzImage is ready  (#597)
    GENIMAGE arch/x86/boot/image.iso
   *** Missing file: arch/x86/boot/mtools.conf
  arch/x86/boot/Makefile:144: recipe for target 'isoimage' failed

mtools.conf is not used for isoimage generation, so do not check it.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4366d57af1 ("x86/build: Factor out fdimage/isoimage generation commands to standalone script")
Link: http://lkml.kernel.org/r/1512053480-8083-1-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11 18:55:38 +01:00
Steve Capper
8781bcbc5e arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
On systems with hardware dirty bit management, the ltp madvise09 unit
test fails due to dirty bit information being lost and pages being
incorrectly freed.

This was bisected to:
	arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()

Reverting this commit leads to a separate problem, that the unit test
retains pages that should have been dropped due to the function
madvise_free_pte_range(.) not cleaning pte's properly.

Currently pte_mkclean only clears the software dirty bit, thus the
following code sequence can appear:

	pte = pte_mkclean(pte);
	if (pte_dirty(pte))
		// this condition can return true with HW DBM!

This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
clearing both the SW and HW dirty information.

In order for this to function on systems without HW DBM, we need to
also adjust pte_mkdirty to remove the read only bit from writable pte's
to avoid infinite fault loops.

Cc: <stable@vger.kernel.org>
Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()")
Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-11 16:13:10 +00:00