120444 Commits

Author SHA1 Message Date
Jann Horn
378c6520e7 fs/coredump: prevent fsuid=0 dumps into user-controlled directories
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

 - The fs.suid_dumpable sysctl is set to 2.
 - The kernel.core_pattern sysctl's value starts with "/". (Systems
   where kernel.core_pattern starts with "|/" are not affected.)
 - Unprivileged user namespace creation is permitted. (This is
   true on Linux >=3.8, but some distributions disallow it by
   default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
f970165bee x86/compat: remove is_compat_task()
x86's is_compat_task always checked the current syscall type, not the
task type.  It has no non-arch users any more, so just remove it to
avoid confusion.

On x86, nothing should really be checking the task ABI.  There are
legitimate users for the syscall ABI and for the mm ABI.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
203f79078f sparc/syscall: fix syscall_get_arch
Sparc's syscall_get_arch was buggy: it returned the task arch, not the
syscall arch.  This could confuse seccomp and audit.

I don't think this is as bad for seccomp as it looks: sparc's 32-bit and
64-bit syscalls are numbered the same.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
069923d87e sparc/compat: provide an accurate in_compat_syscall implementation
On sparc64 compat-enabled kernels, any task can make 32-bit and 64-bit
syscalls.  is_compat_task returns true in 32-bit tasks, which does not
necessarily imply that the current syscall is 32-bit.

Provide an in_compat_syscall implementation that checks whether the
current syscall is compat.

As far as I know, sparc is the only architecture on which is_compat_task
checks the compat status of the task and on which the compat status of a
syscall can differ from the compat status of the task.  On x86,
is_compat_task checks the syscall type, not the task type.

[akpm@linux-foundation.org: add comment, per Sam]
[akpm@linux-foundation.org: update comment, per Andy]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Linus Torvalds
55fc733c7e xen: features and fixes for 4.6-rc0
- Make earlyprintk=xen work for HVM guests.
 - Remove module support for things never built as modules.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW8YhlAAoJEFxbo/MsZsTRiCgH/3GnL93s/BsRTrA/DYykYqA4
 rPPkDd6WMlEbrko6FY7o4IxUAZ50LWU8wjmDNVTT+R/VKlWCVh2+ksxjbH8e0nzr
 0QohAYTdG1VGmhkrwTjKj0wC3Nr8vaDqoBXAFRz4DiQdbobn12wzXjaVrbl8RRNc
 oHDqR+DfZj6ontqx8oFkkTk7CFKIGEOtm/uBhCAV2fNkQSUCzuQyKLlarxM6jq/2
 EnLR1bhnyoy5zNFzXxmaJgZOit8QFMKvPdDRknlp9yDHYJ5zolJkv+6FPiAG6SZs
 A8yAUPQoAD+ekAiV2DIhb8qoNNNdZXdRCFBpoudwX3GyyU4O2P5Intl0xEg17Nk=
 =L8S4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Features and fixes for 4.6:

  - Make earlyprintk=xen work for HVM guests

  - Remove module support for things never built as modules"

* tag 'for-linus-4.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen: make platform-pci.c explicitly non-modular
  drivers/xen: make sys-hypervisor.c explicitly non-modular
  drivers/xen: make xenbus_dev_[front/back]end explicitly non-modular
  drivers/xen: make [xen-]ballon explicitly non-modular
  xen: audit usages of module.h ; remove unnecessary instances
  xen/x86: Drop mode-selecting ifdefs in startup_xen()
  xen/x86: Zero out .bss for PV guests
  hvc_xen: make early_printk work with HVM guests
  hvc_xen: fix xenboot for DomUs
  hvc_xen: add earlycon support
2016-03-22 12:55:17 -07:00
Linus Torvalds
b4af7f773e IOMMU Updates for Linux v4.6
This time with:
 
 	* Updates for the Exynos IOMMU driver to make use of default
 	  domains and to add support for the SYSMMU v5
 
 	* New Mediatek IOMMU driver
 
 	* Support for the ARMv7 short descriptor format in the
 	  io-pgtable code
 
 	* Default domain support for the ARM SMMU
 
 	* Couple of other small fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJW7/7rAAoJECvwRC2XARrjKvgP/2sgR6lzIGksKpZRNNNoyJEp
 PbFt3zxBvIPYow6rQtfMqU82FAi6psq+EVKq+M0EOeJrjFGawwWpN9H/e0ZCs5Z/
 /s6DIljRFKrbty59eFsHn57Pd+302Pt0GkwnSgdgBJD7FimozyyeMJnAOs5gPjYT
 jF2ajV9FYa5rIRrMsSD2KjLKgBb3xVsgUlW72NU2WwldnOB6fSsfg4ll01kbzTon
 IQENT5ywk9zZFouLyrX6EvcvowHslO/sZhGe3Py9qOOHpu9roW7EE7rEGYdabn47
 PGpw8O5NOeSrQNzlmhXje5tuKxkh33DV55s7vVcaOy66kWbYExJGoz1/V7Vju4n1
 pok82L3N8eauMs3xqNOiQMV8UsWIXOzdMMaGypM18pCVKMaAUiz9vO9rLSmR4Z20
 IYFiX0yBXhc1AXMnrRlq/xR2WjBX2L2s0VguvYoSssdmJUZ9aKYxsurF8Ylqpm+1
 wymOj+gjM056DqAXcYBVg4ZPOEezRjnUe2qD8lZ4et3xOVUL3LXRi8FmacztEB97
 chUSB5mur/XRy6bOVI2l1uRQaqdfErgbCey0fa9N/SWKSHKWtAfR6CYYVpoR6m0L
 H/xL7yCn6jUEoadKxZyTKnX8GIN6wNcZdI+58OOMz3sjlmWs69wgdPt8Xx2RNHpm
 7caf/9sTdpUeh+V2fySD
 =uiAk
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - updates for the Exynos IOMMU driver to make use of default domains
   and to add support for the SYSMMU v5

 - new Mediatek IOMMU driver

 - support for the ARMv7 short descriptor format in the io-pgtable code

 - default domain support for the ARM SMMU

 - couple of other small fixes all over the place

* tag 'iommu-updates-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/ipmmu-vmsa: Add r8a7795 DT binding
  iommu/mediatek: Check for NULL instead of IS_ERR()
  iommu/io-pgtable-armv7s: Fix kmem_cache_alloc() flags
  iommu/mediatek: Fix handling of of_count_phandle_with_args result
  iommu/dma: Fix NEED_SG_DMA_LENGTH dependency
  iommu/mediatek: Mark PM functions as __maybe_unused
  iommu/mediatek: Select ARM_DMA_USE_IOMMU
  iommu/exynos: Use proper readl/writel register interface
  iommu/exynos: Pointers are nto physical addresses
  dts: mt8173: Add iommu/smi nodes for mt8173
  iommu/mediatek: Add mt8173 IOMMU driver
  memory: mediatek: Add SMI driver
  dt-bindings: mediatek: Add smi dts binding
  dt-bindings: iommu: Add binding for mediatek IOMMU
  iommu/ipmmu-vmsa: Use ARCH_RENESAS
  iommu/exynos: Support multiple attach_device calls
  iommu/exynos: Add Maintainers entry for Exynos SYSMMU driver
  iommu/exynos: Add support for v5 SYSMMU
  iommu/exynos: Update device tree documentation
  iommu/exynos: Add support for SYSMMU controller with bogus version reg
  ...
2016-03-22 11:57:43 -07:00
Paolo Bonzini
a6adb10622 KVM: page_track: fix access to NULL slot
This happens when doing the reboot test from virt-tests:

[  131.833653] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  131.842461] IP: [<ffffffffa0950087>] kvm_page_track_is_active+0x17/0x60 [kvm]
[  131.850500] PGD 0
[  131.852763] Oops: 0000 [#1] SMP
[  132.007188] task: ffff880075fbc500 ti: ffff880850a3c000 task.ti: ffff880850a3c000
[  132.138891] Call Trace:
[  132.141639]  [<ffffffffa092bd11>] page_fault_handle_page_track+0x31/0x40 [kvm]
[  132.149732]  [<ffffffffa093380f>] paging64_page_fault+0xff/0x910 [kvm]
[  132.172159]  [<ffffffffa092c734>] kvm_mmu_page_fault+0x64/0x110 [kvm]
[  132.179372]  [<ffffffffa06743c2>] handle_exception+0x1b2/0x430 [kvm_intel]
[  132.187072]  [<ffffffffa067a301>] vmx_handle_exit+0x1e1/0xc50 [kvm_intel]
...

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: 3d0c27ad6ee465f174b09ee99fcaf189c57d567a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 17:27:28 +01:00
Paolo Bonzini
0af574be32 KVM: PPC: do not compile in vfio.o unconditionally
Build on 32-bit PPC fails with the following error:

 int kvm_vfio_ops_init(void)
      ^
 In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0:
 arch/powerpc/kvm/../../../virt/kvm/vfio.h:8:90: note: previous definition of ‘kvm_vfio_ops_init’ was here
 arch/powerpc/kvm/../../../virt/kvm/vfio.c:292:6: error: redefinition of ‘kvm_vfio_ops_exit’
 void kvm_vfio_ops_exit(void)
             ^
 In file included from arch/powerpc/kvm/../../../virt/kvm/vfio.c:21:0:
 arch/powerpc/kvm/../../../virt/kvm/vfio.h:12:91: note: previous definition of ‘kvm_vfio_ops_exit’ was here
 scripts/Makefile.build:258: recipe for target arch/powerpc/kvm/../../../virt/kvm/vfio.o failed
 make[3]: *** [arch/powerpc/kvm/../../../virt/kvm/vfio.o] Error 1

Check whether CONFIG_KVM_VFIO is set before including vfio.o
in the build.

Reported-by: Pranith Kumar <bobby.prani@gmail.com>
Tested-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:38 +01:00
Rik van Riel
9db284f303 kvm, rt: change async pagefault code locking for PREEMPT_RT
The async pagefault wake code can run from the idle task in exception
context, so everything here needs to be made non-preemptible.

Conversion to a simple wait queue and raw spinlock does the trick.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:38 +01:00
Lan Tianyu
489153c746 KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()
The barrier also orders the write to mode from any reads
to the page tables done and so update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:37 +01:00
Lan Tianyu
0f127d12e4 KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()
The barrier also orders the write to mode from any reads
to the page tables done and so update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:35 +01:00
Lan Tianyu
7bfdf21778 KVM/x86: Call smp_wmb() before increasing tlbs_dirty
Update spte before increasing tlbs_dirty to make sure no tlb flush
in lost after spte is zapped. This pairs with the barrier in the
kvm_flush_remote_tlbs().

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:32 +01:00
Lan Tianyu
36ca7e0a57 KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end()
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:29 +01:00
Lan Tianyu
9753f52915 KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
There is already a barrier inside of kvm_flush_remote_tlbs() which can
help to make sure everyone sees our modifications to the page tables and
see changes to vcpu->mode here. So remove the smp_mb in the
kvm_mmu_commit_zap_page() and update the comment.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:27 +01:00
Huaitong Han
b9baba8614 KVM, pkeys: expose CPUID/CR4 to guest
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys,
enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of
CR4.PKE(bit 22).

This patch disables CPUID:PKU without ept, because pkeys is not yet
implemented for shadow paging.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:17 +01:00
Huaitong Han
be94f6b710 KVM, pkeys: add pkeys support for permission_fault
Protection keys define a new 4-bit protection key field (PKEY) in bits
62:59 of leaf entries of the page tables, the PKEY is an index to PKRU
register(16 domains), every domain has 2 bits(write disable bit, access
disable bit).

Static logic has been produced in update_pkru_bitmask, dynamic logic need
read pkey from page table entries, get pkru value, and deduce the correct
result.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:23:37 +01:00
Huaitong Han
2d344105f5 KVM, pkeys: introduce pkru_mask to cache conditions
PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some
conditions is true, the fault is considered as a PKU violation.
pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and
does cache some conditions for permission_fault.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:06 +01:00
Xiao Guangrong
1be0e61c1f KVM, pkeys: save/restore PKRU when guest/host switches
Currently XSAVE state of host is not restored after VM-exit and PKRU
is managed by XSAVE so the PKRU from guest is still controlling the
memory access even if the CPU is running the code of host. This is
not safe as KVM needs to access the memory of userspace (e,g QEMU) to
do some emulation.

So we save/restore PKRU when guest/host switches.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:06 +01:00
Xiao Guangrong
9e90199c25 x86: pkey: introduce write_pkru() for KVM
KVM will use it to switch pkru between guest and host.

CC: Ingo Molnar <mingo@redhat.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:05 +01:00
Huaitong Han
17a511f878 KVM, pkeys: add pkeys support for xsave state
This patch adds pkeys support for xsave state.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:05 +01:00
Huaitong Han
ddba262891 KVM, pkeys: disable pkeys for guests in non-paging mode
Pkeys is disabled if CPU is in non-paging mode in hardware. However KVM
always uses paging mode to emulate guest non-paging, mode with TDP. To
emulate this behavior, pkeys needs to be manually disabled when guest
switches to non-paging mode.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:04 +01:00
Huaitong Han
e0b18ef718 KVM: x86: remove magic number with enum cpuid_leafs
This patch removes magic number with enum cpuid_leafs.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:04 +01:00
Paolo Bonzini
f13577e8aa KVM: MMU: return page fault error code from permission_fault
This will help in the implementation of PKRU, where the PK bit of the page
fault error code cannot be computed in advance (unlike I/D, R/W and U/S).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:20:54 +01:00
Alexey Kardashevskiy
31217db720 KVM: PPC: Create a virtual-mode only TCE table handlers
Upcoming in-kernel VFIO acceleration needs different handling in real
and virtual modes which makes it hard to support both modes in
the same handler.

This creates a copy of kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce
in addition to the existing kvmppc_rm_h_put_tce_indirect.

This also fixes linker breakage when only PR KVM was selected (leaving
HV KVM off): the kvmppc_h_put_tce/kvmppc_h_stuff_tce functions
would not compile at all and the linked would fail.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:51 +01:00
Paolo Bonzini
ef697a712a KVM: VMX: fix nested vpid for old KVM guests
Old KVM guests invoke single-context invvpid without actually checking
whether it is supported.  This was fixed by commit 518c8ae ("KVM: VMX:
Make sure single type invvpid is supported before issuing invvpid
instruction", 2010-08-01) and the patch after, but pre-2.6.36
kernels lack it including RHEL 6.

Reported-by: jmontleo@redhat.com
Tested-by: jmontleo@redhat.com
Cc: stable@vger.kernel.org
Fixes: 99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:46 +01:00
Paolo Bonzini
f6870ee9e5 KVM: VMX: avoid guest hang on invalid invvpid instruction
A guest executing an invalid invvpid instruction would hang
because the instruction pointer was not updated.

Reported-by: jmontleo@redhat.com
Tested-by: jmontleo@redhat.com
Cc: stable@vger.kernel.org
Fixes: 99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:42 +01:00
Paolo Bonzini
2849eb4f99 KVM: VMX: avoid guest hang on invalid invept instruction
A guest executing an invalid invept instruction would hang
because the instruction pointer was not updated.

Cc: stable@vger.kernel.org
Fixes: bfd0a56b90005f8c8a004baf407ad90045c2b11e
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 12:02:38 +01:00
Ludovic Desroches
b02acd4e62 ARM: dts: at91: sama5d4 Xplained: don't disable hsmci regulator
If enabling the hsmci regulator on card detection, the board can reboot
on sd card insertion. Keeping the regulator always enabled fixes this
issue.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: 8d545f32bd77 ("ARM: at91/dt: sama5d4 xplained: add regulators for v(q)mmc1 supplies")
Cc: stable@vger.kernel.org #4.3 and later
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2016-03-22 11:54:32 +01:00
Ludovic Desroches
ae3fc8ea08 ARM: dts: at91: sama5d3 Xplained: don't disable hsmci regulator
If enabling the hsmci regulator on card detection, the board can reboot
on sd card insertion. Keeping the regulator always enabled fixes this
issue.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: 1b53e3416dd0 ("ARM: at91/dt: sama5d3 xplained: add fixed regulator for vmmc0")
Cc: stable@vger.kernel.org #4.3 and later
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2016-03-22 11:54:26 +01:00
Linus Torvalds
2c856e14da arm[64] perf updates for 4.6:
- Initial support for ARMv8.1 CPU PMUs
 
 - Support for the CPU PMU in Cavium ThunderX
 
 - CPU PMU support for systems running 32-bit Linux in secure mode
 
 - Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJW794rAAoJELescNyEwWM0O5IH/0ejoUjip3n4dFZnSzAbQQZe
 VxCy3DXW5gS8YaswwX2dFw9K772/BpHlazq8AIJIhaR+b+Zzl5t0iOc12HluDilV
 pMvi0JTCxwJhsEiKZnP0cVAU9HM6MAgtMOEegkd/YNESKQey30NeDtIcz/pQfTUV
 28AF71+w5VPj/1EpHEEhHQsASRIx7eDbKzThzdlb8PnDS0o23QJhL9HjVTNIAlB8
 BGxrUBKtBu0eH2Hx33vNjc7UYx1WZQlCk5cAaXevA8mbFXzYaMQI2Cel2nbNMO9i
 eu5zPkDUCG7dq16PxK6IgM4AsDCtmmDuckLdN6UEQWYxkLbb2qHNRKtj0bKB8Sk=
 =E4PE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm[64] perf updates from Will Deacon:
 "I have another mixed bag of ARM-related perf patches here.

  It's about 25% CPU and 75% interconnect, but with drivers/bus/
  languishing without an obvious maintainer or tree, Olof and I agreed
  to keep all of these PMU patches together.  I suspect a whole load of
  code from drivers/bus/arm-* can be moved under drivers/perf/, so
  that's on the radar for the future.

  Summary:

   - Initial support for ARMv8.1 CPU PMUs

   - Support for the CPU PMU in Cavium ThunderX

   - CPU PMU support for systems running 32-bit Linux in secure mode

   - Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)"

* tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (26 commits)
  drivers/perf: arm_pmu: avoid NULL dereference when not using devicetree
  arm64: perf: Extend ARMV8_EVTYPE_MASK to include PMCR.LC
  arm-cci: remove unused variable
  arm-cci: don't return value from void function
  arm-cci: make private functions static
  arm-cci: CoreLink CCI-550 PMU driver
  arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU
  arm-cci: CCI-500: Work around PMU counter writes
  arm-cci: Provide hook for writing to PMU counters
  arm-cci: Add helper to enable PMU without synchornising counters
  arm-cci: Add routines to save/restore all counters
  arm-cci: Get the status of a counter
  arm-cci: write_counter: Remove redundant check
  arm-cci: Delay PMU counter writes to pmu::pmu_enable
  arm-cci: Refactor CCI PMU enable/disable methods
  arm-cci: Group writes to counter
  arm-cci: fix handling cpumask_any_but return value
  arm-cci: simplify sysfs attr handling
  drivers/perf: arm_pmu: implement CPU_PM notifier
  arm64: dts: Add Cavium ThunderX specific PMU
  ...
2016-03-21 13:14:16 -07:00
Linus Torvalds
d34687ab97 ARC updates for 4.6-rc1
- Big Endian io accessors fix [Lada]
 - Spellos fixes [Adam]
 - Fix for DW GMAC breakage [Alexey]
 - Making DMA API 64-bit ready
 - Shutting up -Wmaybe-uninitialized noise for ARC
 - Other minor fixes here and there, comments update
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW738lAAoJEGnX8d3iisJeProP/icm32aIHY0QXmJCBXCmQfLa
 HHzfBeJ2KsG8pIRgrvraK3FJkmFr+WxZ7x6b5hPNYeHIT3c179/GZ3DlssM1md0u
 sa50o5jmwd/J4o5jCKpUB/hx7wiAjpC2CYb6qIg39A2Nq5JhOFJV30XMbCscXkLI
 ae/o8oATi1502cf1OQ2EqNWKfME4ogG1KsEUNrSzcd+1P8LZxsnEVBmXuPHVdHLw
 kTHVgmCELsEchaV/QY9pY+uHkm9Y4vV18v0vqbklwED+cHkjmXQ2UysP3/J8KXKN
 PVSqmtUJIS2vxDGK5mWvz6jkWmU8gRXoT14ZqdmMARmhVhp3+JTm2fQ53NUwZ+b2
 JpPNGWVQRi86AaiUE8Fm+eWjC242CAm+lsBfx+mvqWpEvFGMlnRKw8oZiyeJhhIw
 3M1yrulQG7QbTSuQrgQwfGqtrhl2nnq+X0uoMJXYHupNDQ42QK8wmJ9bT7cmutD0
 K3Tmi84qoiSnN/HhWK/D9d60bLGvUY4RKiLjAcJz7lbMjtRhT/rpFFcFYCIhJyZs
 y//jOZK67o1ecDXBTaUcvT+edOrQVsmatn3w0p9VwATe8OiKHsLA/0UD34gwiECy
 o9g/i4tc2GfOLFoLv66czXTU9IuoKDh3HrTJgET7r1Re/+FKgJ+2+GX6AbiJzbhY
 9jsAAI/ZpsS6qMhvSz3d
 =n0fk
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC architecture updates from Vineet Gupta:
 - Big Endian io accessors fix [Lada]
 - Spellos fixes [Adam]
 - Fix for DW GMAC breakage [Alexey]
 - Making DMA API 64-bit ready
 - Shutting up -Wmaybe-uninitialized noise for ARC
 - Other minor fixes here and there, comments update

* tag 'arc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (21 commits)
  ARCv2: ioremap: Support dynamic peripheral address space
  ARC: dma: reintroduce platform specific dma<->phys
  ARC: dma: ioremap: use phys_addr_t consistenctly in code paths
  ARC: dma: pass_phys() not sg_virt() to cache ops
  ARC: dma: non-coherent pages need V-P mapping if in HIGHMEM
  ARC: dma: Use struct page based page allocator helpers
  ARC: build: Turn off -Wmaybe-uninitialized for ARC gcc 4.8
  ARC: [plat-axs10x] add Ethernet PHY description in .dts
  arc: use of_platform_default_populate() to populate default bus
  ARC: thp: unbork !CONFIG_TRANSPARENT_HUGEPAGE build
  arc: [plat-nsimosci*] use ezchip network driver
  ARCv2: LLSC: software backoff is NOT needed starting HS2.1c
  ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern
  ARC: [plat-nsim] document ranges
  ARC: build: Better way to detect ISA compatible toolchain
  ARCv2: Allow enabling PAE40 w/o HIGHMEM
  ARC: [BE] readl()/writel() to work in Big Endian CPU configuration
  ARC: [*defconfig] No need to specify CONFIG_CROSS_COMPILE
  ARC: [BE] Select correct CROSS_COMPILE prefix
  ARC: bitops: Remove non relevant comments
  ...
2016-03-21 13:00:46 -07:00
Paul Gortmaker
59aa56bf2a xen: audit usages of module.h ; remove unnecessary instances
Code that uses no modular facilities whatsoever should not be
sourcing module.h at all, since that header drags in a bunch
of other headers with it.

Similarly, code that is not explicitly using modular facilities
like module_init() but only is declaring module_param setup
variables should be using moduleparam.h and not the larger
module.h file for that.

In making this change, we also uncover an implicit use of BUG()
in inline fcns within arch/arm/include/asm/xen/hypercall.h so
we explicitly source <linux/bug.h> for that file now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-03-21 15:13:32 +00:00
Joerg Roedel
70cf769c5b Merge branches 'arm/rockchip', 'arm/exynos', 'arm/smmu', 'arm/mediatek', 'arm/io-pgtable', 'arm/renesas' and 'core' into next 2016-03-21 14:58:47 +01:00
Catalin Marinas
a6cdf1c08c kvm: arm64: Disable compiler instrumentation for hypervisor code
With the recent rewrite of the arm64 KVM hypervisor code in C, enabling
certain options like KASAN would allow the compiler to generate memory
accesses or function calls to addresses not mapped at EL2. This patch
disables the compiler instrumentation on the arm64 hypervisor code for
gcov-based profiling (GCOV_KERNEL), undefined behaviour sanity checker
(UBSAN) and kernel address sanitizer (KASAN).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21 14:02:17 +01:00
Catalin Marinas
f09f1bacfe arm64: Split pr_notice("Virtual kernel memory layout...") into multiple pr_cont()
The printk() implementation has a limit of LOG_LINE_MAX (== 1024 - 32)
buffer per call which the arm64 mem_init() breaches when printing the
virtual memory layout with CONFIG_KASAN enabled. The result is that the
last line is no longer printed. This patch splits the call into a
pr_notice() + additional pr_cont() calls.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2016-03-21 12:19:12 +00:00
Kefeng Wang
cc7c0cda8f arm64: drop unused __local_flush_icache_all()
After commit 65da0a8e34a8 ("arm64: use non-global mappings for UEFI
runtime regions"), nobody use __local_flush_icache_all() anymore,
so drop it.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21 12:10:03 +00:00
Mark Rutland
b90b4a608e arm64: fix KASLR boot-time I-cache maintenance
Commit f80fb3a3d50843a4 ("arm64: add support for kernel ASLR") missed a
DSB necessary to complete I-cache maintenance in the primary boot path,
and hence stale instructions may still be present in the I-cache and may
be executed until the I-cache maintenance naturally completes.

Since commit 8ec41987436d566f ("arm64: mm: ensure patched kernel text is
fetched from PoU"), all CPUs invalidate their I-caches after their MMU
is enabled. Prior a CPU's MMU having been enabled, arbitrary lines may
have been fetched from the PoC into I-caches. We never patch text
expected to be executed with the MMU off. Thus, it is unnecessary to
perform broadcast I-cache maintenance in the primary boot path.

This patch reduces the scope of the I-cache maintenance to the local
CPU, and adds the missing DSB with similar scope, matching prior
maintenance in the primary boot path.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesehvuel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21 12:08:50 +00:00
Ard Biesheuvel
b660950c60 arm64/kernel: fix incorrect EL0 check in inv_entry macro
The implementation of macro inv_entry refers to its 'el' argument without
the required leading backslash, which results in an undefined symbol
'el' to be passed into the kernel_entry macro rather than the index of
the exception level as intended.

This undefined symbol strangely enough does not result in build failures,
although it is visible in vmlinux:

     $ nm -n vmlinux |head
                      U el
     0000000000000000 A _kernel_flags_le_hi32
     0000000000000000 A _kernel_offset_le_hi32
     0000000000000000 A _kernel_size_le_hi32
     000000000000000a A _kernel_flags_le_lo32
     .....

However, it does result in incorrect code being generated for invalid
exceptions taken from EL0, since the argument check in kernel_entry
assumes EL1 if its argument does not equal '0'.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-21 12:05:34 +00:00
Srinivas Pandruvada
7b0fd56930 perf/x86/intel/rapl: Add missing Broadwell models
Added Broadwell-H and Broadwell-Server.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1458517938-25308-1-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 11:16:19 +01:00
Kan Liang
cb2252522a perf/x86/intel/uncore: Remove ev_sel_ext bit support for PCU
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite
it being documented in the SDM, if we write 1 to that bit then we can get a #GP
fault.

Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing.

Also, there are no public events which use that bit, so remove ev_sel_ext
bit support for PCU.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 11:16:19 +01:00
Marc Zyngier
2510ffe17f arm64: KVM: Turn kvm_ksym_ref into a NOP on VHE
When running with VHE, there is no need to translate kernel pointers
to the EL2 memory space, since we're already there (and we have a much
saner memory map to start with).

Unfortunately, kvm_ksym_ref is getting in the way, and the first
call into the "hypervisor" section is going to end up in fireworks,
since we're now branching into nowhereland. Meh.

A potential solution is to test if VHE is engaged or not, and only
perform the translation in the negative case. With this in place,
VHE is able to run again.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21 10:47:18 +01:00
Eric Auger
898f949fb7 KVM: arm/arm64: disable preemption when calling smp_call_function_many
Preemption must be disabled when calling smp_call_function_many

Reported-by: bartosz.wawrzyniak@tieto.com
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21 10:45:22 +01:00
Huang Rui
c7ab62bfbe perf/x86/amd/power: Add AMD accumulated power reporting mechanism
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].

This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.

Here the gory details of how the computation is done:

* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
  PTSC period

* Jmax: max compute unit accumulated power which is indicated by
  MSR_C001007b[MaxCpuSwPwrAcc]

* Jx/Jy: compute unit accumulated power which is indicated by
  MSR_C001007a[CpuSwPwrAcc]

* Tx/Ty: the value of performance timestamp counter which is indicated
  by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power

i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
	N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].

ii. Read the full range of the cumulative energy value from the new
    MSR MaxCpuSwPwrAcc.
	Jmax = value returned.

iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
	Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.

iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
	Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.

v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:

	if (Jy < Jx) // Rollover has occurred
		Jdelta = (Jy + Jmax) - Jx
	else
		Jdelta = Jy - Jx
	PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)

Simple example:

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4
    CHK     include/config/kernel.release
    CHK     include/generated/uapi/linux/version.h
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    scripts/checksyscalls.sh
    CHK     include/generated/compile.h
    SKIPPED include/generated/compile.h
    Building modules, stage 2.
  Kernel: arch/x86/boot/bzImage is ready  (#40)
    MODPOST 4225 modules

   Performance counter stats for 'system wide':

              183.44 mWatts power/power-pkg/

       341.837270111 seconds time elapsed

  root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10

   Performance counter stats for 'system wide':

                0.18 mWatts power/power-pkg/

        10.012551815 seconds time elapsed

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jacob.w.shin@gmail.com
Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com
[ Fixed the modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:37:15 +01:00
Huang Rui
01fe03ff1c x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flag
AMD CPU family 15h model 0x60 introduces a mechanism for measuring
accumulated power. It is used to report the processor power consumption
and support for it is indicated by CPUID Fn8000_0007_EDX[12].

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wan Zongshun <Vincent.Wan@amd.com>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com
[ Resolved conflict and moved the synthetic CPUID slot to 19. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:35:29 +01:00
Suravee Suthikulpanit
f8519155b4 perf/x86/amd: Add support for new IOMMU performance events
This patch adds new IOMMU performance event based on
the information in table 74 of the AMD I/O Virtualization Technology
(IOMMU) Specification (Document Id: 4882, Rev 2.62, Feb 2015)

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:35:28 +01:00
Huang Rui
8dfeae0d73 perf/x86/amd: Move nodes_per_socket into bsp_init_amd()
nodes_per_socket is static and it needn't be initialized many
times during every CPU core init. So move its initialization into
bsp_init_amd().

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-2-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:22 +01:00
Peter Zijlstra
27348f382b perf/x86/cqm: Factor out some common code
Having the same code twice (and once quite ugly) is fragile.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:22 +01:00
Vikas Shivappa
e7ee3e8cb5 perf/x86/mbm: Add support for MBM counter overflow handling
This patch adds a per package timer which periodically updates the
memory bandwidth counters for the events that are currently active.

Current patch has a periodic timer every 1s since the SDM guarantees
that the counter will not overflow in 1s but this time can be definitely
improved by calibrating on the system. The overflow is really a function
of the max memory b/w that the socket can support, max counter value and
scaling factor.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:21 +01:00
Vikas Shivappa
2d4de8376f perf/x86/mbm: Implement RMID recycling
RMID could be allocated or deallocated as part of RMID recycling.

When an RMID is allocated for MBM event, the MBM counter needs to be
initialized because next time we read the counter we need the previous
value to account for total bytes that went to the memory controller.

Similarly, when RMID is deallocated we need to update the ->count
variable.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:20 +01:00
Tony Luck
87f01cc2a2 perf/x86/mbm: Add memory bandwidth monitoring event management
Includes all the core infrastructure to measure the total_bytes and
bandwidth.

We have per socket counters for both total system wide L3 external
bytes and local socket memory-controller bytes. The OS does MSR writes
to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and
uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The
tasks have a common RMID for CQM (cache quality of service monitoring)
and MBM. Hence most of the scheduling code is reused from CQM.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:20 +01:00