IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- Expose all invalidation types to the L1
- Reject invvpid instruction, if L1 passed zero vpid value to single
context invalidations
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit adds missing host CR3 checks. Before entering guest mode, the value
of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is
called to set the new CR3 value and check and load PDPTRs.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Loading CR3 as part of emulating vmentry is different from regular CR3 loads,
as implemented in kvm_set_cr3, in several ways.
* different rules are followed to check CR3 and it is desirable for the caller
to distinguish between the possible failures
* PDPTRs are not loaded if PAE paging and nested EPT are both enabled
* many MMU operations are not necessary
This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of
nested vmentry and vmexit, and makes use of it on the nested vmentry path.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
It is possible that prepare_vmcs02 fails to load the guest state. This
patch adds the proper error handling for such a case. L1 will receive
an INVALID_STATE vmexit with the appropriate exit qualification if it
happens.
A failure to set guest CR3 is the only error propagated from prepare_vmcs02
at the moment.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
(see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
related issues:
1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
overwritten in ept_load_pdptrs because the registers are believed to have
been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
running with PAE enabled but PDPTRs have been set up by L1.
2) When L2 is about to enable paging and loads its CR3, we, again, attempt
to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
that this will succeed (it's just a CR3 load, paging is not enabled yet) and
if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
then lost and L2 crashes right after it enables paging.
This patch replaces the kvm_set_cr3 call with a simple register write if PAE
and EPT are both on. CR3 is not to be interpreted in this case.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current
VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry
is more faithful to VMCS12.
This patch correctly causes VM-entry to fail when "IA-32e mode guest" is
1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and
"IA-32e mode guest" would silently be disabled by KVM.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to
be 1 during VMX operation. Since the set of allowed-1 bits is the same
in and out of VMX operation, we can generate these MSRs entirely from
the guest's CPUID. This lets userspace avoiding having to save/restore
these MSRs.
This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs
by default. This is a saner than the current default of -1ull, which
includes bits that the host CPU does not support.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
all CR0 and CR4 bits are allowed to be 1 during VMX operation.
This does not match real hardware, which disallows the high 32 bits of
CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
which are defined in the SDM but missing according to CPUID). A guest
can induce a VM-entry failure by setting these bits in GUEST_CR0 and
GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
valid.
Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
checks on these registers do not verify must-be-0 bits. Fix these checks
to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
This patch should introduce no change in behavior in KVM, since these
MSRs are still -1ULL.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The VMX capability MSRs advertise the set of features the KVM virtual
CPU can support. This set of features varies across different host CPUs
and KVM versions. This patch aims to addresses both sources of
differences, allowing VMs to be migrated across CPUs and KVM versions
without guest-visible changes to these MSRs. Note that cross-KVM-
version migration is only supported from this point forward.
When the VMX capability MSRs are restored, they are audited to check
that the set of features advertised are a subset of what KVM and the
CPU support.
Since the VMX capability MSRs are read-only, they do not need to be on
the default MSR save/restore lists. The userspace hypervisor can set
the values of these MSRs or read them from KVM at VCPU creation time,
and restore the same value after every save/restore.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The "non-true" VMX capability MSRs can be generated from their "true"
counterparts, by OR-ing the default1 bits. The default1 bits are fixed
and defined in the SDM.
Since we can generate the non-true VMX MSRs from the true versions,
there's no need to store both in struct nested_vmx. This also lets
userspace avoid having to restore the non-true MSRs.
Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so,
we simply need to set all the default1 bits in the true MSRs (such that
the true MSRs and the generated non-true MSRs are equal).
Signed-off-by: David Matlack <dmatlack@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
kvm_skip_emulated_instruction calls both
kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
skipping the emulated instruction and generating a trap if necessary.
Replacing skip_emulated_instruction calls with
kvm_skip_emulated_instruction is straightforward, except for:
- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
take precedence. The singlestep will be triggered again on the next
instruction, which is the current behavior.
- Task switch instructions which would require additional handling (e.g.
the task switch bit) and are instead left alone.
- Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
which do not trigger singlestep traps as mentioned previously.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
We can't return both the pass/fail boolean for the vmcs and the upcoming
continue/exit-to-userspace boolean for skip_emulated_instruction out of
nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead.
Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when
they advance the IP to the following instruction, not when they a) succeed,
b) fail MSR validation or c) throw an exception. Add a separate call to
skip_emulated_instruction that will later not be converted to the variant
that checks the singlestep flag.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The functions being moved ahead of skip_emulated_instruction here don't
need updated IPs, and skipping the emulated instruction at the end will
make it easier to return its value.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Once skipping the emulated instruction can potentially trigger an exit to
userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to
propagate a return value.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The function is never called under PV guests, and only shows up
when MSI (or MSI-X) cannot be allocated. Convert the message
to include the error value.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a core dumping crash fix, a guess-unwinder regression fix,
plus three build warning fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind: Fix guess-unwinder regression
x86/build: Annotate die() with noreturn to fix build warning on clang
x86/platform/olpc: Fix resume handler build warning
x86/apic/uv: Silence a shift wrapping warning
x86/coredump: Always use user_regs_struct for compat_elf_gregset_t
I recently encountered wreckage because access_ok() was used where it
should not be, add an explicit WARN when access_ok() is used wrongly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM.
Both these parts have full_width_write set, and that does indeed have
a problem. In order to deal with counter wrap, we must sample the
counter at at least half the counter period (see also the sampling
theorem) such that we can unambiguously reconstruct the count.
However commit:
069e0c3c4058 ("perf/x86/intel: Support full width counting")
sets the sampling interval to the full period, not half.
Fixing that exposes another issue, in that we must not sign extend the
delta value when we shift it right; the counter cannot have
decremented after all.
With both these issues fixed, counter overflow functions correctly
again.
Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Tested-by: Liang, Kan <kan.liang@intel.com>
Tested-by: Odzioba, Lukasz <lukasz.odzioba@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Fixes: 069e0c3c4058 ("perf/x86/intel: Support full width counting")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Knights Mill is enough close to Knights Landing so the path reuses
C-state residency support of the latter.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20161201000853.18260-1-piotr.luc@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYRIGyAAoJEHm+PkMAQRiG2ksH/jwMUT9j6glbwESxbn1YTqTM
QcBT5AMc7D0wNuidQe0hWZMtG4RbC+4ZhxzZl2wPgA2gueJ+rBnyX7bgtA7ka8ka
Fdc3u/Q1v38HPzf8iBnxcdCs40VgsoMLjFYCXrpOxuGDNKYzRd+Q8aI2TeGvzbyi
X8+6oAWifBwo2oA06jfcuUncEWbyDDyK9aQksmfKOpjHdb26yELPEhsPOlds1g7E
jYLnvUVnU2CoFaumta+rZQ0kzLdc4Ntu0wEao6WzJuQKsgoID+tS/6iudi8cUhDp
YowGAVoOfr6rAJB0mwrDVfugpamaT3386XKyocdNsK0/jR60UIJ8x+WzvvSU+lY=
=JTBj
-----END PGP SIGNATURE-----
Backmerge tag 'v4.9-rc8' into drm-next
Linux 4.9-rc8
Daniel requested this so we could apply some follow on fixes cleanly to -next.
intel_rdt_sched_in() must be called with preemption disabled because the
function accesses percpu variables (pqr_state and closid).
If a task moves itself via move_myself() preemption is enabled, which
violates the calling convention and can result in incorrect closid
selection when the task gets preempted or migrated.
Add the required protection and a comment about the calling convention.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Marcelo Tosatti" <mtosatti@redhat.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch provides APEI arch-specific bits for ARM64
Meanwhile,
(1) Move HEST type (ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) checking to
a generic place.
(2) Select HAVE_ACPI_APEI when EFI and ACPI is set on ARM64, because
arch_apei_get_mem_attribute is using efi_mem_attributes() on
ARM64.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Fu Wei <fu.wei@linaro.org>
[ Fu Wei: improve && upstream ]
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
0-day testing encountered a NULL pointer dereference in a cpumask access
from tsc_store_and_check_tsc_adjust().
This happens when the function is called on the boot CPU and the topology
masks are not yet available due to CPUMASK_OFFSTACK=y.
Add a NULL pointer check for the mask pointer. If NULL it's safe to assume
that the CPU is the boot CPU and the first one in the package.
Fixes: 8b223bc7abe0 ("x86/tsc: Store and check TSC ADJUST MSR")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to simplify the kexec_add_buffer argument list.
Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer.
In addition, change the type of kexec_buf.buffer from char * to void *.
There is no particular reason for it to be a char *, and the change
allows us to get rid of 3 existing casts to char * in the code.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the missing return statement to the inline stub
tsc_store_and_check_tsc_adjust() and add the other stubs to make a
SMP=y,TSC=n build happy.
While at it, remove the unused variable from the UP variant of
tsc_store_and_check_tsc_adjust().
Fixes: commit ba75fb646931 ("x86/tsc: Sync test only for the first cpu in a package")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency,
which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO
hardware-enabling feature.
Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the
user enables CONFIG_SCHED_MC_PRIO=y.
(Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.)
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible
in future to other architectures that wish to similarly establish
CPU core priorities support in the scheduler.
The description in Kconfig is updated to reflect this change with
added details for better clarity. The configuration is explicitly
default-y, to enable the feature on CPUs that have this feature.
It has no effect on non-TBM3 CPUs.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Final 4.10 updates:
- fine-tune fb flushing and tracking (Chris Wilson)
- refactor state check dumper code for more conciseness (Tvrtko)
- roll out dev_priv all over the place (Tvrkto)
- finally remove __i915__ magic macro (Tvrtko)
- more gvt bugfixes (Zhenyu&team)
- better opregion CADL handling (Jani)
- refactor/clean up wm programming (Maarten)
- gpu scheduler + priority boosting for flips as first user (Chris
Wilson)
- make fbc use more atomic (Paulo)
- initial kvm-gvt framework, but not yet complete (Zhenyu&team)
* tag 'drm-intel-next-2016-11-21' of git://anongit.freedesktop.org/git/drm-intel: (127 commits)
drm/i915: Update DRIVER_DATE to 20161121
drm/i915: Skip final clflush if LLC is coherent
drm/i915: Always flush the dirty CPU cache when pinning the scanout
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
drm/i915: Check that each request phase is completed before retiring
drm/i915: i915_pages_create_for_stolen should return err ptr
drm/i915: Enable support for nonblocking modeset
drm/i915: Be more careful to drop the GT wakeref
drm/i915: Move frontbuffer CS write tracking from ggtt vma to object
drm/i915: Only dump dp_m2_n2 configuration when drrs is used
drm/i915: don't leak global_timeline
drm/i915: add i915_address_space_fini
drm/i915: Add a few more sanity checks for stolen handling
drm/i915: Waterproof verification of gen9 forcewake table ranges
drm/i915: Introduce enableddisabled helper
drm/i915: Only dump possible panel fitter config for the platform
drm/i915: Only dump scaler config where supported
drm/i915: Compact a few pipe config debug lines
drm/i915: Don't log pipe config kernel pointer and duplicated pipe name
drm/i915: Dump FDI config only where applicable
...
If the first CPU of a package comes online, it is necessary to test whether
the TSC is in sync with a CPU on some other package. When a deviation is
observed (time going backwards between the two CPUs) the TSC is marked
unstable, which is a problem on large machines as they have to fall back to
the HPET clocksource, which is insanely slow.
It has been attempted to compensate the TSC by adding the offset to the TSC
and writing it back some time ago, but this never was merged because it did
not turn out to be stable, especially not on older systems.
Modern systems have become more stable in that regard and the TSC_ADJUST
MSR allows us to compensate for the time deviation in a sane way. If it's
available allow up to three synchronization runs and if a time warp is
detected the starting CPU can compensate the time warp via the TSC_ADJUST
MSR and retry. If the third run still shows a deviation or when random time
warps are detected the test terminally fails.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To allow TSC compensation cross nodes its necessary to know in which
direction the TSC warp was observed. Return the maximum observed value on
the calling CPU so the caller can determine the direction later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.970859287@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cleaning up the stop marker on the control CPU is wrong when we want to add
retry support. Move the cleanup to the starting CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.892095627@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the TSC_ADJUST MSR is available all CPUs in a package are forced to the
same value. So TSCs cannot be out of sync when the first CPU in the package
was in sync.
That allows to skip the sync test for all CPUs except the first starting
CPU in a package.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When entering idle, it's a good oportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.
This is especially important for machines, which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).
This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine, which has this issue, the check triggeres right
during boot, so there is a decent chance that the sysadmin will notice.
Rate limit the check to once per second and warn only once per cpu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in a two aspects:
1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the
cycles spent by storing the TSC value at SMM entry and restoring it at
SMM exit. On affected machines the TSCs run slowly out of sync up to the
point where the clocksource watchdog (if available) detects it.
The TSC_ADJUST MSR allows to detect the TSC modification before that and
eventually restore it. This is also important for SoCs which have no
watchdog clocksource and therefore TSC wreckage cannot be detected and
acted upon.
2) All threads in a package are required to have the same TSC_ADJUST
value. Broken BIOSes break that and as a result the TSC synchronization
check fails.
The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
online. If detected set it to the value of an already online CPU in the
same package. This also allows to reduce the number of sync tests
because with that in place the test is only required for the first CPU
in a package.
In principle all CPUs in a system should have the same TSC_ADJUST value
even across packages, but with physical CPU hotplug this assumption is
not true because the TSC starts with power on, so physical hotplug has
to do some trickery to bring the TSC into sync with already running
packages, which requires to use an TSC_ADJUST value different from CPUs
which got powered earlier.
A final enhancement is the opportunity to compensate for unsynced TSCs
accross nodes at boot time and make the TSC usable that way. It won't
help for TSCs which run apart due to frequency skew between packages,
but this gets detected by the clocksource watchdog later.
The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If time warps can be observed then they should only ever be observed on one
CPU. If they are observed on both CPUs then the system is completely hosed.
Add a check for this condition and notify if it happens.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.574838461@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The art detection uses rdmsrl_safe() to detect the availablity of the
TSC_ADJUST MSR.
That's pointless because we have a feature bit for this. Use it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Power management suspend/resume tracing (ab)uses the RTC to store
suspend/resume information persistently. As a consequence the RTC value is
clobbered when timekeeping is resumed and tries to inject the sleep time.
Commit a4f8f6667f09 ("timekeeping: Cap array access in timekeeping_debug")
plugged a out of bounds array access in the timekeeping debug code which
was caused by the clobbered RTC value, but we still use the clobbered RTC
value for sleep time injection into kernel timekeeping, which will result
in random adjustments depending on the stored "hash" value.
To prevent this keep track of the RTC clobbering and ignore the invalid RTC
timestamp at resume. If the system resumed successfully clear the flag,
which marks the RTC as unusable, warn the user about the RTC clobber and
recommend to adjust the RTC with 'ntpdate' or 'rdate'.
[jstultz: Fixed up pr_warn formating, and implemented suggestions from Ingo]
[ tglx: Rewrote changelog ]
Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1480372524-15181-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts aesni (including fpu) over to the skcipher
interface. The LRW implementation has been removed as the generic
LRW code can now be used directly on top of the accelerated ECB
implementation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds xts helpers that use the skcipher interface rather
than blkcipher. This will be used by aesni_intel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When removing a sub directory/rdtgroup by rmdir or umount, closid in a
task in the sub directory is set to default rdtgroup's closid which is 0.
If the task is running on a CPU, the PQR_ASSOC MSR is only updated
when the task runs through a context switch. Up to the context switch,
the task runs with the wrong closid.
Make the change immediately effective by invoking a smp function call on
all CPUs which are running moved task. If one of the affected tasks was
moved or scheduled out before the function call is executed on the CPU the
only damage is the extra interruption of the CPU.
[ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There was a cut & paste error when adding code to update the per-cpu
closid when changing the bitmask of CPUs to an rdt group.
The update erronously assigns the closid of the default group to the CPUs
which are moved to a group instead of assigning the closid of their new
group. Use the proper closid.
Fixes: f410770293a1 ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
asm/mutex.h is gone from the locking tree, which makes sched/core break the build.
Use linux/mutex.h instead, which is the canonical method.
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: bp@suse.de
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In x86's include/asm/Kbuild three entries are appended to the genhdr-y make
variable:
genhdr-y += unistd_32.h
genhdr-y += unistd_64.h
genhdr-y += unistd_x32.h
The same entries are also appended to that variable in
include/uapi/asm/Kbuild. So commit:
10b63956fce7 ("UAPI: Plumb the UAPI Kbuilds into the user header installation and checking")
... removed these three entries from include/asm/Kbuild. But, apparently, some
merge conflict resolution re-added them.
The net effect is, in short, that the genhdr-y make variable contains these
file names twice and, as a consequence, that the corresponding headers get
installed twice. And so the build prints:
INSTALL usr/include/asm/ (65 files)
... while in reality only 62 files are installed in that directory.
Nothing breaks because of all that, but it's a good idea to finally remove
these unneeded entries nevertheless.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1480077707-2837-1-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The make variable KBUILD_CFLAGS contains $(LINUXINCLUDE). But the build
already picks up $(LINUXINCLUDE) from scripts/Makefile.lib. The net effect
is that the (long) list of include directories is used twice.
This is harmless but pointless. So stop using $(LINUXINCLUDE) twice.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1480077514-2586-1-git-send-email-pebolle@tiscali.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
My attempt at fixing some KASAN false positive warnings was rather brain
dead, and it broke the guess unwinder. With frame pointers disabled,
/proc/<pid>/stack is broken:
# cat /proc/1/stack
[<ffffffffffffffff>] 0xffffffffffffffff
Restore the code flow to more closely resemble its previous state, while
still using READ_ONCE_NOCHECK() macros to silence KASAN false positives.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c2d75e03d630 ("x86/unwind: Prevent KASAN false positive warnings in guess unwinder")
Link: http://lkml.kernel.org/r/b824f92c2c22eca5ec95ac56bd2a7c84cf0b9df9.1480309971.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes below warning with clang:
In file included from ../arch/x86/tools/relocs_64.c:17:
../arch/x86/tools/relocs.c:977:6: warning: variable 'do_reloc' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161126222229.673-1-pefoley2@pefoley.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix:
arch/x86/platform/olpc/olpc-xo15-sci.c:199:12: warning: ‘xo15_sci_resume’
defined but not used [-Wunused-function]
static int xo15_sci_resume(struct device *dev)
^
which I see in randconfig builds here.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161126142706.13602-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>