[ Upstream commit 028c221ed1904af9ac3c5162ee98f48966de6b3d ]
AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.
The NodeId value can be used when a physical ID is needed by software.
Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.
Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.
Update the x86 topology documentation.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9d4747d02376aeb8de38afa25430de79129c5799 upstream.
When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), when this isn't really an error condition.
Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <618380488358b56af558f2682203786f09a49483.1607620209.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39485ed95d6b83b62fa75c06c2c4d33992e0d971 upstream.
Until commit e7c587da1252 ("x86/speculation: Use synthetic bits for
IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before
allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
Testing only Intel bits on VMX processors, or only AMD bits on SVM
processors, fails if the guests are created with the "opposite" vendor
as the host.
While at it, also tweak the host CPU check to use the vendor-agnostic
feature bit X86_FEATURE_IBPB, since we only care about the availability
of the MSR on the host here and not about specific CPUID bits.
Fixes: e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP")
Cc: stable@vger.kernel.org
Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8129cd958b395575e5543ce25a8434874b04d3a upstream.
The cycle count of a timed LBR is always 1 in perf record -D.
The cycle count is stored in the first 16 bits of the IA32_LBR_x_INFO
register, but get_lbr_cycles() returns a Boolean type.
Use u16 instead of the Boolean type.
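For illustration, a tiny userspace sketch of why a Boolean return collapses every non-zero cycle count to 1 (the register value below is made up):
  #include <stdbool.h>
  #include <stdio.h>

  int main(void)
  {
          /* Pretend the low 16 bits of IA32_LBR_x_INFO hold a cycle count. */
          unsigned long long info = 0x1234;

          bool as_bool = info & 0xffff;           /* any non-zero count becomes 1 */
          unsigned short as_u16 = info & 0xffff;  /* the real count survives */

          printf("bool: %d, u16: %u\n", as_bool, (unsigned int)as_u16);
          return 0;
  }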
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20201125213720.15692-2-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46b72e1bf4fc571da0c29c6fb3e5b2a2107a4c26 upstream.
According to the event list from icelake_core_v1.09.json, the encoding
of the RTM_RETIRED.ABORTED event on Ice Lake should be,
"EventCode": "0xc9",
"UMask": "0x04",
"EventName": "RTM_RETIRED.ABORTED",
Correct the wrong encoding.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20201125213720.15692-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 306e3e91edf1c6739a55312edd110d298ff498dd upstream.
The event CYCLE_ACTIVITY.STALLS_MEM_ANY (0x14a3) should be available on
all 8 GP counters on ICL, but it's only scheduled on the first four
counters due to the current ICL constraint table.
Add a line for the CYCLE_ACTIVITY.STALLS_MEM_ANY event in the ICL
constraint table.
Correct the comments for the CYCLE_ACTIVITY.CYCLES_MEM_ANY event.
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20201019164529.32154-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 57efa1fe5957694fa541c9062de0a127f0b9acb0 ]
Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
fork() for ptes") pages under a FOLL_PIN will not be write protected
during COW for fork. This means that pages returned from
pin_user_pages(FOLL_WRITE) should not become write protected while the pin
is active.
However, there is a small race where get_user_pages_fast(FOLL_PIN) can
establish a FOLL_PIN at the same time copy_present_page() is write
protecting it:
CPU 0                                  CPU 1
 get_user_pages_fast()
  internal_get_user_pages_fast()
                                       copy_page_range()
                                         pte_alloc_map_lock()
                                           copy_present_page()
                                             atomic_read(has_pinned) == 0
                                             page_maybe_dma_pinned() == false
  atomic_set(has_pinned, 1);
  gup_pgd_range()
   gup_pte_range()
    pte_t pte = gup_get_pte(ptep)
    pte_access_permitted(pte)
    try_grab_compound_head()
                                             pte = pte_wrprotect(pte)
                                             set_pte_at();
                                             pte_unmap_unlock()
  // GUP now returns with a write protected page
The first attempt to resolve this by using the write protect caused
problems (and was missing a barrier), see commit f3c64eda3e50 ("mm: avoid
early COW write protect games during fork()").
Instead, wrap copy_p4d_range() with the write side of a seqcount and check
the read side around gup_pgd_range(). If there is a collision, then
get_user_pages_fast() fails and falls back to slow GUP.
Slow GUP is safe against this race because copy_page_range() is only
called while holding the exclusive side of the mmap_lock on the src
mm_struct.
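For illustration, here is a minimal userspace analogue of that retry scheme, with a plain atomic counter standing in for the seqcount and invented helper names; it sketches the pattern, not the actual mm/ code:
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for the seqcount: even = no writer, odd = write in progress. */
  static atomic_uint write_protect_seq;

  static void fork_side_begin(void) { atomic_fetch_add(&write_protect_seq, 1); }
  static void fork_side_end(void)   { atomic_fetch_add(&write_protect_seq, 1); }

  /* Fast-GUP side: snapshot the count, do the lockless walk, then verify
   * nothing changed; otherwise report failure so the caller can fall back
   * to the slow, mmap_lock protected path. */
  static bool fast_gup_walk(void)
  {
          unsigned int seq = atomic_load(&write_protect_seq);

          if (seq & 1)            /* a writer (fork) is currently active */
                  return false;

          /* ... the lockless page table walk would run here ... */

          return atomic_load(&write_protect_seq) == seq;
  }

  int main(void)
  {
          printf("walk ok: %d\n", fast_gup_walk());
          fork_side_begin();      /* simulate a concurrent fork() starting */
          printf("walk ok: %d\n", fast_gup_walk());
          fork_side_end();
          return 0;
  }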
[akpm@linux-foundation.org: coding style fixes]
Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com
Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com
Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> [seqcount_t parts]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78ff2733ff352175eb7f4418a34654346e1b6cd2 ]
Restore the BTF flag if single-stepping causes a page fault and is
therefore cancelled.
Normally, the BTF flag is restored when single-stepping completes (in
resume_execution()). However, if a page fault happens on the
single-stepped instruction, the fault handler is invoked and
single-stepping is cancelled, so the BTF flag is never restored.
Fixes: 1ecc798c6764 ("x86: debugctlmsr kprobes")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/160389546985.106936.12727996109376240993.stgit@devnote2
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15af36596ae305aefc8c502c2d3e8c58221709eb ]
Commit
c9c6d216ed28 ("x86/mce: Rename "first" function as "early"")
changed the enumeration of MCE notifier priorities. Correct the check
for notifier priorities to cover the new range.
[ bp: Rewrite commit message, remove superfluous brackets in
conditional. ]
Fixes: c9c6d216ed28 ("x86/mce: Rename "first" function as "early"")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201106141216.2062-2-thunder.leizhen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26573a97746c7a99f394f9d398ce91a8853b3b89 ]
Currently, Linux as a hypervisor guest will enable x2apic only if there are
no CPUs present at boot time with an APIC ID above 255.
Hotplugging a CPU later with a higher APIC ID would result in a CPU which
cannot be targeted by external interrupts.
Add a filter in x2apic_apic_id_valid() which can be used to prevent such
CPUs from coming online, and allow x2apic to be enabled even if they are
present at boot time.
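As a rough illustration of the check being added (names and structure here are assumptions for the sketch, not the kernel code): in xAPIC mode only 8-bit destination IDs can be targeted, so a CPU with a larger APIC ID is only acceptable when x2APIC is in use.
  #include <stdbool.h>
  #include <stdio.h>

  static bool apic_id_reachable(unsigned int apic_id, bool x2apic_enabled)
  {
          /* xAPIC can only address 8-bit APIC IDs. */
          return x2apic_enabled || apic_id <= 255;
  }

  int main(void)
  {
          printf("id 300, xAPIC : %d\n", apic_id_reachable(300, false));
          printf("id 300, x2APIC: %d\n", apic_id_reachable(300, true));
          return 0;
  }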
Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-2-dwmw2@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1fcd009102ee02e217f2e7635ab65517d785da8e ]
Commit
ea3b5e60ce80 ("x86/mm/ident_map: Add 5-level paging support")
added ident_p4d_init() to support 5-level paging, but this function
doesn't check and return errors from ident_pud_init().
For example, the decompressor stub uses this code to create an identity
mapping. If it runs out of pages while trying to allocate a PMD
pagetable, the error is currently ignored.
Fix this to propagate errors.
[ bp: Space out statements for better readability. ]
Fixes: ea3b5e60ce80 ("x86/mm/ident_map: Add 5-level paging support")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20201027230648.1885111-1-nivedita@alum.mit.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e14fd4ba8fb47fcf5f244366ec01ae94490cd86a upstream.
When a split lock is detected, always make sure to disable interrupts
before returning from the trap handler.
The kernel exit code assumes that all exits run with interrupts
disabled, otherwise the SWAPGS sequence can race against interrupts and
cause recursing page faults and later panics.
The problem can only happen on CPUs with split lock disable
functionality, i.e. Icelake Server, Tiger Lake, Snow Ridge and Jacobsville.
Fixes: ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code")
Fixes: bce9b042ec73 ("x86/traps: Disable interrupts in exc_aligment_check()") # v5.8+
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 and membarrier fixes:
- Correct a few problems in the x86 and the generic membarrier
implementation. Small corrections for assumptions about visibility
which have turned out not to be true.
- Make the PAT bits for memory encryption correct vs 4K and 2M/1G
page table entries as they are at a different location.
- Fix a concurrency issue in the local bandwidth readout of
resource control leading to incorrect values
- Fix the ordering of allocating a vector for an interrupt. The code
failed to respect the provided cpumask: when the first attempt at
allocating a node-local vector within the mask fails, it tries the
node alone instead of trying the full provided mask first. This leads
to erroneous error messages and breaks the (user) supplied affinity
request. Reorder it.
- Make the INT3 padding detection in optprobe work correctly"
* tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix optprobe to detect INT3 padding correctly
x86/apic/vector: Fix ordering in vector assignment
x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP
membarrier: Execute SYNC_CORE on the calling thread
membarrier: Explicitly sync remote cores when SYNC_CORE is requested
membarrier: Add an actual barrier before rseq_preempt()
x86/membarrier: Get rid of a dubious optimization
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for ARM, x86 and tools"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools/kvm_stat: Exempt time-based counters
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
kvm: x86/mmu: Use cpuid to determine max gfn
kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
selftests: kvm/set_memory_region_test: Fix race in move region test
KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
KVM: arm64: Fix handling of merging tables into a block entry
KVM: arm64: Fix memory leak on stage2 update of a valid PTE
Commit
7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function it finds INT3 and gives up the
jump optimization.
Instead of giving up when an INT3 is found, check whether the rest of the
bytes to the end of the function are INT3. If all of them are INT3, they
come from the linker; in that case, continue the optprobe jump
optimization.
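A simplified userspace sketch of that check (the helper name and buffers are illustrative, not the kprobes code):
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdio.h>

  #define INT3_INSN_OPCODE 0xcc

  /* The decoded INT3 is acceptable only if every remaining byte up to the
   * end of the function is also INT3, i.e. linker padding. */
  static bool tail_is_int3_padding(const unsigned char *addr, size_t remaining)
  {
          while (remaining--) {
                  if (*addr++ != INT3_INSN_OPCODE)
                          return false;
          }
          return true;
  }

  int main(void)
  {
          unsigned char pad[]  = { 0xcc, 0xcc, 0xcc };  /* linker fill bytes   */
          unsigned char code[] = { 0xcc, 0x90, 0xc3 };  /* a real INT3 in code */

          printf("padding: %d\n", tail_is_int3_padding(pad, sizeof(pad)));
          printf("padding: %d\n", tail_is_int3_padding(code, sizeof(code)));
          return 0;
  }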
[ bp: Massage commit message. ]
Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki <pi3@pi3.com.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
cleaned up the computation of MMIO generation SPTE masks; however, it
introduced a bug in how the upper part was encoded:
SPTE bits 52-61 were supposed to contain bits 10-19 of the current
generation number, however a missing shift encoded bits 1-10 there instead
(mostly duplicating the lower part of the encoded generation number that
then consisted of bits 1-9).
In the meantime, the upper part was shrunk by one bit and moved by
subsequent commits to become an upper half of the encoded generation number
(bits 9-17 of bits 0-17 encoded in a SPTE).
In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
has changed the SPTE bit range assigned to encode the generation number and
the total number of bits encoded but did not update them in the comment
attached to their defines, nor in the KVM MMU doc.
Let's do it here, too, since it is too trivial a change to warrant a separate
commit.
Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com>
Cc: stable@vger.kernel.org
[Reorganize macros so that everything is computed from the bit ranges. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prarit reported that depending on the affinity setting the
' irq $N: Affinity broken due to vector space exhaustion.'
message is showing up in dmesg, but the vector space on the CPUs in the
affinity mask is definitely not exhausted.
Shung-Hsi provided traces and analysis which pinpoints the problem:
The ordering of trying to assign an interrupt vector in
assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
valid node assigned. It does:
1) Try the intersection of affinity mask and node mask
2) Try the node mask
3) Try the full affinity mask
4) Try the full online mask
Obviously #2 and #3 are in the wrong order as the requested affinity
mask has to take precedence.
In the observed cases #1 failed because the affinity mask did not contain
CPUs from node 0. That made it allocate a vector from node 0, thereby
breaking affinity and emitting the misleading message.
Reverse the order of #2 and #3 so that the full affinity mask (without the
node intersection) is tried before affinity is actually broken.
If no node is assigned, only the full affinity mask and, if that fails,
the full online mask are tried.
Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().
The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. With mba_sc enabled, __mon_event_count()
is not called for local bandwidth updating in the MBM counter overflow
handler, but it is still called when the MBM local bandwidth counter file
'mbm_local_bytes' is read; the call path is as below:
  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()
In __mon_event_count(), m->chunks is updated by the delta chunks
calculated from the previous MSR value (m->prev_msr) and the current MSR
value. When mba_sc is enabled, m->chunks is also, by mistake, updated in
mbm_update() by delta chunks calculated from m->prev_bw_msr instead of
m->prev_msr. But m->chunks is not used in update_mba_bw() in the mba_sc
feedback loop.
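For illustration, a small userspace sketch of how such a delta can be computed from two raw counter reads; the counter width, values and helper name are assumptions, but the shift trick is the usual way to tolerate a single hardware wraparound:
  #include <stdint.h>
  #include <stdio.h>

  static uint64_t delta_chunks(uint64_t prev_msr, uint64_t cur_msr,
                               unsigned int width)
  {
          unsigned int shift = 64 - width;

          /* Shift both values up and the difference back down so that a
           * single counter wraparound between the two reads is handled. */
          return ((cur_msr << shift) - (prev_msr << shift)) >> shift;
  }

  int main(void)
  {
          unsigned int width = 24;               /* assume a 24-bit counter */
          uint64_t prev = 0xfffff0, cur = 0x10;  /* counter wrapped between reads */

          printf("delta = %llu chunks\n",
                 (unsigned long long)delta_chunks(prev, cur, width));
          return 0;
  }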
When the MBM local bandwidth counter file is read, m->chunks has thus been
changed unexpectedly by mbm_bw_count(). As a result, an incorrect local
bandwidth value, calculated from the incorrect m->chunks, is shown to the
user.
Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.
Test steps:
# Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
&& make
./tools/membw/membw -c 0 -b 10000 --read
# Enable MBA software controller
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
# Create control group c1
mkdir /sys/fs/resctrl/c1
# Set MB throttle to 6 GB/s
echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
# Write PID of the workload to tasks file
echo `pidof membw` > /sys/fs/resctrl/c1/tasks
# Read the local bytes counter twice with a 1s interval; the calculated
# local bandwidth is not as expected (it should approach 6 GB/s):
local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
sleep 1
local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
Before fix:
local b/w (bytes/s): 11076796416
After fix:
local b/w (bytes/s): 5465014272
Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
The PAT bit is in different locations for 4k and 2M/1G page table
entries.
Add a definition for _PAGE_LARGE_CACHE_MASK to represent the three
caching bits (PWT, PCD, PAT), similar to _PAGE_CACHE_MASK for 4k pages,
and use it in the definition of PMD_FLAGS_DEC_WP to get the correct PAT
index for write-protected pages.
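A small standalone sketch of the two masks described above, using the architectural bit positions (PWT=3, PCD=4, PAT=7 in 4K PTEs, PAT=12 in 2M/1G entries where bit 7 is PSE); the exact macro spellings in the kernel headers may differ:
  #include <stdio.h>

  #define _PAGE_BIT_PWT        3
  #define _PAGE_BIT_PCD        4
  #define _PAGE_BIT_PAT        7    /* 4K PTEs */
  #define _PAGE_BIT_PAT_LARGE  12   /* 2M/1G entries; bit 7 is PSE there */

  #define _PAGE_PWT        (1UL << _PAGE_BIT_PWT)
  #define _PAGE_PCD        (1UL << _PAGE_BIT_PCD)
  #define _PAGE_PAT        (1UL << _PAGE_BIT_PAT)
  #define _PAGE_PAT_LARGE  (1UL << _PAGE_BIT_PAT_LARGE)

  /* Caching bits for 4K mappings vs. large (2M/1G) mappings. */
  #define _PAGE_CACHE_MASK        (_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
  #define _PAGE_LARGE_CACHE_MASK  (_PAGE_PWT | _PAGE_PCD | _PAGE_PAT_LARGE)

  int main(void)
  {
          printf("_PAGE_CACHE_MASK       = %#lx\n", _PAGE_CACHE_MASK);
          printf("_PAGE_LARGE_CACHE_MASK = %#lx\n", _PAGE_LARGE_CACHE_MASK);
          return 0;
  }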
Fixes: 6ebcb060713f ("x86/mm: Add support to encrypt the kernel in-place")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20201111160946.147341-1-nivedita@alum.mit.edu
sync_core_before_usermode() had an incorrect optimization. If the kernel
returns from an interrupt, it can get to usermode without IRET. It just has
to schedule to a different task in the same mm and do SYSRET. Fortunately,
there were no callers of sync_core_before_usermode() that could have had
in_irq() or in_nmi() equal to true, because it's only ever called from the
scheduler.
While at it, clarify a related comment.
Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/5afc7632be1422f91eaf7611aaaa1b5b8580a086.1607058304.git.luto@kernel.org
Merge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data prioritization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
Merge tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for performance monitoring on X86:
- Add recursion protection to another callchain invoked from
x86_pmu_stop() which can recurse back into x86_pmu_stop(). The
first attempt to fix this missed this extra code path.
- Use the already filtered status variable to check for PEBS counter
overflow bits and not the unfiltered full status read from
IA32_PERF_GLOBAL_STATUS which can have unrelated bits set which
would be evaluated incorrectly"
* tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Check PEBS status correctly
perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
Merge tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
preprocessor option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
* tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid split lines in .mod files
kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
kbuild: Hoist '--orphan-handling' into Kconfig
Kbuild: do not emit debug info for assembly with LLVM_IAS=1
kbuild: use -fmacro-prefix-map for .S sources
Makefile.extrawarn: move -Wcast-align to W=3
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper
check must be:
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 25189d08e516 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 32d0b95300db ("x86/insn-eval: Add utility functions to get segment selector")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes.
Introduce a for_each_insn_prefix() macro for this purpose. Debugged by
Kees Cook <keescook@chromium.org>.
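A minimal userspace sketch of the bounded iteration such a macro performs (struct layout and names are simplified stand-ins, not the kernel's definitions):
  #include <stdio.h>

  #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

  struct insn_prefixes {
          unsigned char bytes[4];  /* at most 4 distinct prefix bytes stored */
          unsigned char nbytes;    /* counts repeats, so it may exceed 4     */
  };

  /* Iterate only over valid, stored prefix bytes. */
  #define for_each_prefix(p, idx, pfx)                                  \
          for ((idx) = 0;                                               \
               (idx) < (int)ARRAY_SIZE((p)->bytes) &&                   \
               ((pfx) = (p)->bytes[(idx)]) != 0;                        \
               (idx)++)

  int main(void)
  {
          /* A repeated 0x66 prefix: one byte stored, but nbytes is 5. */
          struct insn_prefixes p = { .bytes = { 0x66 }, .nbytes = 5 };
          unsigned char pfx;
          int i;

          for_each_prefix(&p, i, pfx)
                  printf("prefix[%d] = %#x\n", i, (unsigned int)pfx);
          /* Looping up to p.nbytes instead would read past bytes[3]. */
          return 0;
  }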
[ bp: Massage commit message, sync with the respective header in tools/
and drop "we". ]
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
In the TDP MMU, use shadow_phys_bits to determine the maximum possible GFN
mapped in the guest for zapping operations. boot_cpu_data.x86_phys_bits
may be reduced in the case of HW features that steal HPA bits for other
purposes. However, this doesn't necessarily reduce GPA space that can be
accessed via TDP. So zap based on a maximum gfn calculated with MAXPHYADDR
retrieved from CPUID. This is already stored in shadow_phys_bits, so use
it instead of x86_phys_bits.
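For illustration, the kind of calculation this implies (a sketch; the actual KVM helper and constants live in the MMU code):
  #include <stdint.h>
  #include <stdio.h>

  #define PAGE_SHIFT 12

  /* End-exclusive bound on guest frame numbers reachable with a given
   * physical address width (MAXPHYADDR). */
  static uint64_t max_gfn(unsigned int phys_bits)
  {
          return 1ULL << (phys_bits - PAGE_SHIFT);
  }

  int main(void)
  {
          printf("MAXPHYADDR 52 -> max gfn %#llx\n",
                 (unsigned long long)max_gfn(52));
          printf("MAXPHYADDR 46 -> max gfn %#llx\n",
                 (unsigned long long)max_gfn(46));
          return 0;
  }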
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-Id: <20201203231120.27307-1-rick.p.edgecombe@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The cpu arg for svm_cpu_uninit() was previously ignored, resulting in the
per-CPU structure svm_cpu_data not being de-allocated for all CPUs.
Signed-off-by: Jacob Xu <jacobhxu@google.com>
Message-Id: <20201203205939.1783969-1-jacobhxu@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests. However,
the comparison is done with the unfiltered p->status, which is a
copy of IA32_PERF_GLOBAL_STATUS at the time of the sample. This
register contains more than the PEBS counter overflow bits; many other
bits could also be set.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-2-namhyung@kernel.org
Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this. It seems x86_pmu_stop() can be
called recursively (e.g. when it loses some samples) like below:
  x86_pmu_stop
    intel_pmu_disable_event  (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_nhm  (x86_pmu_drain_pebs_buffer)
          x86_pmu_stop
While commit 35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
warning for large PEBS") fixed it for the normal cases, there's
another path to call x86_pmu_stop() recursively when a PEBS error was
detected (like two or more counters overflowed at the same time).
As in Kan's previous fix, we can skip the interrupt accounting
for large PEBS, so check iregs, which is set only for a PMI.
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-1-namhyung@kernel.org
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the CPUs
in an L3 domain (core complex). That is not correct - the CDP bit needs
to be updated on all the logical CPUs in the domain.
This was not spelled out clearly in the spec earlier. The specification
has been updated and the updated document, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.
Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.
The documentation can be obtained at:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
[ bp: Massage commit message. ]
Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/160675180380.15628.3309402017215002347.stgit@bmoger-ubuntu
Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).
To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinux) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.
To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.
Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
Merge tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A couple of urgent fixes which accumulated this last week:
- Two resctrl fixes to prevent refcount leaks when manipulating the
resctrl fs (Xiaochen Shen)
- Correct prctl(PR_GET_SPECULATION_CTRL) reporting (Anand K Mistry)
- A fix to not lose already seen MCE severity which determines
whether the machine can recover (Gabriele Paoloni)"
* tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Do not overwrite no_way_out if mce_end() fails
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak
x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace
S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call
x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"Here's another round of IOMMU fixes for -rc6 consisting mainly of a
bunch of independent driver fixes. Thomas agreed for me to take the
x86 'tboot' fix here, as it fixes a regression introduced by a vt-d
change.
- Fix intel iommu driver when running on devices without VCCAP_REG
- Fix swiotlb and "iommu=pt" interaction under TXT (tboot)
- Fix missing return value check during device probe()
- Fix probe ordering for Qualcomm SMMU implementation
- Ensure page-sized mappings are used for AMD IOMMU buffers with SNP
RMP"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Don't read VCCAP register unless it exists
x86/tboot: Don't disable swiotlb when iommu is forced on
iommu: Check return of __iommu_attach_device()
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
Currently, if mce_end() fails, no_way_out - the variable denoting
whether the machine can recover from this MCE - is determined by whether
the worst severity found across the MCA banks associated with the
current CPU is of panic severity.
However, at this point no_way_out could have already been set by
mce_start() after looking at all severities of all CPUs that entered the
MCE handler. If mce_end() fails, check first whether no_way_out is already
set and, if so, stick to it; otherwise use the local worst value.
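A compact sketch of that decision (severity values are placeholders, not the mce definitions):
  #include <stdio.h>

  #define MCE_PANIC_SEVERITY 5   /* placeholder value for this sketch */

  /* If no_way_out was already set globally, keep it; otherwise derive it
   * from the local worst severity. */
  static int resolve_no_way_out(int no_way_out, int worst)
  {
          if (no_way_out)
                  return no_way_out;
          return worst >= MCE_PANIC_SEVERITY;
  }

  int main(void)
  {
          printf("%d\n", resolve_no_way_out(1, 0));  /* stick to the global value */
          printf("%d\n", resolve_no_way_out(0, 7));  /* fall back to local worst  */
          return 0;
  }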
[ bp: Massage. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
the following WARNING on an Intel Ice Lake CPU:
get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
------ spte 0xb80a0 level 5.
------ spte 0xfcd210107 level 4.
------ spte 0x1004c40107 level 3.
------ spte 0x1004c41107 level 2.
------ spte 0x1db00000000b83b6 level 1.
WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
...
Call Trace:
? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
vcpu_enter_guest+0xaa1/0x16a0 [kvm]
? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]
The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.
The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some cases we don't want to
use 5-level paging; in particular, when 'cpuid_maxphyaddr(vcpu) <= 48',
kvm_mmu_get_tdp_level() returns '4'. In case the upper layer is not used,
the corresponding SPTE will fail the '__is_rsvd_bits_set()' check.
Revert to using 'shadow_root_level'.
Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works. But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?
In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.
In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu). However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts. In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false. When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring. The VM loops without any hope of making progress.
Once we try to fix these with something like
   return kvm_arch_interrupt_allowed(vcpu) &&
-          !kvm_cpu_has_interrupt(vcpu) &&
-          !kvm_event_needs_reinjection(vcpu) &&
-          kvm_cpu_accept_dm_intr(vcpu);
+          (!lapic_in_kernel(vcpu)
+             ? !vcpu->arch.interrupt.injected
+             : (kvm_apic_accept_pic_intr(vcpu)
+                && !pending_userspace_extint(v)));
we realize two things. First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt
window request in vcpu_enter_guest()
bool req_int_win =
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace. Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT. It all makes sense:
- we can accept an interrupt from userspace if there is a place
to stash it (and, for irqchip split, ExtINTs are not masked).
Interrupts from userspace _can_ be accepted even if right now
EFLAGS.IF=0.
- in order to tell userspace we will inject its interrupt ("IRQ
window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
KVM and the vCPU need to be ready to accept the interrupt.
... and this is what the patch implements.
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Centralize handling of interrupts from the userspace APIC
in kvm_cpu_has_extint and kvm_cpu_get_extint, since
userspace APIC interrupts are handled more or less the
same as ExtINTs are with split irqchip. This removes
duplicated code from kvm_cpu_has_injectable_intr and
kvm_cpu_has_interrupt, and makes the code more similar
between kvm_cpu_has_{extint,interrupt} on one side
and kvm_cpu_get_{extint,interrupt} on the other.
Cc: stable@vger.kernel.org
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIBP is conditionally enabled (or not
available).
However, since
21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.
Since
1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
this discrepancy caused the misreporting of IB speculation via prctl().
On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIBP are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.
Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.
[ bp: Massage commit message. ]
Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
domain"), swiotlb could also be used for direct memory access if IOMMU
is enabled but a device is configured to pass through the DMA translation.
Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
if "iommu=pt" kernel parameter is used.
Fixes: 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA domain")
Reported-and-tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201125014124.4070776-1-baolu.lu@linux.intel.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237
Signed-off-by: Will Deacon <will@kernel.org>
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.
(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then the rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before kfree() is called to free the rdtgroup structures, the kernfs
nodes of the control group itself and of all child monitoring groups
still hold the extra references, which will never be dropped to 0, so
the kernfs nodes will never be freed. This leads to a reference count
leak and a kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
rmdir /sys/fs/resctrl/c1
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue with the following changes, which make sure the extra
kernfs_node reference on the rdtgroup is dropped before the rdtgroup
structure is freed:
(1) Introduce the rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree() (a sketch follows this list).
(2) Call rdtgroup_remove() in the rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
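A minimal sketch of the helper from change (1), assuming the rdtgroup
structure keeps its kernfs node in a ->kn member:

  /* Drop the extra kernfs_node reference taken at mkdir time and free
   * the rdtgroup structure in one place. */
  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
  {
          kernfs_put(rdtgrp->kn);
          kfree(rdtgrp);
  }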
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
Willem reported a growing number of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.
On resource group (control as well as monitoring) creation via a mkdir,
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().
With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl, there are more places where a kernfs_node reference is
obtained without a corresponding release. These excess references on
kernfs nodes will never be dropped to 0, so the kernfs nodes will never
be freed in the rmdir and umount call paths. This leads to a reference
count leak and a kernfs_node_cache memory leak.
Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the one kernfs_get()/kernfs_put() pair that
remains in use.
Superfluous kernfs_get() calls are removed from two areas:
(1) In the mount and mkdir call paths, when the kernfs nodes for the "info",
"mon_groups" and "mon_data" directories and sub-directories are created,
the reference count of each newly created kernfs node is already set to 1.
But after kernfs_create_dir() returns, superfluous kernfs_get() calls are
made to take an additional reference (see the sketch after this list).
(2) kernfs_get() calls in the rmdir call paths.
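A hedged sketch of the pattern from item (1) above; the function and
variable names are illustrative, not the exact resctrl code, and
<linux/kernfs.h> plus <linux/err.h> are assumed:

  static int example_create_subdir(struct kernfs_node *parent_kn,
                                   const char *name, void *priv,
                                   struct kernfs_node **dest_kn)
  {
          struct kernfs_node *kn;

          /* The node returned by kernfs_create_dir() already holds one
           * reference, so no extra kernfs_get() is needed here. */
          kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, priv);
          if (IS_ERR(kn))
                  return PTR_ERR(kn);

          /* removed: kernfs_get(kn);  -- the superfluous extra reference */

          *dest_kn = kn;
          return 0;
  }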
Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes, thereby making Clang's control
flow integrity checker upset.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+6diATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaVKD/9pbxJEopP0ojZx8MbGVxPn1dzgii6k
YgWOemMPGV/hi87pCSah1RcNVY8aMwMUBnzMJEhx9bCYFMxCummM7wcoN7bJneWU
f548CesgeKP662vFYjAXrsASX0DZAUUQm/MTnkLqVlD11yaqudxKEjEYNi2jVehq
7tTip8Eb88w5dyQJLN7sjZuV9AdZCiXeiyGgYb3jVjfdfTuq4KRp6XdL9SghEfwV
3qDOrcBuLKgQbz+eE2C1tVX+/kSNfG1stNAWkENHBWfrnmbeCt7YdT6Gd6bays0b
1Ul0LgZwXbDdkQrtmoEGtI8j/Q+yiozeJY7+eeh/Z9nw0eviZ8ir36y96wRukOqw
od+EjENiF1hIJk21r6mIuQoMyezHraIMgg0WF4/YT1XRGqwR1wqgHpmwqMjmem0Y
Kuab7aYC4g8vGk9ah2/FmTnTHooEyk8GyxnVXO+aSdLbFI04L7HBDILvgvpHPyCS
BnUDZE785W8yLydhKslfy92kVQGKuGaSqQUQcHD+c/L7m2AexoKrp1RJgA0gXtDj
ilmE7KNCCbeNXt3cDep6BQzymw9YStPNSPBSv4dK73DM/Y2qKBk3RD0BfMqdCkOy
sBqUCWRY1/q6Z+64zwDlER5Fn+KflE0ihobTxhA6m0yx7AOn9BoMDWfo3xGKw6Mn
nhVDpafF8YuVIw==
=xsma
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes and therefore making clang's
control flow integrity checker upset"
* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: fix sysfs type mismatches
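For context on the mismatch named above, a hedged sketch (the attribute
name and value are illustrative, with <linux/device.h> and
<linux/sysfs.h> assumed) of a sysfs attribute typed as a device
attribute rather than a kobject attribute:

  /* The show() prototype must match struct device_attribute so Clang's
   * control flow integrity checker sees a consistent function type when
   * sysfs invokes it on a struct device. */
  static ssize_t example_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
  {
          return sprintf(buf, "%d\n", 42);        /* illustrative value */
  }
  static DEVICE_ATTR_RO(example);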
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built in
Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl+tRcoACgkQwjcgfpV0
+n1LZQf/af1p9A0zT1nC3IrcaABceJDnD3dJuV9SD6QhFuD2Dw/Mshr+MVzDsO+3
btvuu0r4CzQ5ajfpfcGcvBFFWbbPTwKvWQe++9Unwoz5acw7hpV5yxNwMivdaJEh
3o4pkgpCmwtliTwiroDficO7Vlqefqf4LZd7/iQYQTnuPK3waYQBjwp9t2D9tlx7
kXiEQDP2BDRCUrKEjlR7AHTZ156mw+UsiquAuxMCGTKBqwiELEEV6aPseqa5MmNV
RDV1IXWdhOQyQfzg0s6vTzwGeN0JubSxHng6O9UbE+tctz4EqaaHIEsRuMBq8oLD
Y8JeGp1ovypTJxeLE5t6eEzsfvTRsg==
=GnmM
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built
in Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure"
* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Free efi_pgd with free_pages()
efivarfs: fix memory leak in efivarfs_create()
efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
* An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of the
same because the proper one is going through the IOMMU tree. (Thomas Gleixner)
* An Intel microcode loader fix to save the correct microcode patch to
apply during resume. (Chen Yu)
* A fix to not access user memory of other processes when dumping opcode
bytes. (Thomas Gleixner)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+6QnYACgkQEsHwGGHe
VUpMgg//cE4GvK7dFihq5iNYS16yKKgyVBLwHiHoCxQkOu4CPSf0EejTMLKPSKpA
hS7pJm6dMYLJ7PhIKRwlQD1NYet3SHCOi1h0Y2/3rVay82Vz/As2nZRd7paEkP9k
kiNd/Ib2uAQ9ZwCKhmLanOWDoXXFo5Fwmpw+O/zhn31Y3kBx05mc2ojXnEwCERK/
j6GwOBkfOq+q6D7HzsfEehlX0iy2rlPSq6OUN1H/Z4w+W9zfpubLX9bhhr9RvDmS
6Hsmwos8fM/cQXZC2oxoKUP6zeYk5lDF/Qf7hOFt3WgdCpTZP2z6uLwUL6+NXu/C
Ms+77Cx+Loe8UaDkyxXXruiv7mZIZCnl0phZulE4ibOjzL7W7w5cEf0LHdDS7uDL
JzZwTJlbK9RjrSp4pXR+rn+U/PdluZJYpjh/Gj8+V23QszvbDsVd55SV5+CLSTM/
qwrNp5ffLEnIFHmW98/OQKMvNTTzBFdlltnJm7mMW2JucgfCZMx4zwY0+dZ+ocDq
/Kt/HNCzez1jHWiS5jdZCM15KuIhPTNuTpq0ipU+UAXxZOaNtTo1st1lhKoPvTz/
k4y7S/EibX6PK4lxBQMM3fBG/T1KK7ZMgNEN0jbhWegqEmdWtgwc57p1gVE70XUU
f6elyUupD65TMwgio2I8sQuiNBCKfF2Wj144dxXGJnGnOoZKey8=
=z/Gq
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
same because the proper one is going through the IOMMU tree (Thomas
Gleixner)
- An Intel microcode loader fix to save the correct microcode patch to
apply during resume (Chen Yu)
- A fix to not access user memory of other processes when dumping
opcode bytes (Thomas Gleixner)
* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
x86/dumpstack: Do not try to access user space code of other tasks
x86/microcode/intel: Check patch signature before saving microcode for early loading
iommu/vt-d: Take CONFIG_PCI_ATS into account
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while mm/memory_hotplug.c
exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means the
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
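A hedged sketch of that design pattern (header locations and the
fallback value are illustrative, not the exact hunks of this patch):

  /* arch header, e.g. asm/sparsemem.h: declare the real implementation
   * and define the name so the generic fallback below compiles out. */
  int phys_to_target_node(u64 start);
  #define phys_to_target_node phys_to_target_node

  /* generic header: used only when no architecture override exists */
  #ifndef phys_to_target_node
  static inline int phys_to_target_node(u64 start)
  {
          return 0;       /* default node when no override is provided */
  }
  #endif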
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>