IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
CPUID Leaf 0x1F defines a DIE_TYPE level (nb: ECX[8:15] level type == 0x5),
but CPUID Leaf 0xB does not. However, detect_extended_topology() will
set struct cpuinfo_x86.cpu_die_id regardless of whether a valid Die ID
was found.
Only set cpu_die_id if a DIE_TYPE level is found. CPU topology code may
use another value for cpu_die_id, e.g. the AMD NodeId on AMD-based
systems. Code ordering should be maintained so that the CPUID Leaf 0x1F
Die ID value will take precedence on systems that may use another value.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-5-Yazen.Ghannam@amd.com
The Last Level Cache ID is returned by amd_get_nb_id(). In practice,
this value is the same as the AMD NodeId for callers of this function.
The NodeId is saved in struct cpuinfo_x86.cpu_die_id.
Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and
remove the function.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.
The NodeId value can be used when a physical ID is needed by software.
Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.
Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.
Update the x86 topology documentation.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
Return -ERESTARTSYS instead of -EINTR in sgx_ioc_enclave_add_pages()
when interrupted before any pages have been processed. At this point
ioctl can be obviously safely restarted.
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201118213932.63341-1-jarkko@kernel.org
- Cure the fallout from the MSI irqdomain overhaul which missed that the
Intel IOMMU does not register virtual function devices and therefore
never reaches the point where the MSI interrupt domain is assigned. This
makes the VF devices use the non-remapped MSI domain which is trapped by
the IOMMU/remap unit.
- Remove an extra space in the SGI_UV architecture type procfs output for
UV5.
- Remove a unused function which was missed when removing the UV BAU TLB
shootdown handler.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xJi0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVWxD/9Tq4W6Kniln7mtoEWHRvHRceiiGcS3
MocvqurhoJwirH4F2gkvCegTBy0r3FdUORy3OMmChVs6nb8XpPpso84SANCRePWp
JZezpVwLSNC4O1/ZCg1Kjj4eUpzLB/UjUUQV9RsjL5wyQEhfCZgb1D40yLM/2dj5
SkVm/EAqWuQNtYe/jqAOwTX/7mV+k2QEmKCNOigM13R9EWgu6a4J8ta1gtNSbwvN
jWMW+M1KjZ76pfRK+y4OpbuFixteSzhSWYPITSGwQz4IpQ+Ty2Rv0zzjidmDnAR+
Q73cup0dretdVnVDRpMwDc06dBCmt/rbN50w4yGU0YFRFDgjGc8sIbQzuIP81nEQ
XY4l4rcBgyVufFsLrRpQxu1iYPFrcgU38W1kRkkJ3Kl/rY1a2ZU7sLE4kt4Oh55W
A9KCmsfqP1PCYppjAQ0QT4NOp4YtecPvAU4UcBOb722DDBd8TfhLWWGw2yG57Q/d
Wnu8xCJGy7BaLHLGGGseAft+D4aNnCjKC3jgMyvNtRDXaV2cK2Kdd6ehMlWVUapD
xfLlKXE+igXMyoWJIWjTXQJs4dpKu6QpJCPiorwEZ8rmNaRfxsWEJVbeYwEkmUke
bMoBBSCbZT86WVOYhI8WtrIemraY0mMYrrcE03M96HU3eYB8BV92KrIzZWThupcQ
ZqkZbqCZm3vfHA==
=X/P+
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/iommu/fixes
Pull in x86 fixes from Thomas, as they include a change to the Intel DMAR
code on which we depend:
* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iommu/vt-d: Cure VF irqdomain hickup
x86/platform/uv: Fix copied UV5 output archtype
x86/platform/uv: Drop last traces of uv_flush_tlb_others
Short Version:
The SGX section->laundry_list structure is effectively thread-local, but
declared next to some shared structures. Its semantics are clear as mud.
Fix that. No functional changes. Compile tested only.
Long Version:
The SGX hardware keeps per-page metadata. This can provide things like
permissions, integrity and replay protection. It also prevents things
like having an enclave page mapped multiple times or shared between
enclaves.
But, that presents a problem for kexec()'d kernels (or any other kernel
that does not run immediately after a hardware reset). This is because
the last kernel may have been rude and forgotten to reset pages, which
would trigger the "shared page" sanity check.
To fix this, the SGX code "launders" the pages by running the EREMOVE
instruction on all pages at boot. This is slow and can take a long
time, so it is performed off in the SGX-specific ksgxd instead of being
synchronous at boot. The init code hands the list of pages to launder in
a per-SGX-section list: ->laundry_list. The only code to touch this list
is the init code and ksgxd. This means that no locking is necessary for
->laundry_list.
However, a lock is required for section->page_list, which is accessed
while creating enclaves and by ksgxd. This lock (section->lock) is
acquired by ksgxd while also processing ->laundry_list. It is easy to
confuse the purpose of the locking as being for ->laundry_list and
->page_list.
Rename ->laundry_list to ->init_laundry_list to make it clear that this
is not normally used at runtime. Also add some comments clarifying the
locking, and reorganize 'sgx_epc_section' to put 'lock' near the things
it protects.
Note: init_laundry_list is 128 bytes of wasted space at runtime. It
could theoretically be dynamically allocated and then freed after
the laundering process. But it would take nearly 128 bytes of extra
instructions to do that.
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201116222531.4834-1-dave.hansen@intel.com
Commit
4b47cdbda6f1 ("x86/head/64: Move early exception dispatch to C code")
removed the usage of GET_CR2_INTO().
Drop the definition as well, and related definitions in paravirt.h and
asm-offsets.h
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151208.2212886-3-nivedita@alum.mit.edu
Enclave memory is normally inaccessible from outside the enclave. This
makes enclaves hard to debug. However, enclaves can be put in a debug
mode when they are being built. In that mode, enclave data *can* be read
and/or written by using the ENCLS[EDBGRD] and ENCLS[EDBGWR] functions.
This is obviously only for debugging and destroys all the protections
present with normal enclaves. But, enclaves know their own debug status
and can adjust their behavior appropriately.
Add a vm_ops->access() implementation which can be used to read and write
memory inside debug enclaves. This is typically used via ptrace() APIs.
[ bp: Massage. ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-23-jarkko@kernel.org
Just like normal RAM, there is a limited amount of enclave memory available
and overcommitting it is a very valuable tool to reduce resource use.
Introduce a simple reclaim mechanism for enclave pages.
In contrast to normal page reclaim, the kernel cannot directly access
enclave memory. To get around this, the SGX architecture provides a set of
functions to help. Among other things, these functions copy enclave memory
to and from normal memory, encrypting it and protecting its integrity in
the process.
Implement a page reclaimer by using these functions. Picks victim pages in
LRU fashion from all the enclaves running in the system. A new kernel
thread (ksgxswapd) reclaims pages in the background based on watermarks,
similar to normal kswapd.
All enclave pages can be reclaimed, architecturally. But, there are some
limits to this, such as the special SECS metadata page which must be
reclaimed last. The page version array (used to mitigate replaying old
reclaimed pages) is also architecturally reclaimable, but not yet
implemented. The end result is that the vast majority of enclave pages are
currently reclaimable.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
vDSO functions can now leverage an exception fixup mechanism similar to
kernel exception fixup. For vDSO exception fixup, the initial user is
Intel's Software Guard Extensions (SGX), which will wrap the low-level
transitions to/from the enclave, i.e. EENTER and ERESUME instructions,
in a vDSO function and leverage fixup to intercept exceptions that would
otherwise generate a signal. This allows the vDSO wrapper to return the
fault information directly to its caller, obviating the need for SGX
applications and libraries to juggle signal handlers.
Attempt to fixup vDSO exceptions immediately prior to populating and
sending signal information. Except for the delivery mechanism, an
exception in a vDSO function should be treated like any other exception
in userspace, e.g. any fault that is successfully handled by the kernel
should not be directly visible to userspace.
Although it's debatable whether or not all exceptions are of interest to
enclaves, defer to the vDSO fixup to decide whether to do fixup or
generate a signal. Future users of vDSO fixup, if there ever are any,
will undoubtedly have different requirements than SGX enclaves, e.g. the
fixup vs. signal logic can be made function specific if/when necessary.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-19-jarkko@kernel.org
The whole point of SGX is to create a hardware protected place to do
“stuff”. But, before someone is willing to hand over the keys to
the castle , an enclave must often prove that it is running on an
SGX-protected processor. Provisioning enclaves play a key role in
providing proof.
There are actually three different enclaves in play in order to make this
happen:
1. The application enclave. The familiar one we know and love that runs
the actual code that’s doing real work. There can be many of these on
a single system, or even in a single application.
2. The quoting enclave (QE). The QE is mentioned in lots of silly
whitepapers, but, for the purposes of kernel enabling, just pretend they
do not exist.
3. The provisioning enclave. There is typically only one of these
enclaves per system. Provisioning enclaves have access to a special
hardware key.
They can use this key to help to generate certificates which serve as
proof that enclaves are running on trusted SGX hardware. These
certificates can be passed around without revealing the special key.
Any user who can create a provisioning enclave can access the
processor-unique Provisioning Certificate Key which has privacy and
fingerprinting implications. Even if a user is permitted to create
normal application enclaves (via /dev/sgx_enclave), they should not be
able to create provisioning enclaves. That means a separate permissions
scheme is needed to control provisioning enclave privileges.
Implement a separate device file (/dev/sgx_provision) which allows
creating provisioning enclaves. This device will typically have more
strict permissions than the plain enclave device.
The actual device “driver” is an empty stub. Open file descriptors for
this device will represent a token which allows provisioning enclave duty.
This file descriptor can be passed around and ultimately given as an
argument to the /dev/sgx_enclave driver ioctl().
[ bp: Touchups. ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-security-module@vger.kernel.org
Link: https://lkml.kernel.org/r/20201112220135.165028-16-jarkko@kernel.org
Enclaves have two basic states. They are either being built and are
malleable and can be modified by doing things like adding pages. Or,
they are locked down and not accepting changes. They can only be run
after they have been locked down. The ENCLS[EINIT] function induces the
transition from being malleable to locked-down.
Add an ioctl() that performs ENCLS[EINIT]. After this, new pages can
no longer be added with ENCLS[EADD]. This is also the time where the
enclave can be measured to verify its integrity.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-15-jarkko@kernel.org
SGX enclave pages are inaccessible to normal software. They must be
populated with data by copying from normal memory with the help of the
EADD and EEXTEND functions of the ENCLS instruction.
Add an ioctl() which performs EADD that adds new data to an enclave, and
optionally EEXTEND functions that hash the page contents and use the
hash as part of enclave “measurement” to ensure enclave integrity.
The enclave author gets to decide which pages will be included in the
enclave measurement with EEXTEND. Measurement is very slow and has
sometimes has very little value. For instance, an enclave _could_
measure every page of data and code, but would be slow to initialize.
Or, it might just measure its code and then trust that code to
initialize the bulk of its data after it starts running.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-14-jarkko@kernel.org
Add an ioctl() that performs the ECREATE function of the ENCLS
instruction, which creates an SGX Enclave Control Structure (SECS).
Although the SECS is an in-memory data structure, it is present in
enclave memory and is not directly accessible by software.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-13-jarkko@kernel.org
Intel(R) SGX is a new hardware functionality that can be used by
applications to set aside private regions of code and data called
enclaves. New hardware protects enclave code and data from outside
access and modification.
Add a driver that presents a device file and ioctl API to build and
manage enclaves.
[ bp: Small touchups, remove unused encl variable in sgx_encl_find() as
Reported-by: kernel test robot <lkp@intel.com> ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-12-jarkko@kernel.org
"intel_iommu=off" command line is used to disable iommu but iommu is force
enabled in a tboot system for security reason.
However for better performance on high speed network device, a new option
"intel_iommu=tboot_noforce" is introduced to disable the force on.
By default kernel should panic if iommu init fail in tboot for security
reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off".
Fix the code setting force_on and move intel_iommu_tboot_noforce
from tboot code to intel iommu code.
Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
sysrq-t ends up invoking show_opcodes() for each task which tries to access
the user space code of other processes, which is obviously bogus.
It either manages to dump where the foreign task's regs->ip points to in a
valid mapping of the current task or triggers a pagefault and prints "Code:
Bad RIP value.". Both is just wrong.
Add a safeguard in copy_code() and check whether the @regs pointer matches
currents pt_regs. If not, do not even try to access it.
While at it, add commentary why using copy_from_user_nmi() is safe in
copy_code() even if the function name suggests otherwise.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
show_trace_log_lvl() is not used by other compilation units so make it
static and remove the declaration from the header file.
Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201113133943.GA136221@rlk
Add functions for runtime allocation and free.
This allocator and its algorithms are as simple as it gets. They do a
linear search across all EPC sections and find the first free page. They
are not NUMA-aware and only hand out individual pages. The SGX hardware
does not support large pages, so something more complicated like a buddy
allocator is unwarranted.
The free function (sgx_free_epc_page()) implicitly calls ENCLS[EREMOVE],
which returns the page to the uninitialized state. This ensures that the
page is ready for use at the next allocation.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-10-jarkko@kernel.org
Add a kernel parameter to disable SGX kernel support and document it.
[ bp: Massage. ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-9-jarkko@kernel.org
Kernel support for SGX is ultimately decided by the state of the launch
control bits in the feature control MSR (MSR_IA32_FEAT_CTL). If the
hardware supports SGX, but neglects to support flexible launch control, the
kernel will not enable SGX.
Enable SGX at feature control MSR initialization and update the associated
X86_FEATURE flags accordingly. Disable X86_FEATURE_SGX (and all
derivatives) if the kernel is not able to establish itself as the authority
over SGX Launch Control.
All checks are performed for each logical CPU (not just boot CPU) in order
to verify that MSR_IA32_FEATURE_CONTROL is correctly configured on all
CPUs. All SGX code in this series expects the same configuration from all
CPUs.
This differs from VMX where X86_FEATURE_VMX is intentionally cleared only
for the current CPU so that KVM can provide additional information if KVM
fails to load like which CPU doesn't support VMX. There’s not much the
kernel or an administrator can do to fix the situation, so SGX neglects to
convey additional details about these kinds of failures if they occur.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-8-jarkko@kernel.org
Although carved out of normal DRAM, enclave memory is marked in the
system memory map as reserved and is not managed by the core mm. There
may be several regions spread across the system. Each contiguous region
is called an Enclave Page Cache (EPC) section. EPC sections are
enumerated via CPUID
Enclave pages can only be accessed when they are mapped as part of an
enclave, by a hardware thread running inside the enclave.
Parse CPUID data, create metadata for EPC pages and populate a simple
EPC page allocator. Although much smaller, ‘struct sgx_epc_page’
metadata is the SGX analog of the core mm ‘struct page’.
Similar to how the core mm’s page->flags encode zone and NUMA
information, embed the EPC section index to the first eight bits of
sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to
EPC section. Existing client hardware supports only a single section,
while upcoming server hardware will support at most eight sections.
Thus, eight bits should be enough for long term needs.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Serge Ayoun <serge.ayoun@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
ENCLS is the userspace instruction which wraps virtually all
unprivileged SGX functionality for managing enclaves. It is essentially
the ioctl() of instructions with each function implementing different
SGX-related functionality.
Add macros to wrap the ENCLS functionality. There are two main groups,
one for functions which do not return error codes and a “ret_” set for
those that do.
ENCLS functions are documented in Intel SDM section 36.6.
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-3-jarkko@kernel.org
Define the SGX architectural data structures used by various SGX
functions. This is not an exhaustive representation of all SGX data
structures but only those needed by the kernel.
The goal is to sequester hardware structures in "sgx/arch.h" and keep
them separate from kernel-internal or uapi structures.
The data structures are described in Intel SDM section 37.6.
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-2-jarkko@kernel.org
Currently, scan_microcode() leverages microcode_matches() to check
if the microcode matches the CPU by comparing the family and model.
However, the processor stepping and flags of the microcode signature
should also be considered when saving a microcode patch for early
update.
Use find_matching_signature() in scan_microcode() and get rid of the
now-unused microcode_matches() which is a good cleanup in itself.
Complete the verification of the patch being saved for early loading in
save_microcode_patch() directly. This needs to be done there too because
save_mc_for_early() will call save_microcode_patch() too.
The second reason why this needs to be done is because the loader still
tries to support, at least hypothetically, mixed-steppings systems and
thus adds all patches to the cache that belong to the same CPU model
albeit with different steppings.
For example:
microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6
microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23
microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23
microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23
The patch which is being saved for early loading, however, can only be
the one which fits the CPU this runs on so do the signature verification
before saving.
[ bp: Do signature verification in save_microcode_patch()
and rewrite commit message. ]
Fixes: ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535
Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
This macro is useless, and could cause gcc warning:
arch/x86/kernel/kvmclock.c:47:0: warning: macro "HV_CLOCK_SIZE" is not
used [-Wunused-macros]
Let's remove it.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <1604651963-10067-1-git-send-email-alex.shi@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all in-kernel-tree users are converted to using the sysfs file,
remove the MSR from the "allowlist".
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20201029190259.3476-5-bp@alien8.de
Booting as a guest under KVM results in error messages about
unchecked MSR access:
unchecked MSR access error: RDMSR from 0x17f at rIP: 0xffffffff84483f16 (mce_intel_feature_init+0x156/0x270)
because KVM doesn't provide emulation for random model specific
registers.
Switch to using rdmsrl_safe()/wrmsrl_safe() to avoid the message.
Fixes: 68299a42f842 ("x86/mce: Enable additional error logging on certain Intel CPUs")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201111003954.GA11878@agluck-desk2.amr.corp.intel.com
- Cure the fallout from the MSI irqdomain overhaul which missed that the
Intel IOMMU does not register virtual function devices and therefore
never reaches the point where the MSI interrupt domain is assigned. This
makes the VF devices use the non-remapped MSI domain which is trapped by
the IOMMU/remap unit.
- Remove an extra space in the SGI_UV architecture type procfs output for
UV5.
- Remove a unused function which was missed when removing the UV BAU TLB
shootdown handler.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xJi0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVWxD/9Tq4W6Kniln7mtoEWHRvHRceiiGcS3
MocvqurhoJwirH4F2gkvCegTBy0r3FdUORy3OMmChVs6nb8XpPpso84SANCRePWp
JZezpVwLSNC4O1/ZCg1Kjj4eUpzLB/UjUUQV9RsjL5wyQEhfCZgb1D40yLM/2dj5
SkVm/EAqWuQNtYe/jqAOwTX/7mV+k2QEmKCNOigM13R9EWgu6a4J8ta1gtNSbwvN
jWMW+M1KjZ76pfRK+y4OpbuFixteSzhSWYPITSGwQz4IpQ+Ty2Rv0zzjidmDnAR+
Q73cup0dretdVnVDRpMwDc06dBCmt/rbN50w4yGU0YFRFDgjGc8sIbQzuIP81nEQ
XY4l4rcBgyVufFsLrRpQxu1iYPFrcgU38W1kRkkJ3Kl/rY1a2ZU7sLE4kt4Oh55W
A9KCmsfqP1PCYppjAQ0QT4NOp4YtecPvAU4UcBOb722DDBd8TfhLWWGw2yG57Q/d
Wnu8xCJGy7BaLHLGGGseAft+D4aNnCjKC3jgMyvNtRDXaV2cK2Kdd6ehMlWVUapD
xfLlKXE+igXMyoWJIWjTXQJs4dpKu6QpJCPiorwEZ8rmNaRfxsWEJVbeYwEkmUke
bMoBBSCbZT86WVOYhI8WtrIemraY0mMYrrcE03M96HU3eYB8BV92KrIzZWThupcQ
ZqkZbqCZm3vfHA==
=X/P+
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Cure the fallout from the MSI irqdomain overhaul which missed that
the Intel IOMMU does not register virtual function devices and
therefore never reaches the point where the MSI interrupt domain is
assigned. This made the VF devices use the non-remapped MSI domain
which is trapped by the IOMMU/remap unit
- Remove an extra space in the SGI_UV architecture type procfs output
for UV5
- Remove a unused function which was missed when removing the UV BAU
TLB shootdown handler"
* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iommu/vt-d: Cure VF irqdomain hickup
x86/platform/uv: Fix copied UV5 output archtype
x86/platform/uv: Drop last traces of uv_flush_tlb_others
- A set of commits which reduce the stack usage of various perf event
handling functions which allocated large data structs on stack causing
stack overflows in the worst case.
- Use the proper mechanism for detecting soft interrupts in the recursion
protection.
- Make the resursion protection simpler and more robust.
- Simplify the scheduling of event groups to make the code more robust and
prepare for fixing the issues vs. scheduling of exclusive event groups.
- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take pinned
events, e.g. the PMU watchdog, into account
- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure.
- Fixup a duplicate initialization in an array which was probably cause by
the usual copy & paste - forgot to edit mishap.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xIi0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofixD/4+4gc8DhOmAkMrN0Z9tiW8ebgMKmb9
wZRkMr5Osi0GzLJOPZ6SdY6jd0A3rMN/sW6P1DT6pDtcty4bKFoW5VZBuUDIAhel
BC4C93L3y1En/GEZu1GTy3LvsBwLBQTOoY4goDjbdAbk60S/0RTHOGyQsRsOQFe6
fVs3iXozAFuaR6I6N3dlxuJAE51zvr8MyBWaUoByNDB//1+lLNW+JfClaAOG1oXx
qZIg/niatBVGzSGgKNRUyh3g8G1HJtabsA/NZ4PH8ZHuYABfmj4lmmUPR77ICLfV
wMITEBG7eaktB8EqM9hvaoOZLA5kpXHO2JbCFSs4c4x11mlC8g7QMV3poCw33YoN
a5TmT1A3muri1riy1/Ee9lXACOq7/tf2+Xfn9o6dvDdBwd6s5pzlhLGR8gILp2lF
2bcg3IwYvHT/Kiurb/WGNpbCqQIPJpcUcfs3tNBCCtKegahUQNnGjxN3NVo9RCit
zfL6xIJ8eZiYnsxXx4NKm744AukWiql3aRNgRkOdBP5WC68xt6VLcxG1YZKUoDhy
jRSOCD/DuPSMSvAAgN7S8OWlPsKWBxVxxWYV+K8FpwhgzbQ3WbS3UDiYkhgjeOxu
OlM692oWpllKvQWlvYthr2Be6oPCRRi1vvADNNbTKzgHk5i61bwympsGl1EZx3Pz
2ROp7NJFRESnqw==
=FzCf
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:
- A set of commits which reduce the stack usage of various perf
event handling functions which allocated large data structs on
stack causing stack overflows in the worst case
- Use the proper mechanism for detecting soft interrupts in the
recursion protection
- Make the resursion protection simpler and more robust
- Simplify the scheduling of event groups to make the code more
robust and prepare for fixing the issues vs. scheduling of
exclusive event groups
- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take
pinned events, e.g. the PMU watchdog, into account
- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure
- Fixup a duplicate initialization in an array which was probably
caused by the usual 'copy & paste - forgot to edit' mishap"
* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix Add BW copypasta
perf/x86/intel: Make anythread filter support conditional
perf: Tweak perf_event_attr::exclusive semantics
perf: Fix event multiplexing for exclusive groups
perf: Simplify group_sched_in()
perf: Simplify group_sched_out()
perf/x86: Make dummy_iregs static
perf/arch: Remove perf_sample_data::regs_user_copy
perf: Optimize get_recursion_context()
perf: Fix get_recursion_context()
perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
perf: Reduce stack usage of perf_output_begin()
When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, the ftrace call
will be able to set the ip of the calling function. This will improve the
performance of live kernel patching where it does not need all the regs to
be stored just to change the instruction pointer.
If all archs that support live kernel patching also support
HAVE_DYNAMIC_FTRACE_WITH_ARGS, then the architecture specific function
klp_arch_set_pc() could be made generic.
It is possible that an arch can support HAVE_DYNAMIC_FTRACE_WITH_ARGS but
not HAVE_DYNAMIC_FTRACE_WITH_REGS and then have access to live patching.
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: live-patching@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, the only way to get access to the registers of a function via a
ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
saves all regs as if a breakpoint were to trigger (for use with kprobes), it
is expensive.
The regs are already saved on the stack for the default ftrace callbacks, as
that is required otherwise a function being traced will get the wrong
arguments and possibly crash. And on x86, the arguments are already stored
where they would be on a pt_regs structure to use that code for both the
regs version of a callback, it makes sense to pass that information always
to all functions.
If an architecture does this (as x86_64 now does), it is to set
HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it
could have access to arguments without having to set the flags.
This also includes having the stack pointer being saved, which could be used
for accessing arguments on the stack, as well as having the function graph
tracer not require its own trampoline!
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.
For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.
This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A test shows that the output contains a space:
# cat /proc/sgi_uv/archtype
NSGI4 U/UVX
Remove that embedded space by copying the "trimmed" buffer instead of the
untrimmed input character list. Use sizeof to remove size dependency on
copy out length. Increase output buffer size by one character just in case
BIOS sends an 8 character string for archtype.
Fixes: 1e61f5a95f19 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
PCI's default trigger type is level and ISA's is edge. The recent
refactoring made it the other way round, which went unnoticed as it seems
only to cause havoc on some AMD systems.
Make the comment and code do the right thing again.
Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/87d00lgu13.fsf@nanos.tec.linutronix.de
struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
- Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
integrated assembler upset.
- Correct the mitigation selection logic which prevented the related prctl
to work correctly.
- Make the UV5 hubless system work correctly by fixing up the malformed
table entries and adding the missing ones.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oDNYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaN0EACPWY15k1YuAEIjiQxRBhq22J8Y6wNX
Ui/rF2AZcAnNEJDTIyvjP6COnT9mjX/tuuluMaI6i/XY/9Xp5LpKvivkL2PXNN3X
onW01ouIc1iYxXwQEVZvhYHsOyhkR9Z8yNG/q9I7xYAXNSZcAHwXVar4VlPBT7Ay
iP75i8pGmb/NCc4oHNXuBp/dV/0/dCoLTndb5p5pX8oS60AAt9ZuK3IRc3ucayhI
M4rTTEya1oY+ZNbtP4A4Jp7Qc/NGYDo6q04za+jcxZ5Gqacs+fk/PNuWgL1fZZtW
sn1D+SMWEb55Xcsdy976b29FFU/DcOcf7TRASzyKgyPW5jg1dP6BZ6U0wpVV3KZw
S2h5/pt48JZI7olrDsLQ0tzjALlk2CcFNrnRtOMDduHdw9wyz+Sg58lZYuvH3sXK
5ZblWRJ3JiBNsNO0sA3kd4sp7xWQB3ey6mkYD8Vqb7zRIt8aXT9jqBxhDrP+Vqs/
/UKv+BJfD6WxC0nQ4x6MS3g4sDvI+1SLfHSZ/UjWJ6NfYJW5/w429pFCaF73xCTd
cqxja1dZYixn7ioFZjolMUdvuDiC5B2+5+RzEV87kaDzO9QZQyvsl7G74MSfwx6G
DAydvuyJoxP2qVASobOBcVOzLQO7DsLzFZzJTttZcnkK2iprcz4qrsFLMxF9SxTD
Amb8qck60dLfqA==
=JdPk
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
integrated assembler upset
- Correct the mitigation selection logic which prevented the related
prctl to work correctly
- Make the UV5 hubless system work correctly by fixing up the
malformed table entries and adding the missing ones"
* tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Recognize UV5 hubless system identifier
x86/platform/uv: Remove spaces from OEM IDs
x86/platform/uv: Fix missing OEM_TABLE_ID
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
Testing shows a problem in that UV5 hubless systems were not being
recognized. Add them to the list of OEM IDs checked.
Fixes: 6c7794423a998 ("Add UV5 direct references")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com
Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID. One being that the OEM_ID would not string compare
correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated
in the printout. Remove any trailing spaces.
Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
Testing shows a problem in that the OEM_TABLE_ID was missing for
hubless systems. This is used to determine the APIC type (legacy or
extended). Add the OEM_TABLE_ID to the early hubless processing.
Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to
learn their clock frequency. Which is a bit strange, given that waking
them from idle likely significantly changes their clock frequency.
This commit therefore avoids sending /proc/cpuinfo-induced IPIs to
idle CPUs.
[ paulmck: Also check for idle in arch_freq_prepare_all(). ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
The aperfmperf_snapshot_cpu() function is invoked upon access to
/proc/cpuinfo, and it does do an early exit if the specified CPU has
recently done a snapshot. Unfortunately, the indication that a snapshot
has been completed is set in an IPI handler, and the execution of this
handler can be delayed by any number of unfortunate events. This means
that a system that starts a number of applications, each of which
parses /proc/cpuinfo, can suffer from an smp_call_function_single()
storm, especially given that each access to /proc/cpuinfo invokes
smp_call_function_single() for all CPUs. Please note that this is not
theoretical speculation. Note also that one CPU's pending IPI serves
all requests, so there is no point in ever having more than one IPI
pending to a given CPU.
This commit therefore suppresses duplicate IPIs to a given CPU via a
new ->scfpending field in the aperfmperf_sample structure. This field
is set to the value one if an IPI is pending to the corresponding CPU
and to zero otherwise.
The aperfmperf_snapshot_cpu() function uses atomic_xchg() to set this
field to the value one and sample the old value. If this function's
"wait" parameter is zero, smp_call_function_single() is called only if
the old value of the ->scfpending field was zero. The IPI handler uses
atomic_set_release() to set this new field to zero just before returning,
so that the prior stores into the aperfmperf_sample structure are seen
by future requests that get to the atomic_xchg(). Future requests that
pass the elapsed-time check are ordered by the fact that on x86 loads act
as acquire loads, just as was the case prior to this change. The return
value is based off of the age of the prior snapshot, just as before.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
[ paulmck: Allow /proc/cpuinfo to take advantage of arch_freq_get_on_cpu(). ]
[ paulmck: Add comment on memory barrier. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Commit
c9c6d216ed28 ("x86/mce: Rename "first" function as "early"")
changed the enumeration of MCE notifier priorities. Correct the check
for notifier priorities to cover the new range.
[ bp: Rewrite commit message, remove superfluous brackets in
conditional. ]
Fixes: c9c6d216ed28 ("x86/mce: Rename "first" function as "early"")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201106141216.2062-2-thunder.leizhen@huawei.com
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.
Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a ftrace callback does not supply its own recursion protection and
does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will
make a helper trampoline to do so before calling the callback instead of
just calling the callback directly.
The default for ftrace_ops is going to change. It will expect that handlers
provide their own recursion protection, unless its ftrace_ops states
otherwise.
Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org
Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Move the x86 IMA arch code into security/integrity/ima/ima_efi.c,
so that we will be able to wire it up for arm64 in a future patch.
Co-developed-by: Chester Lin <clin@suse.com>
Signed-off-by: Chester Lin <clin@suse.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
At the same time, IBPB can be set to conditional.
However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.
More generally, the following cases are possible:
1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
X86_FEATURE_AMD_STIBP_ALWAYS_ON
The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.
At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.
[ bp: Massage commit message and comment; space out statements for
better readability. ]
Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid