c40a56a781
The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):

	1. The kernel direct map (0xffff880000000000)
	2. The "high kernel map" (0xffffffff81000000)

We actually execute out of #2.  If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to #1.

Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons.  The parts that we map
to userspace do not (er, should not) have secrets.  When PTI is enabled,
the Global bit is usually not set in the high mapping and is just used to
compensate for poor performance on systems which lack PCID.

This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes.  We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.

This otherwise unused alias mapping of the holes will, by default, keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.

Remove the alias mapping of these pages entirely.  This is likely to
fracture the 2M page mapping of the kernel image near these areas, but this
should affect only a minority of the area.

The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default).  We only want to modify a single alias, so we
need to tweak its behavior.

This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations.  Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption, and it undercuts
features like DEBUG_PAGEALLOC or future work like eXclusive Page Frame
Ownership (XPFO).

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000    16M               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000    14M ro PSE GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000    68K ro     GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000  1980K RW         NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000     6M ro PSE GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000     6M RW PSE     NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000     2M RW         NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000     4M RW PSE     NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000   462M               pmd

current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000    16M               pmd
current_user-0xffffffff81000000-0xffffffff81e00000    14M ro PSE GLB x  pmd
current_user-0xffffffff81e00000-0xffffffff81e11000    68K ro     GLB x  pte
current_user-0xffffffff81e11000-0xffffffff82000000  1980K RW         NX pte
current_user-0xffffffff82000000-0xffffffff82600000     6M ro PSE GLB NX pmd
current_user-0xffffffff82600000-0xffffffffa0000000   474M               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000    16M               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000    14M ro PSE GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000    68K ro     GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000  1980K               pte
current_kernel-0xffffffff82000000-0xffffffff82400000     4M ro PSE GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000   544K ro         NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000  1504K               pte
current_kernel-0xffffffff82600000-0xffffffff82c00000     6M RW PSE     NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000    52K RW         NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000  1740K               pte

current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000    16M               pmd
current_user-0xffffffff81000000-0xffffffff81e00000    14M ro PSE GLB x  pmd
current_user-0xffffffff81e00000-0xffffffff81e11000    68K ro     GLB x  pte
current_user-0xffffffff81e11000-0xffffffff82000000  1980K               pte
current_user-0xffffffff82000000-0xffffffff82400000     4M ro PSE GLB NX pmd
current_user-0xffffffff82400000-0xffffffff82488000   544K ro         NX pte
current_user-0xffffffff82488000-0xffffffff82600000  1504K               pte
current_user-0xffffffff82600000-0xffffffffa0000000   474M               pmd

[ tglx: Do not unmap on 32bit as there is only one mapping ]

Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
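
The commit message describes two virtual aliases for each physical page of the kernel image. The following is a minimal userspace sketch of that address arithmetic, not kernel code. It assumes x86-64 with 4-level paging, no KASLR, and a kernel that has not been relocated (so the high-map offset is a plain constant). The direct-map base is the one quoted in the message; the high-map base and 512M size are read off the "High Kernel Mapping" region in the dumps. The macro and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (no KASLR, 4-level paging, unrelocated kernel). */
#define DIRECT_MAP_BASE 0xffff880000000000ULL /* alias #1, from the message */
#define HIGH_MAP_BASE   0xffffffff80000000ULL /* start of the dumped region */
#define HIGH_MAP_SIZE   0x20000000ULL         /* 512M: ...80000000-...a0000000 */

/* Alias #1: where physical-to-virtual translations normally land. */
static uint64_t direct_map_va(uint64_t phys)
{
	return DIRECT_MAP_BASE + phys;
}

/* Alias #2: where kernel symbols live and where the kernel executes. */
static uint64_t high_map_va(uint64_t phys)
{
	assert(phys < HIGH_MAP_SIZE); /* only the image region has this alias */
	return HIGH_MAP_BASE + phys;
}

/* Inverse of high_map_va(): recover the physical address behind alias #2. */
static uint64_t high_map_to_phys(uint64_t va)
{
	assert(va >= HIGH_MAP_BASE && va - HIGH_MAP_BASE < HIGH_MAP_SIZE);
	return va - HIGH_MAP_BASE;
}
```

Under these assumptions, a page in a freed hole such as 0xffffffff81e11000 (alias #2) is also reachable at direct_map_va(high_map_to_phys(0xffffffff81e11000)), which is the alias the page allocator hands out. The patch removes only the first of the two.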
93 lines
3.9 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_SET_MEMORY_H
#define _ASM_X86_SET_MEMORY_H

#include <asm/page.h>
#include <asm-generic/set_memory.h>

/*
 * The set_memory_* API can be used to change various attributes of a virtual
 * address range. The attributes include:
 * Cacheability  : UnCached, WriteCombining, WriteThrough, WriteBack
 * Executability : eXecutable, NoeXecutable
 * Read/Write    : ReadOnly, ReadWrite
 * Presence      : NotPresent
 * Encryption    : Encrypted, Decrypted
 *
 * Within a category, the attributes are mutually exclusive.
 *
 * The implementation of this API will take care of various aspects that
 * are associated with changing such attributes, such as:
 * - Flushing TLBs
 * - Flushing CPU caches
 * - Making sure aliases of the memory behind the mapping don't violate
 *   coherency rules as defined by the CPU in the system.
 *
 * What this API does not do:
 * - Provide exclusion between various callers - including callers that
 *   operate on other mappings of the same physical page
 * - Restore default attributes when a page is freed
 * - Guarantee that mappings other than the requested one are
 *   in any state, other than that these do not violate rules for
 *   the CPU you have. Do not depend on any effects on other mappings;
 *   CPUs other than the one you have may have more relaxed rules.
 * The caller is required to take care of these.
 */

int _set_memory_uc(unsigned long addr, int numpages);
int _set_memory_wc(unsigned long addr, int numpages);
int _set_memory_wt(unsigned long addr, int numpages);
int _set_memory_wb(unsigned long addr, int numpages);
int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wc(unsigned long addr, int numpages);
int set_memory_wt(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
int set_memory_np_noalias(unsigned long addr, int numpages);

int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
int set_memory_array_wt(unsigned long *addr, int addrinarray);
int set_memory_array_wb(unsigned long *addr, int addrinarray);

int set_pages_array_uc(struct page **pages, int addrinarray);
int set_pages_array_wc(struct page **pages, int addrinarray);
int set_pages_array_wt(struct page **pages, int addrinarray);
int set_pages_array_wb(struct page **pages, int addrinarray);

/*
 * For legacy compatibility with the old APIs, a few functions
 * are provided that work on a "struct page".
 * These functions operate ONLY on the 1:1 kernel mapping of the
 * memory that the struct page represents, and internally just
 * call the set_memory_* function. See the description of the
 * set_memory_* function for more details on conventions.
 *
 * These APIs should be considered *deprecated* and are likely going to
 * be removed in the future.
 * The reason for this is the implicit operation on the 1:1 mapping only,
 * making this not a generally useful API.
 *
 * Specifically, many users of the old APIs had a virtual address,
 * called virt_to_page() or vmalloc_to_page() on that address to
 * get a struct page* that the old API required.
 * To convert these cases, use set_memory_*() on the original
 * virtual address; do not use these functions.
 */

int set_pages_uc(struct page *page, int numpages);
int set_pages_wb(struct page *page, int numpages);
int set_pages_x(struct page *page, int numpages);
int set_pages_nx(struct page *page, int numpages);
int set_pages_ro(struct page *page, int numpages);
int set_pages_rw(struct page *page, int numpages);

extern int kernel_set_to_readonly;
void set_kernel_text_rw(void);
void set_kernel_text_ro(void);

#endif /* _ASM_X86_SET_MEMORY_H */
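
The header declares both set_memory_np() and set_memory_np_noalias(), and the commit message explains that the pageattr code changes all aliases of a physical page by default while the patch needs to touch only one. The difference can be sketched with a toy userspace model. This is not the kernel implementation: struct toy_page, the alias indices, and both toy_* functions are hypothetical names invented for illustration, and real page tables, TLB flushing, and large-page splitting are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two aliases of a kernel-image page described in the commit message. */
enum { ALIAS_DIRECT = 0, ALIAS_HIGH = 1, NR_ALIASES = 2 };

/* Toy model: one physical page and whether each alias is still mapped. */
struct toy_page {
	bool present[NR_ALIASES];
};

/* Models the default behavior: every alias of each page is made not-present. */
static void toy_set_np(struct toy_page *pages, size_t numpages)
{
	for (size_t i = 0; i < numpages; i++)
		for (int a = 0; a < NR_ALIASES; a++)
			pages[i].present[a] = false;
}

/* Models the single-alias variant: only the named alias is unmapped. */
static void toy_set_np_noalias(struct toy_page *pages, size_t numpages,
			       int alias)
{
	assert(alias >= 0 && alias < NR_ALIASES);
	for (size_t i = 0; i < numpages; i++)
		pages[i].present[alias] = false;
}
```

In this model, calling toy_set_np_noalias(pages, n, ALIAS_HIGH) on the freed holes leaves the direct-map alias present, so the pages remain usable as normal kernel memory, while toy_set_np() would have unmapped them everywhere.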