81a31a860b
Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory regions into userspace via /dev/mem. At the same time, pages may change their properties (e.g., from anonymous pages to named pages) while they are still mapped in userspace, leading to "corruption" detected by the page table check. To avoid these false positives, this patch makes PAGE_TABLE_CHECK depend on EXCLUSIVE_SYSTEM_RAM.

This dependency is understandable because PAGE_TABLE_CHECK is a hardening technique, while /dev/mem without STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security problem.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via /dev/mem. However, these pages are always considered named pages, so they won't break the logic used in the page table check.

Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-4-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
74 lines
3.3 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by ensuring that some types of memory
corruption are prevented.

Page table check performs extra verification at the time when new pages become
accessible from userspace by getting their page table entries (PTEs, PMDs,
etc.) added into the table.

If corruption is detected, the kernel crashes. There is a small performance and
memory overhead associated with the page table check. Therefore, it is disabled
by default, but can be optionally enabled on systems where the extra hardening
outweighs the performance costs. Also, because page table check is synchronous,
it can help with debugging double map memory corruption issues by crashing the
kernel at the time the wrong mapping occurs instead of later, which is often
the case with memory corruption bugs.

Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New Mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+

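The rules in the table can be modeled with two counters per page. The following
is a minimal userspace sketch, not the kernel's implementation:
``struct page_model`` and ``model_map()`` are hypothetical names, and the
"Permissions" column is assumed to describe the new mapping. Where the sketch
returns false, the real check would crash the kernel.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical per-page bookkeeping; the names are illustrative only. */
    struct page_model {
        int anon_map_count;   /* live userspace mappings as anonymous */
        int named_map_count;  /* live userspace mappings as named */
    };

    /*
     * Try to add one userspace mapping of the page.
     * Returns true if the table allows it, false if it is prohibited.
     */
    bool model_map(struct page_model *p, bool anon, bool writable)
    {
        assert(p != NULL);

        if (anon) {
            /* Named -> Anonymous: prohibited for any permission. */
            if (p->named_map_count > 0)
                return false;
            /* Anonymous -> Anonymous: allowed only when read-only. */
            if (p->anon_map_count > 0 && writable)
                return false;
            p->anon_map_count++;
        } else {
            /* Anonymous -> Named: prohibited for any permission. */
            if (p->anon_map_count > 0)
                return false;
            /* Named -> Named: always allowed. */
            p->named_map_count++;
        }
        return true;
    }

A page that is not mapped yet accepts either kind of mapping. After that, only
the combinations marked "Allow" in the table succeed.
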
Enabling Page Table Check
=========================

To enable page table check:

- Build the kernel with PAGE_TABLE_CHECK=y. Note that it can only be enabled on
  platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.

- Boot with the 'page_table_check=on' kernel parameter.

Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have
page table check enabled without the extra kernel parameter.

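For illustration, the options above appear in a kernel ``.config`` file with
the usual ``CONFIG_`` prefix. This fragment is a sketch and assumes an
architecture where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available::

    CONFIG_PAGE_TABLE_CHECK=y
    # Optional: the check is active without the boot parameter
    CONFIG_PAGE_TABLE_CHECK_ENFORCED=y

Without ``CONFIG_PAGE_TABLE_CHECK_ENFORCED``, the boot parameter is added to
the kernel command line::

    page_table_check=on
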
Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from Linux-MM, that verifies that user-accessible pages
are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change their
properties (e.g., from anonymous pages to named pages) while they are still
mapped in userspace, leading to "corruption" detected by the page table check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered named pages, so they won't
break the logic used in the page table check.