linux/include
Ryan Roberts 19eaf44954 mm: thp: support allocation of anonymous multi-size THP
Introduce the logic to allow THP to be configured (through the new sysfs
interface we just added) to allocate large folios to back anonymous
memory, which are larger than the base page size but smaller than
PMD-size.  We call this new THP extension "multi-size THP" (mTHP).

mTHP continues to be PTE-mapped, but in many cases can still provide
similar benefits to traditional PMD-sized THP: Page faults are
significantly reduced (by a factor of e.g.  4, 8, 16, etc.  depending on
the configured order), but latency spikes are much less prominent because
the size of each page isn't as huge as the PMD-sized variant and there is
less memory to clear in each page fault.  The number of per-page
operations (e.g.  ref counting, rmap management, lru list management) are
also significantly reduced since those ops now become per-folio.

Some architectures also employ TLB compression mechanisms to squeeze more
entries in when a set of PTEs are virtually and physically contiguous and
approporiately aligned.  In this case, TLB misses will occur less often.

The new behaviour is disabled by default, but can be enabled at runtime by
writing to /sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled (see
documentation in previous commit).  The long term aim is to change the
default to include suitable lower orders, but there are some risks around
internal fragmentation that need to be better understood first.

[ryan.roberts@arm.com: resolve some multi-size THP review nits]
  Link: https://lkml.kernel.org/r/20231214160251.3574571-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 14:48:12 -08:00
..
acpi ACPI: PM: Add acpi_device_fix_up_power_children() function 2023-11-20 17:31:49 +01:00
asm-generic asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation 2023-11-22 09:32:49 -08:00
clocksource
crypto
drm amd-drm-fixes-6.7-2023-11-30: 2023-12-01 13:57:11 +10:00
dt-bindings linux-watchdog 6.7-rc1 tag 2023-11-09 13:54:25 -08:00
keys
kunit
kvm
linux mm: thp: support allocation of anonymous multi-size THP 2023-12-20 14:48:12 -08:00
math-emu
media
memory
misc
net wireless fixes: 2023-11-29 19:43:34 -08:00
pcmcia
ras
rdma
rv
scsi scsi: sd: Fix system start for ATA devices 2023-11-24 20:44:21 -05:00
soc IOMMU Updates for Linux v6.7 2023-11-09 13:37:28 -08:00
sound ALSA: cs35l41: Fix for old systems which do not support command 2023-11-20 12:37:01 +01:00
target
trace rxrpc: Fix RTT determination to use any ACK as a source 2023-11-17 02:50:33 +00:00
uapi fs/proc/task_mmu: report SOFT_DIRTY bits through the PAGEMAP_SCAN ioctl 2023-12-10 16:51:35 -08:00
ufs
vdso
video
xen xen/events: reduce externally visible helper functions 2023-11-14 09:29:28 +01:00