linux/mm
David Hildenbrand 05c5323b2a mm: track mapcount of large folios in single value
Let's track the mapcount of large folios in a single value.  The mapcount
of a large folio currently corresponds to the sum of the entire mapcount
and all page mapcounts.

This sum is what we actually want to know in folio_mapcount() and it is
also sufficient for implementing folio_mapped().

With PTE-mapped THP becoming more important and more widely used, we want
to avoid looping over all pages of a folio just to obtain the mapcount of
large folios.  The comment "In the common case, avoid the loop when no
pages mapped by PTE" in folio_total_mapcount() does no longer hold for
mTHP that are always mapped by PTE.

Further, we are planning on using folio_mapcount() more frequently, and
might even want to remove page mapcounts for large folios in some kernel
configs.  Therefore, allow for reading the mapcount of large folios
efficiently and atomically without looping over any pages.

Maintain the mapcount also for hugetlb pages for simplicity.  Use the new
mapcount to implement folio_mapcount() and folio_mapped().  Make
page_mapped() simply call folio_mapped().  We can now get rid of
folio_large_is_mapped().

_nr_pages_mapped is now only used in rmap code and for debugging purposes.
Keep folio_nr_pages_mapped() around, but document that its use should be
limited to rmap internals and debugging purposes.

This change implies one additional atomic add/sub whenever
mapping/unmapping (parts of) a large folio.

As we now batch RMAP operations for PTE-mapped THP during fork(), during
unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
large mapcount for a PTE batch only once, the added overhead in the common
case is small.  Only when unmapping individual pages of a large folio
(e.g., during COW), the overhead might be bigger in comparison, but it's
essentially one additional atomic operation.

Note that before the new mapcount would overflow, already our refcount
would overflow: each mapping requires a folio reference.  Extend the
focumentation of folio_mapcount().

Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:28 -07:00
..
damon mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
kasan fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
kfence mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
kmsan mm: kmsan: remove runtime checks from kmsan_unpoison_memory() 2024-02-22 10:24:41 -08:00
backing-dev.c mm: backing-dev: use group allocation/free of per-cpu counters API 2024-04-25 20:56:12 -07:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-04-25 20:56:42 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
debug.c mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
dmapool_test.c
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c mm: use "GUP-fast" instead "fast GUP" in remaining comments 2024-04-25 20:56:41 -07:00
folio-compat.c mm: remove __set_page_dirty_nobuffers() 2024-04-25 20:56:25 -07:00
gup_test.c
gup_test.h
gup.c mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST 2024-04-25 20:56:41 -07:00
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c mm/treewide: replace pXd_huge() with pXd_leaf() 2024-04-25 20:55:46 -07:00
huge_memory.c mm: swap: remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags 2024-04-25 20:56:36 -07:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
hwpoison-inject.c
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST 2024-04-25 20:56:41 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
kmemleak.c mm/kmemleak: compact kmemleak_object further 2024-04-25 20:56:05 -07:00
ksm.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
list_lru.c mm/zswap: stop lru list shrinking when encounter warm region 2024-02-22 10:24:54 -08:00
maccess.c
madvise.c mm: madvise: avoid split during MADV_PAGEOUT and MADV_COLD 2024-04-25 20:56:38 -07:00
Makefile mm/kmemleak: disable KASAN instrumentation in kmemleak 2024-04-25 20:56:05 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c cxl fixes for 6.8-rc6 2024-02-24 15:53:40 -08:00
memcontrol.c mm, slab: move slab_memcg hooks to mm/memcontrol.c 2024-04-25 20:56:16 -07:00
memfd.c mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-03-04 17:01:21 -08:00
memory_hotplug.c mm: record the migration reason for struct migration_target_control 2024-04-25 20:56:06 -07:00
memory-failure.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
memory-tiers.c memory tier: create CPUless memory tiers after obtaining HMAT info 2024-05-05 17:53:26 -07:00
memory.c mm: follow_pte() improvements 2024-05-05 17:53:27 -07:00
mempolicy.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mempool.c mempool: hook up to memory allocation profiling 2024-04-25 20:55:56 -07:00
memremap.c mm: convert free_zone_device_page to free_zone_device_folio 2024-04-25 20:56:44 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate_device.c mm: convert migrate_vma_collect_pmd to use a folio 2024-04-25 20:56:19 -07:00
migrate.c remove references to page->flags in documentation 2024-04-25 20:56:15 -07:00
mincore.c
mlock.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mm_init.c mm: init_mlocked_on_free_v3 2024-04-25 20:56:29 -07:00
mm_slot.h
mmap_lock.c
mmap.c mm/mmap: make accountable_mapping return bool 2024-05-05 17:53:26 -07:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mm: support multi-size THP numa balancing 2024-04-25 20:56:30 -07:00
mremap.c mm: remove "prot" parameter from move_pte() 2024-04-25 20:56:24 -07:00
msync.c
nommu.c mm: remove follow_pfn 2024-04-25 20:56:12 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page_alloc.c mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
page_counter.c
page_ext.c mm: make page_ext_get() take a const argument 2024-04-25 20:56:14 -07:00
page_idle.c
page_io.c arm64: mm: swap: support THP_SWAP on hardware with MTE 2024-04-25 20:56:07 -07:00
page_isolation.c mm: page_isolation: prepare for hygienic freelists 2024-04-25 20:56:04 -07:00
page_owner.c mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: rename vma_pgoff_address back to vma_address 2024-04-25 20:56:31 -07:00
page-writeback.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h mm: percpu: add codetag reference into pcpuobj_ext 2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: percpu: enable per-cpu allocation tagging 2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM 2024-04-25 20:56:07 -07:00
rmap.c mm: track mapcount of large folios in single value 2024-05-05 17:53:28 -07:00
rodata_test.c
secretmem.c
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-03-26 11:07:23 -07:00
shmem.c mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c mm/slab: enable slab allocation tagging for kmalloc and friends 2024-04-25 20:55:55 -07:00
slab.h mm, slab: move slab_memcg hooks to mm/memcontrol.c 2024-04-25 20:56:16 -07:00
slub.c mm, slab: move slab_memcg hooks to mm/memcontrol.c 2024-04-25 20:56:16 -07:00
sparse-vmemmap.c
sparse.c mm: move array mem_section init code out of memory_present() 2024-04-25 20:56:16 -07:00
swap_cgroup.c
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: remove struct page from get_shadow_from_swap_cache 2024-04-25 20:56:40 -07:00
swap.c mm: convert free_zone_device_page to free_zone_device_folio 2024-04-25 20:56:44 -07:00
swap.h mm/swap: fix race when skipping swapcache 2024-02-20 14:20:48 -08:00
swapfile.c mm,swap: add document about RCU read lock and swapoff interaction 2024-05-05 17:53:26 -07:00
truncate.c mm: convert pagecache_isize_extended to use a folio 2024-04-25 20:56:43 -07:00
usercopy.c
userfaultfd.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
util.c mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
vmalloc.c mm/vmalloc.c: optimize to reduce arguments of alloc_vmap_area() 2024-04-25 20:56:08 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm: vmscan: avoid split during shrink_folio_list() 2024-04-25 20:56:38 -07:00
vmstat.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
workingset.c mm: move mapping_set_update out of <linux/swap.h> 2024-02-21 11:36:50 +05:30
z3fold.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zswap.c zswap: replace RB tree with xarray 2024-04-25 20:56:18 -07:00