linux/mm
Mel Gorman 4d73ba5fa7 mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is
taking an excessive amount of time for large amounts of memory.  Further
testing allocating huge pages that the cost is linear i.e.  if allocating
1G pages in batches of 10 then the time to allocate nr_hugepages from
10->20->30->etc increases linearly even though 10 pages are allocated at
each step.  Profiles indicated that much of the time is spent checking the
validity within already existing huge pages and then attempting a
migration that fails after isolating the range, draining pages and a whole
lot of other useless work.

Commit eb14d4eefd ("mm,page_alloc: drop unnecessary checks from
pfn_range_valid_contig") removed two checks, one which ignored huge pages
for contiguous allocations as huge pages can sometimes migrate.  While
there may be value on migrating a 2M page to satisfy a 1G allocation, it's
potentially expensive if the 1G allocation fails and it's pointless to try
moving a 1G page for a new 1G allocation or scan the tail pages for valid
PFNs.

Reintroduce the PageHuge check and assume any contiguous region with
hugetlbfs pages is unsuitable for a new 1G allocation.

The hpagealloc test allocates huge pages in batches and reports the
average latency per page over time.  This test happens just after boot
when fragmentation is not an issue.  Units are in milliseconds.

hpagealloc
                               6.3.0-rc6              6.3.0-rc6              6.3.0-rc6
                                 vanilla   hugeallocrevert-v1r1   hugeallocsimple-v1r2
Min       Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
1st-qrtle Latency      356.61 (   0.00%)        5.34 (  98.50%)       19.85 (  94.43%)
2nd-qrtle Latency      697.26 (   0.00%)        5.47 (  99.22%)       20.44 (  97.07%)
3rd-qrtle Latency      972.94 (   0.00%)        5.50 (  99.43%)       20.81 (  97.86%)
Max-1     Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
Max-5     Latency       82.14 (   0.00%)        5.11 (  93.78%)       19.31 (  76.49%)
Max-10    Latency      150.54 (   0.00%)        5.20 (  96.55%)       19.43 (  87.09%)
Max-90    Latency     1164.45 (   0.00%)        5.53 (  99.52%)       20.97 (  98.20%)
Max-95    Latency     1223.06 (   0.00%)        5.55 (  99.55%)       21.06 (  98.28%)
Max-99    Latency     1278.67 (   0.00%)        5.57 (  99.56%)       22.56 (  98.24%)
Max       Latency     1310.90 (   0.00%)        8.06 (  99.39%)       26.62 (  97.97%)
Amean     Latency      678.36 (   0.00%)        5.44 *  99.20%*       20.44 *  96.99%*

                   6.3.0-rc6   6.3.0-rc6   6.3.0-rc6
                     vanilla   revert-v1   hugeallocfix-v2
Duration User           0.28        0.27        0.30
Duration System       808.66       17.77       35.99
Duration Elapsed      830.87       18.08       36.33

The vanilla kernel is poor, taking up to 1.3 second to allocate a huge
page and almost 10 minutes in total to run the test.  Reverting the
problematic commit reduces it to 8ms at worst and the patch takes 26ms. 
This patch fixes the main issue with skipping huge pages but leaves the
page_count() out because a page with an elevated count potentially can
migrate.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net
Fixes: eb14d4eefd ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Yuanxi Liu <y.liu@naruida.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 14:22:14 -07:00
..
damon mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate() 2023-03-07 17:04:55 -08:00
kasan kasan: test: fix test for new meminstrinsic instrumentation 2023-03-02 21:54:22 -08:00
kfence mm: kfence: fix handling discontiguous page 2023-03-28 15:24:32 -07:00
kmsan mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() 2023-04-18 14:22:13 -07:00
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-04-16 10:41:26 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c mm/cma: fix potential memory loss on cma_declare_contiguous_nid 2023-02-02 22:33:24 -08:00
cma.h
compaction.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE 2023-02-02 22:33:11 -08:00
debug.c mm: export dump_mm() 2023-02-09 16:51:40 -08:00
dmapool.c
early_ioremap.c
fadvise.c mm: support POSIX_FADV_NOREUSE 2023-01-18 17:12:57 -08:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
folio-compat.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
frontswap.c
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h
gup.c mm: change to return bool for folio_isolate_lru() 2023-02-20 12:46:17 -08:00
highmem.c
hmm.c mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare 2023-01-18 17:12:39 -08:00
huge_memory.c mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO 2023-04-16 10:41:25 -07:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c sysctl: fix proc_dobool() usability 2023-02-21 13:34:07 -08:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: fix uffd wr-protection for CoW optimization path 2023-04-05 18:06:22 -07:00
hwpoison-inject.c
init-mm.c
internal.h - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig zsmalloc: set default zspage chain size to 8 2023-02-02 22:33:23 -08:00
Kconfig.debug mm: move KMEMLEAK's Kconfig items from lib to mm 2023-02-02 22:33:26 -08:00
khugepaged.c mm/khugepaged: check again on anon uffd-wp during isolation 2023-04-16 10:41:24 -07:00
kmemleak.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-23 17:18:33 -07:00
list_lru.c
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-11 11:44:46 -08:00
madvise.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
Makefile
mapping_dirty_helpers.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
memblock.c memblock: small optimizations 2023-02-27 09:34:53 -08:00
memcontrol.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
memfd.c mm/memfd: add write seals when apply SEAL_EXEC to executable memfd 2023-01-18 17:12:37 -08:00
memory_hotplug.c mm/memory_hotplug: cleanup return value handing in do_migrate_range() 2023-02-20 12:46:18 -08:00
memory-failure.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-02-27 17:00:14 -08:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-02-09 16:51:40 -08:00
memory.c mm: take a page reference when removing device exclusive entries 2023-04-05 18:06:24 -07:00
mempolicy.c mm/mempolicy: fix use-after-free of VMA iterator 2023-04-16 10:41:25 -07:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c
migrate_device.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
migrate.c migrate_pages: try migrate in batch asynchronously firstly 2023-03-07 17:04:54 -08:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-07 17:04:53 -08:00
mlock.c mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates 2023-02-09 16:51:41 -08:00
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h
mmap_lock.c
mmap.c mm/mmap: regression fix for unmapped_area{_topdown} 2023-04-18 14:22:14 -07:00
mmu_gather.c mm: mmu_gather: allow more than one batch of delayed rmaps 2022-12-11 18:12:21 -08:00
mmu_notifier.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
mmzone.c
mprotect.c mm/mprotect: fix do_mprotect_pkey() return on error 2023-04-16 10:41:24 -07:00
mremap.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
msync.c
nommu.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
oom_kill.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
page_alloc.c mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages 2023-04-18 14:22:14 -07:00
page_counter.c
page_ext.c mm/page_ext: init page_ext early if there are no deferred struct pages 2023-02-02 22:33:22 -08:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
page_isolation.c
page_owner.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
page_poison.c
page_reporting.c mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected() 2023-01-18 17:12:50 -08:00
page_reporting.h
page_table_check.c mm/page_ext: do not allocate space for page_ext->flags if not needed 2023-02-02 22:33:11 -08:00
page_vma_mapped.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
page-writeback.c fs: convert writepage_t callback to pass a folio 2023-02-02 22:33:34 -08:00
pagewalk.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
percpu-internal.h mm: percpu: fix incorrect size in pcpu_obj_full_size() 2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
ptdump.c
readahead.c readahead: convert readahead_expand() to use a folio 2023-02-02 22:33:21 -08:00
rmap.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-02-27 17:00:14 -08:00
rodata_test.c
secretmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-09 15:56:51 -08:00
shuffle.c
shuffle.h
slab_common.c mm/kasan: simplify and refine kasan_cache code 2023-01-18 17:12:55 -08:00
slab.c slab fix for 6.3-rc4 2023-03-24 10:12:14 -07:00
slab.h mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
slob.c
slub.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
sparse-vmemmap.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
sparse.c mm/sparse: fix "unused function 'pgdat_to_phys'" warning 2023-02-02 22:33:29 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
swap.c mm: swap: fix performance regression on sparsetruncate-tiny 2023-04-16 10:41:24 -07:00
swap.h mm: remove the __swap_writepage return value 2023-02-02 22:33:33 -08:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-05 18:06:24 -07:00
truncate.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c mm/uffd: detect pgtable allocation failures 2023-01-18 17:12:53 -08:00
util.c mm: fix typo in __vm_enough_memory warning 2023-02-13 15:54:33 -08:00
vmalloc.c mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() 2023-04-18 14:22:13 -07:00
vmpressure.c
vmscan.c mm: change to return bool for folio_isolate_lru() 2023-02-20 12:46:17 -08:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
z3fold.c mm: remove PageMovable export 2023-01-18 17:12:57 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: make zspage chain size configurable 2023-02-02 22:33:23 -08:00
zswap.c zswap: fix writeback lock ordering for zsmalloc 2022-12-11 18:12:09 -08:00