linux

Johannes Weiner f53af4285d mm: vmscan: fix extreme overreclaim and swap floods

During proactive reclaim, we sometimes observe severe overreclaim, with several thousand times more pages reclaimed than requested.

This trace was obtained from shrink_lruvec() during such an instance:

    prio:0 anon_cost:1141521 file_cost:7767
    nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
    nr=[7161123 345 578 1111]

While the reclaimer requested 4M, vmscan reclaimed close to 16G, most of it by swapping. These requests take over a minute, during which the write() to memory.reclaim is unkillably stuck inside the kernel.

Digging into the source, this is caused by the proportional reclaim bailout logic. This code tries to resolve a fundamental conflict: to reclaim roughly what was requested, while also aging all LRUs fairly and in accordance with their size, swappiness, refault rates etc. The way it attempts fairness is that once the reclaim goal has been reached, it stops scanning the LRUs with the smaller remaining scan targets, and adjusts the remainder of the bigger LRUs according to how much of the smaller LRUs was scanned. It then finishes scanning that remainder regardless of the reclaim goal.

This works fine if priority levels are low and the LRU lists are comparable in size. However, in this instance, the cgroup that is targeted by proactive reclaim has almost no files left (they've already been squeezed out by proactive reclaim earlier), and the remaining anon pages are hot. Anon rotations cause the priority level to drop to 0, which results in reclaim targeting all of anon (a lot) and all of file (almost nothing). By the time reclaim decides to bail, it has scanned most or all of the file target, and therefore must also scan most or all of the enormous anon target. This target is thousands of times larger than the reclaim goal, thus causing the overreclaim.

The bailout code hasn't changed in years; why is this failing now?

The most likely explanations are two other recent changes in anon reclaim:

1. Before the series starting with commit `5df741963d` ("mm: fix LRU balancing effect of new transparent huge pages"), the VM was overall relatively reluctant to swap at all, even if swap was configured. This means the LRU balancing code didn't come into play as often as it does now, and mostly in high pressure situations where pronounced swap activity wouldn't be as surprising.

2. For historic reasons, shrink_lruvec() loops on the scan targets of all LRU lists except the active anon one, meaning it would bail if the only remaining pages to scan were active anon, even if there were a lot of them.

   Before the series starting with commit `ccc5dc6734` ("mm/vmscan: make active/inactive ratio as 1:1 for anon lru"), most anon pages would live on the active LRU; the inactive one would contain only a handful of preselected reclaim candidates. After the series, anon gets aged similarly to file, and the inactive list is the default for new anon pages as well, making it often the much bigger list.

   As a result, the VM is now more likely to actually finish large anon targets than before.

Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the larger LRU lists is made before bailing out on a met reclaim goal.

This fixes the extreme overreclaim problem.

Fairness is more subtle and harder to evaluate. No obvious misbehavior was observed on the test workload, in any case. Conceptually, fairness should primarily be a cumulative effect from regular, lower priority scans. Once the VM is in trouble and needs to escalate scan targets to make forward progress, fairness needs to take a backseat. This is also acknowledged by the myriad exceptions in get_scan_count(). This patch makes fairness decrease gradually, as it keeps fairness work static over increasing priority levels with growing scan targets. This should make more sense, although we may have to revisit the exact values.

Link: https://lkml.kernel.org/r/20220802162811.39216-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22 18:50:41 -08:00
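To make the bailout arithmetic in the commit message above concrete, here is a hypothetical, heavily simplified model in C of the proportional-reclaim loop, comparing "finish the rescaled remainder" (the behaviour described before the fix) with "one SWAP_CLUSTER_MAX-sized nudge, then bail" (the behaviour described after it). It is not the kernel code. `model_shrink()`, the two-list layout, and the assumption that every scanned page is reclaimed are illustrative only. The real shrink_lruvec() works on four evictable LRU lists (the trace's `nr=[...]` is inactive anon, active anon, inactive file, active file), its loop condition skips active anon, and it reclaims only part of what it scans.

```c
#include <stdbool.h>

/* Batch size per list per pass; 32 is the kernel's SWAP_CLUSTER_MAX value. */
#define MODEL_SWAP_CLUSTER_MAX 32UL

/* Simplification (hypothetical): one "anon" and one "file" list, not four. */
enum model_lru { MODEL_ANON, MODEL_FILE, MODEL_NR_LRUS };

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the proportional bailout. Assumes every scanned
 * page is reclaimed, so it shows a worst case for overreclaim.
 *
 * one_nudge == false: after the first adjustment, keep scanning until the
 *                     rescaled remainder of the bigger list is exhausted.
 * one_nudge == true:  after the adjustment, scan one more batch of the
 *                     bigger list, then bail out.
 *
 * Returns the number of pages "reclaimed".
 */
unsigned long model_shrink(const unsigned long targets[MODEL_NR_LRUS],
			   unsigned long nr_to_reclaim, bool one_nudge)
{
	unsigned long nr[MODEL_NR_LRUS] = { targets[MODEL_ANON], targets[MODEL_FILE] };
	unsigned long nr_reclaimed = 0;
	bool adjusted = false;

	while (nr[MODEL_ANON] || nr[MODEL_FILE]) {
		/* Scan up to one batch from every list with a remaining target. */
		for (int lru = 0; lru < MODEL_NR_LRUS; lru++) {
			unsigned long batch = min_ul(nr[lru], MODEL_SWAP_CLUSTER_MAX);

			nr[lru] -= batch;
			nr_reclaimed += batch;
		}

		/* Goal not met yet, or (old behaviour) already adjusted: go on. */
		if (nr_reclaimed < nr_to_reclaim || (adjusted && !one_nudge))
			continue;

		/* One list is exhausted: nothing left to balance, so bail. */
		if (!nr[MODEL_ANON] || !nr[MODEL_FILE])
			break;

		int small = nr[MODEL_FILE] > nr[MODEL_ANON] ? MODEL_ANON : MODEL_FILE;
		int big = small == MODEL_ANON ? MODEL_FILE : MODEL_ANON;
		/* Percentage of the smaller list's target still unscanned. */
		unsigned long percentage = nr[small] * 100 / (targets[small] + 1);
		unsigned long scanned = targets[big] - nr[big];

		/* Stop scanning the smaller list... */
		nr[small] = 0;
		/* ...and scan the same fraction of the bigger list's target. */
		nr[big] = targets[big] * (100 - percentage) / 100;
		nr[big] -= min_ul(nr[big], scanned);
		adjusted = true;
	}
	return nr_reclaimed;
}
```

With illustrative targets of 7000000 anon and 2000 file pages (loosely shaped like the trace, not taken from it) and the trace's nr_to_reclaim of 1047, `model_shrink(targets, 1047, false)` returns 1960544, while `model_shrink(targets, 1047, true)` returns 1120. Once the goal is met, the file list is only about a quarter scanned, so the old path is committed to scanning roughly a quarter of the huge anon target. The one-nudge path stops after a single extra batch.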
damon	mm/damon/dbgfs: check if rm_contexts input is for a real context	2022-11-08 15:57:25 -08:00
kasan	Random number generator fixes for Linux 6.1-rc1.	2022-10-16 15:27:07 -07:00
kfence	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
kmsan	kmsan: core: kmsan_in_runtime() should return true in NMI context	2022-11-08 15:57:24 -08:00
backing-dev.c	mm: backing-dev: Remove the unneeded result variable	2022-09-11 20:26:02 -07:00
balloon_compaction.c	mm: Convert all PageMovable users to movable_operations	2022-08-02 12:34:03 -04:00
bootmem_info.c	bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem	2022-08-28 14:02:45 -07:00
cma_debug.c	mm/cma_debug: show complete cma name in debugfs directories	2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c	Revert "mm/cma.c: remove redundant cma_mutex lock"	2022-05-13 15:11:26 -07:00
cma.h	mm/cma: provide option to opt out from exposing pages on activation failure	2022-03-22 15:57:09 -07:00
compaction.c	- Alistair Popple has a series which addresses a race which causes page	2022-10-14 12:28:43 -07:00
debug_page_ref.c
debug_vm_pgtable.c	docs: rename Documentation/vm to Documentation/mm	2022-06-27 12:52:53 -07:00
debug.c	mm: remove the vma linked list	2022-09-26 19:46:26 -07:00
dmapool.c
early_ioremap.c	mm/early_ioremap: declare early_memremap_pgprot_adjust()	2022-03-22 15:57:11 -07:00
fadvise.c	riscv: compat: syscall: Add compat_sys_call_table implementation	2022-04-26 13:36:25 -07:00
failslab.c	mm: fix missing handler for __GFP_NOWARN	2022-05-19 14:08:55 -07:00
filemap.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
folio-compat.c	mm: remove try_to_free_swap()	2022-10-03 14:02:53 -07:00
frontswap.c	frontswap: don't call ->init if no ops are registered	2022-09-26 12:14:34 -07:00
gup_test.c	mm: rename is_pinnable_page() to is_longterm_pinnable_page()	2022-07-17 17:14:27 -07:00
gup_test.h
gup.c	Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one is	2022-10-12 11:16:58 -07:00
highmem.c	highmem: fix kmap_to_page() for kmap_local_page() addresses	2022-10-12 18:51:51 -07:00
hmm.c	mm/swap: add swp_offset_pfn() to fetch PFN from swap entry	2022-09-26 19:46:05 -07:00
huge_memory.c	Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"	2022-11-08 15:57:23 -08:00
hugetlb_cgroup.c	hugetlb_cgroup: use helper for_each_hstate and hstate_index	2022-09-11 20:25:53 -07:00
hugetlb_vmemmap.c	mm: hugetlb_vmemmap: include missing linux/moduleparam.h	2022-11-08 15:57:23 -08:00
hugetlb_vmemmap.h	mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability	2022-08-08 18:06:43 -07:00
hugetlb.c	hugetlbfs: don't delete error page from pagecache	2022-11-08 15:57:22 -08:00
hwpoison-inject.c	mm/hwpoison: add __init/__exit annotations to module init/exit funcs	2022-10-03 14:03:05 -07:00
init-mm.c	mm: remove rb tree.	2022-09-26 19:46:16 -07:00
internal.h	mm/page_alloc: make boot_nodestats static	2022-10-03 14:03:30 -07:00
interval_tree.c
io-mapping.c
ioremap.c	mm: ioremap: Add ioremap/iounmap_allowed()	2022-06-27 12:22:31 +01:00
Kconfig	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
Kconfig.debug	Two followon fixes for the post-5.19 series "Use pageblock_order for cma	2022-05-27 11:40:49 -07:00
khugepaged.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
kmemleak.c	mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops	2022-10-28 13:37:22 -07:00
ksm.c	ksm: use a folio in replace_page()	2022-10-03 14:02:53 -07:00
list_lru.c	mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe	2022-06-16 19:48:31 -07:00
maccess.c	asm-generic updates for 5.18	2022-03-23 18:03:08 -07:00
madvise.c	mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs	2022-10-28 13:37:22 -07:00
Makefile	mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol	2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c
memblock.c	mm: add pageblock_align() macro	2022-10-03 14:03:04 -07:00
memcontrol.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
memfd.c
memory_hotplug.c	mm: add pageblock_aligned() macro	2022-10-03 14:03:04 -07:00
memory-failure.c	hugetlbfs: don't delete error page from pagecache	2022-11-08 15:57:22 -08:00
memory-tiers.c	memory tier, sysfs: rename attribute "nodes" to "nodelist"	2022-10-28 13:37:22 -07:00
memory.c	mm: use update_mmu_tlb() on the second thread	2022-10-12 18:51:50 -07:00
mempolicy.c	mm/mempolicy: fix mbind_range() arguments to vma_merge()	2022-10-20 21:27:21 -07:00
mempool.c	mm/mempool: use might_alloc()	2022-06-16 19:48:30 -07:00
memremap.c	mm/memremap.c: map FS_DAX device memory as decrypted	2022-11-08 15:57:23 -08:00
memtest.c
migrate_device.c	mm/migrate_device.c: add migrate_device_range()	2022-10-12 18:51:49 -07:00
migrate.c	mm: migrate: fix return value if all subpages of THPs are migrated successfully	2022-10-28 13:37:22 -07:00
mincore.c	mm: teach core mm about pte markers	2022-05-13 07:20:09 -07:00
mlock.c	mm/mlock: drop dead code in count_mm_mlocked_page_nr()	2022-09-26 19:46:27 -07:00
mm_init.c	mm: multi-gen LRU: groundwork	2022-09-26 19:46:09 -07:00
mm_slot.h	mm: introduce common struct mm_slot	2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c	mm/mmap: fix memory leak in mmap_region()	2022-11-08 15:57:23 -08:00
mmu_gather.c	kmsan: unpoison @tlb in arch_tlb_gather_mmu()	2022-10-12 18:51:48 -07:00
mmu_notifier.c	mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()	2022-04-21 20:01:10 -07:00
mmzone.c	mm: multi-gen LRU: groundwork	2022-09-26 19:46:09 -07:00
mprotect.c	mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in	2022-10-12 15:56:46 -07:00
mremap.c	mm: add merging after mremap resize	2022-09-26 19:46:28 -07:00
msync.c	mm/msync: use vma_find() instead of vma linked list	2022-09-26 19:46:25 -07:00
nommu.c	mm: remove the vma linked list	2022-09-26 19:46:26 -07:00
oom_kill.c	mm: reduce noise in show_mem for lowmem allocations	2022-09-26 19:46:29 -07:00
page_alloc.c	mm: prep_compound_tail() clear page->private	2022-10-28 13:37:22 -07:00
page_counter.c	mm: page_counter: remove unneeded atomic ops for low/min	2022-09-11 20:26:01 -07:00
page_ext.c	page_ext: introduce boot parameter 'early_page_ext'	2022-09-11 20:26:02 -07:00
page_idle.c	mm: don't be stuck to rmap lock on reclaim path	2022-05-19 14:08:54 -07:00
page_io.c	swap: convert swap_writepage() to use a folio	2022-10-03 14:02:52 -07:00
page_isolation.c	mm/page_isolation: fix clang deadcode warning	2022-10-28 13:37:22 -07:00
page_owner.c	mm: reuse pageblock_start/end_pfn() macro	2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c	mm/page_table_check: fix typos	2022-10-03 14:03:27 -07:00
page_vma_mapped.c	mm/swap: add swp_offset_pfn() to fetch PFN from swap entry	2022-09-26 19:46:05 -07:00
page-writeback.c	mm: export balance_dirty_pages_ratelimited_flags()	2022-09-26 12:28:07 +02:00
pagewalk.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
percpu-internal.h	percpu: improve percpu_alloc_percpu event trace	2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free()	2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c	mm: avoid unnecessary flush on change_huge_pmd()	2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c	mm: pagewalk: Fix race between unmap and page walker	2022-09-03 10:13:13 -07:00
readahead.c	mm: add PSI accounting around ->read_folio and ->readahead calls	2022-09-20 08:24:38 -06:00
rmap.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
rodata_test.c	mm/rodata_test: use PAGE_ALIGNED() helper	2022-10-03 14:03:05 -07:00
secretmem.c	mm/secretmem: remove reduntant return value	2022-10-03 14:03:36 -07:00
shmem.c	mm/shmem: ensure proper fallback if page faults	2022-10-28 13:37:23 -07:00
shrinker_debug.c	mm: shrinkers: fix double kfree on shrinker name	2022-07-29 18:07:13 -07:00
shuffle.c	mm/shuffle: convert module_param_call to module_param_cb	2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
slab.c	Random number generator fixes for Linux 6.1-rc1.	2022-10-16 15:27:07 -07:00
slab.h	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
slob.c	Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next	2022-09-29 11:30:55 +02:00
slub.c	treewide: use prandom_u32_max() when possible, part 1	2022-10-11 17:42:55 -06:00
sparse-vmemmap.c	mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c	2022-08-08 18:06:42 -07:00
sparse.c	mm: memory_hotplug: enumerate all supported section flags	2022-07-03 18:08:49 -07:00
swap_cgroup.c	mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled	2022-10-03 14:03:36 -07:00
swap_slots.c	mm/swap: convert put_swap_page() to put_swap_folio()	2022-10-03 14:02:46 -07:00
swap_state.c	swap_state: convert free_swap_cache() to use a folio	2022-10-03 14:02:51 -07:00
swap.c	mm: add folio_add_lru_vma()	2022-10-03 14:02:45 -07:00
swap.h	mm: remove lookup_swap_cache()	2022-10-03 14:02:51 -07:00
swapfile.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
truncate.c	mm: add split_folio()	2022-10-03 14:02:45 -07:00
usercopy.c	usercopy: use unsigned long instead of uintptr_t	2022-07-01 17:03:38 -07:00
userfaultfd.c	mm/shmem: use page_mapping() to detect page cache for uffd continue	2022-11-08 15:57:23 -08:00
util.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
vmalloc.c	mm: kmsan: maintain KMSAN metadata for page operations	2022-10-03 14:03:20 -07:00
vmpressure.c
vmscan.c	mm: vmscan: fix extreme overreclaim and swap floods	2022-11-22 18:50:41 -08:00
vmstat.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
workingset.c	mm: multi-gen LRU: minimal implementation	2022-09-26 19:46:09 -07:00
z3fold.c	mm: Convert all PageMovable users to movable_operations	2022-08-02 12:34:03 -04:00
zbud.c
zpool.c
zsmalloc.c	zsmalloc: zs_destroy_pool: add size_class NULL check	2022-10-20 21:27:21 -07:00
zswap.c	mm/swap: remove the end_write_func argument to __swap_writepage	2022-09-11 20:25:50 -07:00