linux/mm
Vlastimil Babka a8161d1ed6 mm, page_alloc: restructure direct compaction handling in slowpath
The retry loop in __alloc_pages_slowpath is supposed to keep trying
reclaim and compaction (and OOM), until either the allocation succeeds,
or returns with failure.  Success here is more probable when reclaim
precedes compaction, as certain watermarks have to be met for compaction
to even try, and more free pages increase the probability of compaction
success.  On the other hand, starting with light async compaction (if
the watermarks allow it), can be more efficient, especially for smaller
orders, if there's enough free memory which is just fragmented.

Thus, the current code starts with compaction before reclaim, and to
make sure that the last reclaim is always followed by a final
compaction, there's another direct compaction call at the end of the
loop.  This makes the code hard to follow and adds some duplicated
handling of migration_mode decisions.  It's also somewhat inefficient
that even if reclaim or compaction decides not to retry, the final
compaction is still attempted.  Some gfp flags combination also shortcut
these retry decisions by "goto noretry;", making it even harder to
follow.

This patch attempts to restructure the code with only minimal functional
changes.  The call to the first compaction and THP-specific checks are
now placed above the retry loop, and the "noretry" direct compaction is
removed.

The initial compaction is additionally restricted only to costly orders,
as we can expect smaller orders to be held back by watermarks, and only
larger orders to suffer primarily from fragmentation.  This better
matches the checks in reclaim's shrink_zones().

There are two other smaller functional changes.  One is that the upgrade
from async migration to light sync migration will always occur after the
initial compaction.  This is how it has been until recent patch "mm,
oom: protect !costly allocations some more", which introduced upgrading
the mode based on COMPACT_COMPLETE result, but kept the final compaction
always upgraded, which made it even more special.  It's better to return
to the simpler handling for now, as migration modes will be further
modified later in the series.

The second change is that once both reclaim and compaction declare it's
not worth to retry the reclaim/compact loop, there is no final
compaction attempt.  As argued above, this is intentional.  If that
final compaction were to succeed, it would be due to a wrong retry
decision, or simply a race with somebody else freeing memory for us.

The main outcome of this patch should be simpler code.  Logically, the
initial compaction without reclaim is the exceptional case to the
reclaim/compaction scheme, but prior to the patch, it was the last loop
iteration that was exceptional.  Now the code matches the logic better.
The change also enable the following patches.

Link: http://lkml.kernel.org/r/20160721073614.24395-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
..
kasan mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB 2016-07-28 16:07:41 -07:00
backing-dev.c mm, vmscan: move LRU lists to node 2016-07-28 16:07:41 -07:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c
cma.c mm/cma: silence warnings due to max() usage 2016-05-27 14:49:37 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode 2016-07-28 16:07:41 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED 2016-06-09 14:23:11 -07:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
frame_vector.c mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm 2016-02-16 10:11:12 +01:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c thp: file pages support for split_huge_page() 2016-07-26 16:19:19 -07:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c mm, THP: clean up return value of madvise_free_huge_pmd 2016-07-28 16:07:41 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm: hwpoison: remove incorrect comments 2016-07-28 16:07:41 -07:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm, page_alloc: remove fair zone allocation policy 2016-07-28 16:07:41 -07:00
interval_tree.c
Kconfig mm: CONFIG_ZONE_DEVICE stop depending on CONFIG_EXPERT 2016-07-28 16:07:41 -07:00
Kconfig.debug mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
khugepaged.c mm: convert zone_reclaim to node_reclaim 2016-07-28 16:07:41 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c kmemleak: don't hang if user disables scanning early 2016-07-28 16:07:41 -07:00
ksm.c mm: do not pass mm_struct into handle_mm_fault 2016-07-26 16:19:19 -07:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
Makefile thp: extract khugepaged from mm/huge_memory.c 2016-07-26 16:19:19 -07:00
memblock.c mm/memblock.c: fix index adjustment error in __next_mem_range_rev() 2016-07-28 16:07:41 -07:00
memcontrol.c mm: fix memcg stack accounting for sub-page stacks 2016-07-28 16:07:41 -07:00
memory_hotplug.c mem-hotplug: alloc new page from a nearest neighbor node when mem-offline 2016-07-28 16:07:41 -07:00
memory-failure.c mm: hwpoison: remove incorrect comments 2016-07-28 16:07:41 -07:00
memory.c thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE 2016-07-26 16:19:19 -07:00
mempolicy.c mm, vmscan: move LRU lists to node 2016-07-28 16:07:41 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: remove reclaim and compaction retry approximations 2016-07-28 16:07:41 -07:00
mincore.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
mlock.c mm, vmscan: move LRU lists to node 2016-07-28 16:07:41 -07:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c mm: fix use-after-free if memory allocation failed in vma_adjust() 2016-07-28 16:07:41 -07:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mremap.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
nommu.c mm: introduce fault_env 2016-07-26 16:19:19 -07:00
oom_kill.c mm, oom: tighten task_will_free_mem() locking 2016-07-28 16:07:41 -07:00
page_alloc.c mm, page_alloc: restructure direct compaction handling in slowpath 2016-07-28 16:07:41 -07:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: use early_pfn_to_nid in page_ext_init 2016-05-27 14:49:37 -07:00
page_idle.c mm, vmscan: move lru_lock to the node 2016-07-28 16:07:41 -07:00
page_io.c mm: add cond_resched() to generic_swapfile_activate() 2016-07-28 16:07:41 -07:00
page_isolation.c mm/page_isolation: clean up confused code 2016-07-26 16:19:19 -07:00
page_owner.c mm/page_owner: use stackdepot to store stacktrace 2016-07-26 16:19:19 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page-writeback.c mm: remove reclaim and compaction retry approximations 2016-07-28 16:07:41 -07:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c percpu: fix synchronization between synchronous map extension and chunk destruction 2016-05-25 11:48:25 -04:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm/gup: Introduce get_user_pages_remote() 2016-02-16 10:04:09 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm, memcg: use consistent gfp flags during readahead 2016-07-26 16:19:19 -07:00
rmap.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
shmem.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
slab_common.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
slab.c mm/slab: use list_move instead of list_del/list_add 2016-07-26 16:19:19 -07:00
slab.h mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB 2016-07-28 16:07:41 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB 2016-07-28 16:07:41 -07:00
sparse-vmemmap.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
sparse.c make __section_nr() more efficient 2016-07-28 16:07:41 -07:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
swap.c mm, pagevec: release/reacquire lru_lock on pgdat change 2016-07-28 16:07:41 -07:00
swapfile.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
truncate.c truncate: handle file thp 2016-07-26 16:19:19 -07:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c mm: move most file-based accounting to the node 2016-07-28 16:07:41 -07:00
vmacache.c mm/vmacache: inline vmacache_valid_mm() 2015-11-05 19:34:48 -08:00
vmalloc.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: bail out in shrink_inactive_list() 2016-07-28 16:07:41 -07:00
vmstat.c mm: remove reclaim and compaction retry approximations 2016-07-28 16:07:41 -07:00
workingset.c mm, workingset: make working set detection node-aware 2016-07-28 16:07:41 -07:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: Delete an unnecessary check before the function call "iput" 2016-07-28 16:07:41 -07:00
zswap.c mm/zswap: use workqueue to destroy pool 2016-05-20 17:58:30 -07:00