linux

iv/linux

Vlastimil Babka a8161d1ed6 mm, page_alloc: restructure direct compaction handling in slowpath

The retry loop in __alloc_pages_slowpath is supposed to keep trying
reclaim and compaction (and OOM), until either the allocation succeeds,
or returns with failure.  Success here is more probable when reclaim
precedes compaction, as certain watermarks have to be met for compaction
to even try, and more free pages increase the probability of compaction
success.  On the other hand, starting with light async compaction (if
the watermarks allow it), can be more efficient, especially for smaller
orders, if there's enough free memory which is just fragmented.

Thus, the current code starts with compaction before reclaim, and to
make sure that the last reclaim is always followed by a final
compaction, there's another direct compaction call at the end of the
loop.  This makes the code hard to follow and adds some duplicated
handling of migration_mode decisions.  It's also somewhat inefficient
that even if reclaim or compaction decides not to retry, the final
compaction is still attempted.  Some gfp flags combination also shortcut
these retry decisions by "goto noretry;", making it even harder to
follow.

This patch attempts to restructure the code with only minimal functional
changes.  The call to the first compaction and THP-specific checks are
now placed above the retry loop, and the "noretry" direct compaction is
removed.

The initial compaction is additionally restricted only to costly orders,
as we can expect smaller orders to be held back by watermarks, and only
larger orders to suffer primarily from fragmentation.  This better
matches the checks in reclaim's shrink_zones().

There are two other smaller functional changes.  One is that the upgrade
from async migration to light sync migration will always occur after the
initial compaction.  This is how it has been until the recent patch "mm,
oom: protect !costly allocations some more", which introduced upgrading
the mode based on COMPACT_COMPLETE result, but kept the final compaction
always upgraded, which made it even more special.  It's better to return
to the simpler handling for now, as migration modes will be further
modified later in the series.

The second change is that once both reclaim and compaction declare it's
not worth retrying the reclaim/compact loop, there is no final
compaction attempt.  As argued above, this is intentional.  If that
final compaction were to succeed, it would be due to a wrong retry
decision, or simply a race with somebody else freeing memory for us.

The main outcome of this patch should be simpler code.  Logically, the
initial compaction without reclaim is the exceptional case to the
reclaim/compaction scheme, but prior to the patch, it was the last loop
iteration that was exceptional.  Now the code matches the logic better.
The change also enables the following patches.
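
The control flow described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code in mm/page_alloc.c: the struct, the function slowpath_sketch(), the SKETCH_* names and the "reclaims_needed" success rule are all hypothetical stand-ins, and the THP checks, __GFP_NORETRY handling and OOM path are left out. Only the costly-order threshold of 3 mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define SKETCH_COSTLY_ORDER 3

/* Hypothetical stand-ins for MIGRATE_ASYNC and MIGRATE_SYNC_LIGHT. */
enum sketch_mode { SKETCH_ASYNC, SKETCH_SYNC_LIGHT };

/* Hypothetical model of one slowpath invocation; not a kernel structure. */
struct sketch_state {
	int order;                  /* allocation order being requested */
	int reclaims_needed;        /* reclaim rounds after which compaction succeeds */
	int max_rounds;             /* rounds before the retry checks give up (>= 1) */
	int initial_compactions;    /* out: compactions done above the loop */
	int reclaims;               /* out: direct reclaim attempts */
	int loop_compactions;       /* out: compactions done inside the loop */
	enum sketch_mode last_mode; /* out: mode of the most recent compaction */
};

/* Returns true if the modeled allocation succeeds. */
static bool slowpath_sketch(struct sketch_state *s)
{
	assert(s->order >= 0 && s->max_rounds >= 1);

	/*
	 * Initial light (async) compaction without reclaim, placed above
	 * the retry loop and restricted to costly orders.
	 */
	if (s->order > SKETCH_COSTLY_ORDER) {
		s->initial_compactions++;
		s->last_mode = SKETCH_ASYNC;
		if (s->reclaims_needed == 0)
			return true;
	}

	for (int round = 1; ; round++) {
		/* Reclaim always precedes compaction inside the loop. */
		s->reclaims++;

		/* After the initial attempt the mode is always upgraded. */
		s->loop_compactions++;
		s->last_mode = SKETCH_SYNC_LIGHT;
		if (s->reclaims >= s->reclaims_needed)
			return true;

		/*
		 * Once the (modeled) retry checks give up, fail directly:
		 * there is no extra final compaction attempt.
		 */
		if (round >= s->max_rounds)
			return false;
	}
}
```

In this model, a costly-order request that needs no reclaim succeeds in the initial async compaction, while a request that gives up after N rounds performs exactly N reclaims and N in-loop compactions, with nothing attempted after the last retry decision.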

Link: http://lkml.kernel.org/r/20160721073614.24395-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2016-07-28 16:07:41 -07:00

kasan

mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB

2016-07-28 16:07:41 -07:00

backing-dev.c

mm, vmscan: move LRU lists to node

2016-07-28 16:07:41 -07:00

balloon_compaction.c

mm: balloon: use general non-lru movable page feature

2016-07-26 16:19:19 -07:00

bootmem.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

cleancache.c

cleancache: constify cleancache_ops structure

2016-01-27 09:09:57 -05:00

cma_debug.c

mm/cma_debug: correct size input to bitmap function

2015-07-17 16:39:54 -07:00

cma.c

mm/cma: silence warnings due to max() usage

2016-05-27 14:49:37 -07:00

cma.h

mm: cma: mark cma_bitmap_maxno() inline in header

2015-08-14 15:56:32 -07:00

compaction.c

mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode

2016-07-28 16:07:41 -07:00

debug_page_ref.c

mm/page_ref: add tracepoint to track down page reference manipulation

2016-03-17 15:09:34 -07:00

debug.c

mm: introduce page reference manipulation functions

2016-03-17 15:09:34 -07:00

dmapool.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

early_ioremap.c

mm/early_ioremap: use offset_in_page macro

2015-11-05 19:34:48 -08:00

fadvise.c

mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED

2016-06-09 14:23:11 -07:00

failslab.c

mm: fault-inject take over bootstrap kmem_cache check

2016-03-15 16:55:16 -07:00

filemap.c

mm: move most file-based accounting to the node

2016-07-28 16:07:41 -07:00

frame_vector.c

mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm

2016-02-16 10:11:12 +01:00

frontswap.c

mm, frontswap: convert frontswap_enabled to static key

2016-07-26 16:19:19 -07:00

gup.c

thp: file pages support for split_huge_page()

2016-07-26 16:19:19 -07:00

highmem.c

mm/highmem: make nr_free_highpages() handles all highmem zones by itself

2016-05-19 19:12:14 -07:00

huge_memory.c

mm, THP: clean up return value of madvise_free_huge_pmd

2016-07-28 16:07:41 -07:00

hugetlb_cgroup.c

mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

2016-05-20 17:58:30 -07:00

hugetlb.c

mm: hwpoison: remove incorrect comments

2016-07-28 16:07:41 -07:00

hwpoison-inject.c

hwpoison: use page_cgroup_ino for filtering by memcg

2015-09-10 13:29:01 -07:00

init-mm.c

…

internal.h

mm, page_alloc: remove fair zone allocation policy

2016-07-28 16:07:41 -07:00

interval_tree.c

mm: replace vma->sharead.linear with vma->shared

2015-02-10 14:30:31 -08:00

Kconfig

mm: CONFIG_ZONE_DEVICE stop depending on CONFIG_EXPERT

2016-07-28 16:07:41 -07:00

Kconfig.debug

mm/page_ref: add tracepoint to track down page reference manipulation

2016-03-17 15:09:34 -07:00

khugepaged.c

mm: convert zone_reclaim to node_reclaim

2016-07-28 16:07:41 -07:00

kmemcheck.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

kmemleak-test.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

kmemleak.c

kmemleak: don't hang if user disables scanning early

2016-07-28 16:07:41 -07:00

ksm.c

mm: do not pass mm_struct into handle_mm_fault

2016-07-26 16:19:19 -07:00

list_lru.c

mm: memcontrol: move kmem accounting code to CONFIG_MEMCG

2016-01-20 17:09:18 -08:00

maccess.c

x86: remove more uaccess_32.h complexity

2016-05-22 17:21:27 -07:00

madvise.c

mm: make mmap_sem for write waits killable for mm syscalls

2016-05-23 17:04:14 -07:00

Makefile

thp: extract khugepaged from mm/huge_memory.c

2016-07-26 16:19:19 -07:00

memblock.c

mm/memblock.c: fix index adjustment error in __next_mem_range_rev()

2016-07-28 16:07:41 -07:00

memcontrol.c

mm: fix memcg stack accounting for sub-page stacks

2016-07-28 16:07:41 -07:00

memory_hotplug.c

mem-hotplug: alloc new page from a nearest neighbor node when mem-offline

2016-07-28 16:07:41 -07:00

memory-failure.c

mm: hwpoison: remove incorrect comments

2016-07-28 16:07:41 -07:00

memory.c

thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE

2016-07-26 16:19:19 -07:00

mempolicy.c

mm, vmscan: move LRU lists to node

2016-07-28 16:07:41 -07:00

mempool.c

Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"

2016-07-28 16:07:41 -07:00

memtest.c

memtest: remove unused header files

2015-09-08 15:35:28 -07:00

migrate.c

mm: remove reclaim and compaction retry approximations

2016-07-28 16:07:41 -07:00

mincore.c

mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage

2016-04-04 10:41:08 -07:00

mlock.c

mm, vmscan: move LRU lists to node

2016-07-28 16:07:41 -07:00

mm_init.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

mmap.c

mm: fix use-after-free if memory allocation failed in vma_adjust()

2016-07-28 16:07:41 -07:00

mmu_context.c

mm/mmu_context, sched/core: Fix mmu_context.h assumption

2016-04-28 11:44:19 +02:00

mmu_notifier.c

fix Christoph's email addresses

2016-03-17 15:09:34 -07:00

mmzone.c

mm, page_alloc: inline the fast path of the zonelist iterator

2016-05-19 19:12:14 -07:00

mprotect.c

mm: thp: check pmd_trans_unstable() after split_huge_pmd()

2016-07-26 16:19:19 -07:00

mremap.c

mm: thp: check pmd_trans_unstable() after split_huge_pmd()

2016-07-26 16:19:19 -07:00

msync.c

mm/msync: use offset_in_page macro

2015-11-05 19:34:48 -08:00

nobootmem.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

nommu.c

mm: introduce fault_env

2016-07-26 16:19:19 -07:00

oom_kill.c

mm, oom: tighten task_will_free_mem() locking

2016-07-28 16:07:41 -07:00

page_alloc.c

mm, page_alloc: restructure direct compaction handling in slowpath

2016-07-28 16:07:41 -07:00

page_counter.c

mm: page_counter: let page_counter_try_charge() return bool

2015-11-05 19:34:48 -08:00

page_ext.c

mm: use early_pfn_to_nid in page_ext_init

2016-05-27 14:49:37 -07:00

page_idle.c

mm, vmscan: move lru_lock to the node

2016-07-28 16:07:41 -07:00

page_io.c

mm: add cond_resched() to generic_swapfile_activate()

2016-07-28 16:07:41 -07:00

page_isolation.c

mm/page_isolation: clean up confused code

2016-07-26 16:19:19 -07:00

page_owner.c

mm/page_owner: use stackdepot to store stacktrace

2016-07-26 16:19:19 -07:00

page_poison.c

mm: check the return value of lookup_page_ext for all call sites

2016-06-03 15:06:22 -07:00

page-writeback.c

mm: remove reclaim and compaction retry approximations

2016-07-28 16:07:41 -07:00

pagewalk.c

thp: rename split_huge_page_pmd() to split_huge_pmd()

2016-01-15 17:56:32 -08:00

percpu-km.c

mm: percpu: use pr_fmt to prefix output

2016-03-17 15:09:34 -07:00

percpu-vm.c

percpu: move region iterations out of pcpu_[de]populate_chunk()

2014-09-02 14:46:02 -04:00

percpu.c

percpu: fix synchronization between synchronous map extension and chunk destruction

2016-05-25 11:48:25 -04:00

pgtable-generic.c

mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range

2016-03-17 15:09:34 -07:00

process_vm_access.c

mm/gup: Introduce get_user_pages_remote()

2016-02-16 10:04:09 +01:00

quicklist.c

fix Christoph's email addresses

2016-03-17 15:09:34 -07:00

readahead.c

mm, memcg: use consistent gfp flags during readahead

2016-07-26 16:19:19 -07:00

rmap.c

mm: move most file-based accounting to the node

2016-07-28 16:07:41 -07:00

shmem.c

mm: move most file-based accounting to the node

2016-07-28 16:07:41 -07:00

slab_common.c

mm: charge/uncharge kmemcg from generic page allocator paths

2016-07-26 16:19:19 -07:00

slab.c

mm/slab: use list_move instead of list_del/list_add

2016-07-26 16:19:19 -07:00

slab.h

mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB

2016-07-28 16:07:41 -07:00

slob.c

mm: slab: free kmem_cache_node after destroy sysfs file

2016-02-18 16:23:24 -08:00

slub.c

mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB

2016-07-28 16:07:41 -07:00

sparse-vmemmap.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

sparse.c

make __section_nr() more efficient

2016-07-28 16:07:41 -07:00

swap_cgroup.c

mm: convert printk(KERN_<LEVEL> to pr_<level>

2016-03-17 15:09:34 -07:00

swap_state.c

mm: move most file-based accounting to the node

2016-07-28 16:07:41 -07:00

swap.c

mm, pagevec: release/reacquire lru_lock on pgdat change

2016-07-28 16:07:41 -07:00

swapfile.c

mm, frontswap: convert frontswap_enabled to static key

2016-07-26 16:19:19 -07:00

truncate.c

truncate: handle file thp

2016-07-26 16:19:19 -07:00

userfaultfd.c

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

2016-04-04 10:41:08 -07:00

util.c

mm: move most file-based accounting to the node

2016-07-28 16:07:41 -07:00

vmacache.c

mm/vmacache: inline vmacache_valid_mm()

2015-11-05 19:34:48 -08:00

vmalloc.c

mm: charge/uncharge kmemcg from generic page allocator paths

2016-07-26 16:19:19 -07:00

vmpressure.c

mm/vmpressure.c: fix subtree pressure detection

2016-02-03 08:28:43 -08:00

vmscan.c

mm: bail out in shrink_inactive_list()

2016-07-28 16:07:41 -07:00

vmstat.c

mm: remove reclaim and compaction retry approximations

2016-07-28 16:07:41 -07:00

workingset.c

mm, workingset: make working set detection node-aware

2016-07-28 16:07:41 -07:00

z3fold.c

mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup

2016-06-03 16:02:55 -07:00

zbud.c

mm/zbud.c: use list_last_entry() instead of list_tail_entry()

2016-01-15 11:40:52 -08:00

zpool.c

mm: zsmalloc: constify struct zs_pool name

2015-11-06 17:50:42 -08:00

zsmalloc.c

zsmalloc: Delete an unnecessary check before the function call "iput"

2016-07-28 16:07:41 -07:00

zswap.c

mm/zswap: use workqueue to destroy pool

2016-05-20 17:58:30 -07:00