linux/mm
David Hildenbrand 47b6a24a23 mm/page_alloc: place pages to tail in __putback_isolated_page()
__putback_isolated_page() already documents that pages will be placed to
the tail of the freelist.  This is, however, not the case for "order >=
MAX_ORDER - 2" (see buddy_merge_likely()), which is the order range that
all existing users should be hitting.

This change affects two users:
- free page reporting
- page isolation, when undoing the isolation (including memory onlining).

This behavior is desirable for pages that haven't really been touched
lately, so exactly the two users that don't actually read/write page
content, but rather move untouched pages.

The new behavior is especially desirable for memory onlining, where we
allow allocation of newly onlined pages via undo_isolate_page_range() in
online_pages().  Right now, we always place them to the head of the
freelist, resulting in undesirable behavior: Assume we add individual
memory chunks via add_memory() and online them right away to the NORMAL
zone.  We create a dependency chain of unmovable allocations e.g., via the
memmap.  The memmap of the next chunk will be placed onto previous chunks
- if the last block cannot get offlined+removed, all dependent ones cannot
get offlined+removed.  While this can already be observed with individual
DIMMs, it's more of an issue for virtio-mem (and I suspect also ppc
DLPAR).
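
To illustrate the dependency chain described above, here is a minimal
userspace sketch (not kernel code).  It assumes a toy freelist where
allocations are always served from the head; all names (toy_page,
toy_add_head(), toy_add_tail(), toy_alloc(), toy_first_alloc_chunk())
are hypothetical and only model the ordering effect, not the buddy
allocator itself.

```c
#include <stddef.h>

/* Toy free page; "chunk" is a hypothetical id of the memory chunk it lives in. */
struct toy_page {
	int chunk;
	struct toy_page *prev, *next;
};

/* The list head is a sentinel node that points to itself when empty. */
static void toy_list_init(struct toy_page *head)
{
	head->prev = head->next = head;
}

static void toy_insert_between(struct toy_page *p, struct toy_page *prev,
			       struct toy_page *next)
{
	p->prev = prev;
	p->next = next;
	prev->next = p;
	next->prev = p;
}

static void toy_add_head(struct toy_page *head, struct toy_page *p)
{
	toy_insert_between(p, head, head->next);
}

static void toy_add_tail(struct toy_page *head, struct toy_page *p)
{
	toy_insert_between(p, head->prev, head);
}

/* Assumption of this sketch: allocations always take the first page. */
static struct toy_page *toy_alloc(struct toy_page *head)
{
	struct toy_page *p = head->next;

	if (p == head)
		return NULL;
	p->prev->next = p->next;
	p->next->prev = p->prev;
	return p;
}

/*
 * Chunk 0 is already free, then chunk 1 is "onlined" either to the head
 * or to the tail.  Returns the chunk that serves the next allocation,
 * e.g. the memmap of a hypothetical chunk 2.
 */
static int toy_first_alloc_chunk(int to_tail)
{
	struct toy_page head, old_pages[2], new_pages[2];
	struct toy_page *p;
	int i;

	toy_list_init(&head);
	for (i = 0; i < 2; i++) {
		old_pages[i].chunk = 0;
		toy_add_head(&head, &old_pages[i]);
	}
	for (i = 0; i < 2; i++) {
		new_pages[i].chunk = 1;
		if (to_tail)
			toy_add_tail(&head, &new_pages[i]);
		else
			toy_add_head(&head, &new_pages[i]);
	}
	p = toy_alloc(&head);
	return p ? p->chunk : -1;
}
```

In this model, toy_first_alloc_chunk(0) returns 1: with head placement the
next allocation lands on the chunk that was just onlined, which is how each
new chunk ends up depending on the previous one.  toy_first_alloc_chunk(1)
returns 0: with tail placement the allocation is served from older memory
first.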

Document that this should only be used for optimizations, and no code
should rely on this behavior for correctness (if the order of the
freelists ever changes).

We won't care about page shuffling: memory onlining already properly
shuffles after onlining.  Free page reporting doesn't care about
physically contiguous ranges, and there are already cases where page
isolation will simply move (physically close) free pages to (currently)
the head of the freelists via move_freepages_block() instead of shuffling.
If this ever becomes relevant, we should shuffle the whole zone when
undoing isolation of larger ranges, and after free_contig_range().

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Link: https://lkml.kernel.org/r/20201005121534.15649-3-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:18 -07:00
kasan mm: kasan: do not panic if both panic_on_warn and kasan_multishot set 2020-10-13 18:38:32 -07:00
backing-dev.c bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag 2020-09-24 13:43:39 -06:00
balloon_compaction.c
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c cma: don't quit at first error when activating reserved areas 2020-08-12 10:57:57 -07:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm/compaction.c: micro-optimization remove unnecessary branch 2020-10-13 18:38:34 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped. 2020-10-16 11:11:14 -07:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fold ra_submit into do_sync_mmap_readahead 2020-10-16 11:11:16 -07:00
frame_vector.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_benchmark.c mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag 2020-10-13 18:38:29 -07:00
gup.c mm/gup: protect unpin_user_pages() against npages==-ERRNO 2020-10-13 18:38:29 -07:00
highmem.c
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm: fix a race during THP splitting 2020-10-16 11:11:15 -07:00
hugetlb_cgroup.c hugetlb_cgroup: convert comma to semicolon 2020-08-21 09:52:52 -07:00
hugetlb.c dma-mapping updates for 5.10 2020-10-15 14:43:29 -07:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mmap locking API: add MMAP_LOCK_INITIALIZER 2020-06-09 09:39:14 -07:00
internal.h mm/readahead: pass a file_ra_state into force_page_cache_ra 2020-10-16 11:11:16 -07:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory 2020-10-16 11:11:17 -07:00
Kconfig.debug treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
khugepaged.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-11 10:31:11 -07:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c ksm: reinstate memcg charge on copied pages 2020-09-19 13:13:38 -07:00
list_lru.c mm/list_lru: fix a data race in list_lru_count_one 2020-08-14 19:56:57 -07:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm,hwpoison: return 0 if the page is already poisoned in soft-offline 2020-10-16 11:11:16 -07:00
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c memblock: use separate iterators for memory and reserved regions 2020-10-13 18:38:35 -07:00
memcontrol.c mm/memcg: fix device private memcg accounting 2020-10-13 18:38:31 -07:00
memfd.c
memory_hotplug.c mm: don't panic when links can't be created in sysfs 2020-10-16 11:11:18 -07:00
memory-failure.c mm,hwpoison: try to narrow window race for free pages 2020-10-16 11:11:17 -07:00
memory.c mm/memory: remove page fault assumption of compound page size 2020-10-16 11:11:15 -07:00
mempolicy.c mm: remove unused alloc_page_vma_node() 2020-10-13 18:38:34 -07:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone() 2020-10-16 11:11:17 -07:00
memtest.c
migrate.c mm,hwpoison: rework soft offline for in-use pages 2020-10-16 11:11:16 -07:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap.c: replace do_brk with do_brk_flags in comment of insert_vm_struct() 2020-10-13 18:38:32 -07:00
mmu_gather.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
mmu_notifier.c mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert 2020-10-16 11:11:17 -07:00
mmzone.c
mprotect.c mm: Introduce arch_validate_flags() 2020-09-04 12:46:07 +01:00
mremap.c mm/mremap: start addresses are properly aligned 2020-08-07 11:33:27 -07:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c Fix references to nommu-mmap.rst 2020-09-24 11:03:40 -06:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-13 18:38:35 -07:00
page_alloc.c mm/page_alloc: place pages to tail in __putback_isolated_page() 2020-10-16 11:11:18 -07:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c mm/page_idle.c: skip offline pages 2020-06-08 11:05:55 -07:00
page_io.c mm/page_io.c: remove useless out label in __swap_writepage() 2020-10-13 18:38:30 -07:00
page_isolation.c mm/page_isolation: simplify return value of start_isolate_page_range() 2020-10-16 11:11:17 -07:00
page_owner.c mm/page_owner: change split_page_owner to take a count 2020-10-16 11:11:15 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm/page_reporting: add budget limit on how many pages can be reported per pass 2020-04-07 10:43:39 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
page-writeback.c mm/page-writeback: support tail pages in wait_for_stable_page 2020-10-16 11:11:15 -07:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-17 17:34:39 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
process_vm_access.c mm: remove compat_process_vm_{readv,writev} 2020-10-03 00:02:15 -04:00
ptdump.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
readahead.c mm/readahead: pass a file_ra_state into force_page_cache_ra 2020-10-16 11:11:16 -07:00
rmap.c mm/rmap: fix assumptions of THP size 2020-10-16 11:11:15 -07:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c fs: add a filesystem flag for THPs 2020-10-16 11:11:15 -07:00
shuffle.c mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm/slab_common.c: delete duplicated word 2020-08-12 10:57:58 -07:00
slab.c mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
slab.h mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity 2020-10-13 18:38:29 -07:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swapfile.c mm/swapfile.c: fix potential memory leak in sys_swapon 2020-10-13 18:38:30 -07:00
truncate.c mm/truncate: fix truncation for pages of arbitrary size 2020-10-16 11:11:15 -07:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm/util.c: update the kerneldoc for kstrdup_const() 2020-10-16 11:11:17 -07:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc.c: fix the comment of find_vm_area 2020-10-13 18:38:32 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm/vmscan: allow arbitrary sized pages to be paged out 2020-10-16 11:11:15 -07:00
vmstat.c mm/vmstat.c: use helper macro abs() 2020-10-16 11:11:17 -07:00
workingset.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
z3fold.c mm/z3fold.c: use xx_zalloc instead xx_alloc and memset 2020-10-13 18:38:34 -07:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: fix duplicated words 2020-08-12 10:57:58 -07:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00