linux/mm
Gilad Ben-Yossef 74046494ea mm: only IPI CPUs to drain local pages if they exist
Calculate a cpumask of CPUs with per-cpu pages in any zone and only send
an IPI requesting CPUs to drain these pages to the buddy allocator if they
actually have pages when asked to flush.

This patch saves 85%+ of IPIs asking to drain per-cpu pages in case of
severe memory pressure that leads to OOM since in these cases multiple,
possibly concurrent, allocation requests end up in the direct reclaim code
path so when the per-cpu pages end up reclaimed on first allocation
failure for most of the proceeding allocation attempts until the memory
pressure is off (possibly via the OOM killer) there are no per-cpu pages
on most CPUs (and there can easily be hundreds of them).

This also has the side effect of shortening the average latency of direct
reclaim by 1 or more order of magnitude since waiting for all the CPUs to
ACK the IPI takes a long time.

Tested by running "hackbench 400" on a 8 CPU x86 VM and observing the
difference between the number of direct reclaim attempts that end up in
drain_all_pages() and those were more then 1/2 of the online CPU had any
per-cpu page in them, using the vmstat counters introduced in the next
patch in the series and using proc/interrupts.

In the test sceanrio, this was seen to save around 3600 global
IPIs after trigerring an OOM on a concurrent workload:

$ cat /proc/vmstat | tail -n 2
pcp_global_drain 0
pcp_global_ipi_saved 0

$ cat /proc/interrupts | grep CAL
CAL:          1          2          1          2
          2          2          2          2   Function call interrupts

$ hackbench 400
[OOM messages snipped]

$ cat /proc/vmstat | tail -n 2
pcp_global_drain 3647
pcp_global_ipi_saved 3642

$ cat /proc/interrupts | grep CAL
CAL:          6         13          6          3
          3          3         1 2          7   Function call interrupts

Please note that if the global drain is removed from the direct reclaim
path as a patch from Mel Gorman currently suggests this should be replaced
with an on_each_cpu_cond invocation.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Michal Nazarewicz <mina86@mina86.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c mm: cleancache: Use __read_mostly as appropiate. 2012-01-23 16:08:09 -05:00
compaction.c mm: compaction: make compact_control order signed 2012-03-21 17:54:56 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c Cleanups: rename of flush to invalidate, moving reporting of statistics 2012-03-22 19:52:47 -07:00
fremap.c
highmem.c
huge_memory.c thp: optimize away unnecessary page table locking 2012-03-21 17:54:57 -07:00
hugetlb.c mm: hugetlb: cleanup duplicated code in unmapping vm range 2012-03-23 16:58:31 -07:00
hwpoison-inject.c
init-mm.c
internal.h
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP 2012-03-23 16:58:42 -07:00
Makefile
memblock.c memblock: Fix size aligning of memblock_alloc_base_nid() 2012-03-01 10:53:18 +01:00
memcontrol.c mm: thp: fix up pmd_trans_unstable() locations 2012-03-28 17:14:35 -07:00
memory_hotplug.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory-failure.c Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-22 09:42:04 -07:00
memory.c coredump: remove VM_ALWAYSDUMP flag 2012-03-23 16:58:42 -07:00
mempolicy.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-03-21 17:54:59 -07:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: fix move/migrate_pages() race on task struct 2012-03-21 17:54:58 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c
mmzone.c
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: collapse security_vm_enough_memory() variants into a single function 2012-02-14 10:45:39 +11:00
msync.c
nobootmem.c
nommu.c NOMMU: Don't need to clear vm_mm when deleting a VMA 2012-02-24 08:59:04 -08:00
oom_kill.c signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig() 2012-03-23 16:58:41 -07:00
page_alloc.c mm: only IPI CPUs to drain local pages if they exist 2012-03-28 17:14:35 -07:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes 2012-03-28 10:02:55 -07:00
pagewalk.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c Kmemleak patches 2012-01-14 18:11:11 -08:00
pgtable-generic.c thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE 2012-03-21 17:55:02 -07:00
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c
readahead.c
rmap.c memcg: use new logic for page stat accounting 2012-03-21 17:55:01 -07:00
shmem.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
slab.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-03-21 17:54:59 -07:00
slob.c
slub.c slub: only IPI CPUs that have per cpu obj to flush 2012-03-28 17:14:35 -07:00
sparse-vmemmap.c
sparse.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
swap_state.c mm: make swapin readahead skip over holes 2012-03-21 17:54:56 -07:00
swap.c mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug 2012-03-21 17:54:58 -07:00
swapfile.c swapon: check validity of swap_flags 2012-03-28 17:14:35 -07:00
thrash.c
truncate.c mm for fs: add truncate_pagecache_range() 2012-03-28 17:14:35 -07:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
vmscan.c Fix potential endless loop in kswapd when compaction is not enabled 2012-03-24 12:18:32 -07:00
vmstat.c mm,x86,um: move CMPXCHG_LOCAL config option 2012-01-12 20:13:03 -08:00