linux

iv/linux


Nick Piggin 89699605fe mm: vmap area cache	2011-03-22 17:44:00 -07:00

Provide a free area cache for the vmalloc virtual address allocator, based
on the algorithm used by the user virtual memory allocator.

This reduces the number of rbtree operations and linear traversals over
the vmap extents in order to find a free area, by starting off at the last
point that a free area was found.

The free area cache is reset if areas are freed behind it, or if we are
searching for a smaller area or alignment than last time. So allocation
patterns are not changed (verified by corner-case and random test cases in
userspace testing).

This solves a regression caused by lazy vunmap TLB purging introduced in
`db64fe02` (mm: rewrite vmap layer). That patch will leave extents in the
vmap allocator after they are vunmapped, and until a significant number
accumulate that can be flushed in a single batch. So in a workload that
vmalloc/vfree frequently, a chain of extents will build up from
VMALLOC_START address, which have to be iterated over each time (giving an
O(n) type of behaviour).

After this patch, the search will start from where it left off, giving
closer to an amortized O(1).

This is verified to solve regressions reported Steven in GFS2, and Avi in
KVM.

Hugh's update:

: I tried out the recent mmotm, and on one machine was fortunate to hit
: the BUG_ON(first->va_start < addr) which seems to have been stalling
: your vmap area cache patch ever since May.
:
: I can get you addresses etc, I did dump a few out; but once I stared
: at them, it was easier just to look at the code: and I cannot see how
: you would be so sure that first->va_start < addr, once you've done
: that addr = ALIGN(max(...), align) above, if align is over 0x1000
: (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).
:
: I originally got around it by just changing the
: 	if (first->va_start < addr) {
: to
: 	while (first->va_start < addr) {
: without thinking about it any further; but that seemed unsatisfactory,
: why would we want to loop here when we've got another very similar
: loop just below it?
:
: I am never going to admit how long I've spent trying to grasp your
: "while (n)" rbtree loop just above this, the one with the peculiar
: 	if (!first && tmp->va_start < addr + size)
: in. That's unfamiliar to me, I'm guessing it's designed to save a
: subsequent rb_next() in a few circumstances (at risk of then setting
: a wrong cached_hole_size?); but they did appear few to me, and I didn't
: feel I could sign off something with that in when I don't grasp it,
: and it seems responsible for extra code and mistaken BUG_ON below it.
:
: I've reverted to the familiar rbtree loop that find_vma() does (but
: with va_end >= addr as you had, to respect the additional guard page):
: and then (given that cached_hole_size starts out 0) I don't see the
: need for any complications below it. If you do want to keep that loop
: as you had it, please add a comment to explain what it's trying to do,
: and where addr is relative to first when you emerge from it.
:
: Aren't your tests "size <= cached_hole_size" and
: "addr + size > first->va_start" forgetting the guard page we want
: before the next area? I've changed those.
:
: I have not changed your many "addr + size - 1 < addr" overflow tests,
: but have since come to wonder, shouldn't they be "addr + size < addr"
: tests - won't the vend checks go wrong if addr + size is 0?
:
: I have added a few comments - Wolfgang Wander's 2.6.13 description of
: `1363c3cd86` Avoiding mmap fragmentation
: helped me a lot, perhaps a pointer to that would be good too. And I found
: it easier to understand when I renamed cached_start slightly and moved the
: overflow label down.
:
: This patch would go after your mm-vmap-area-cache.patch in mmotm.
: Trivially, nobody is going to get that BUG_ON with this patch, and it
: appears to work fine on my machines; but I have not given it anything like
: the testing you did on your original, and may have broken all the
: performance you were aiming for. Please take a look and test it out
: integrate with yours if you're satisfied - thanks.

[akpm@linux-foundation.org: add locking comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reported-and-tested-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-and-tested-by: Avi Kivity <avi@redhat.com>
Tested-by: "Barry J. Marson" <bmarson@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
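
The free area cache described in the commit message can be illustrated with a small userspace sketch. This is not the code in vmalloc.c: it assumes a sorted array in place of the rbtree, ignores guard pages and locking, and resets the cache on any free at or behind the cached extent, which is coarser than the commit's reset rule. All names here (`struct space`, `space_alloc`, `space_free`, `cached_idx`, `steps`) are hypothetical; only the ideas of a cached hole size and cached alignment come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_AREAS 64                    /* hypothetical capacity of the sketch */

struct area { uintptr_t start, end; };  /* allocated extent [start, end) */

struct space {
	uintptr_t vstart, vend;         /* allocatable range */
	struct area areas[MAX_AREAS];   /* sorted by start (stands in for the rbtree) */
	size_t n;
	int have_cache;                 /* is cached_idx valid? */
	size_t cached_idx;              /* extent placed by the last search */
	uintptr_t cached_hole;          /* largest hole skipped since the last reset */
	uintptr_t cached_align;         /* alignment of the last request */
	unsigned long steps;            /* extents visited, for illustration only */
};

void space_init(struct space *s, uintptr_t vstart, uintptr_t vend)
{
	memset(s, 0, sizeof(*s));
	s->vstart = vstart;
	s->vend = vend;
}

/* align must be a power of two */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* First-fit allocation of size bytes; returns 0 on failure. */
uintptr_t space_alloc(struct space *s, uintptr_t size, uintptr_t align)
{
	size_t i = 0;
	uintptr_t addr = s->vstart;

	if (s->have_cache && size > s->cached_hole && align >= s->cached_align) {
		/* Resume just past the extent placed last time. */
		i = s->cached_idx + 1;
		addr = s->areas[s->cached_idx].end;
	} else {
		/* A skipped hole might fit this request: search from the start. */
		s->have_cache = 0;
		s->cached_hole = 0;
	}
	s->cached_align = align;
	addr = align_up(addr, align);

	while (i < s->n && addr + size > s->areas[i].start) {
		const struct area *a = &s->areas[i];

		/* Remember the largest hole passed over. */
		if (a->start > addr && a->start - addr > s->cached_hole)
			s->cached_hole = a->start - addr;
		/* Alignment may already have pushed addr past this extent;
		 * never move addr backwards. */
		if (a->end > addr)
			addr = align_up(a->end, align);
		i++;
		s->steps++;
	}

	if (addr + size < addr || addr + size > s->vend || s->n == MAX_AREAS)
		return 0;

	/* Insert the new extent at position i, keeping the array sorted. */
	memmove(&s->areas[i + 1], &s->areas[i], (s->n - i) * sizeof(s->areas[0]));
	s->areas[i].start = addr;
	s->areas[i].end = addr + size;
	s->n++;
	s->have_cache = 1;
	s->cached_idx = i;
	return addr;
}

/* Free the extent starting at start, if any. */
void space_free(struct space *s, uintptr_t start)
{
	for (size_t j = 0; j < s->n; j++) {
		if (s->areas[j].start != start)
			continue;
		memmove(&s->areas[j], &s->areas[j + 1],
			(s->n - j - 1) * sizeof(s->areas[0]));
		s->n--;
		/* Simplification: any free at or behind the cache resets it. */
		if (s->have_cache && j <= s->cached_idx)
			s->have_cache = 0;
		return;
	}
}
```

With this sketch, eight back-to-back page-sized allocations visit no existing extents (`steps` stays 0), whereas restarting from `vstart` each time would visit 0 + 1 + ... + 7 = 28. Freeing the lowest extent resets the cache, so the next allocation reuses that address and first-fit placement is preserved.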
..
backing-dev.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-10-26 17:58:44 -07:00
bootmem.c	bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c	2011-02-24 14:43:06 +01:00
bounce.c	bounce: call flush_dcache_page() after bounce_copy_vec()	2010-09-09 18:57:25 -07:00
compaction.c	mm: compaction: prevent division-by-zero during user-requested compaction	2011-01-20 17:02:05 -08:00
debug-pagealloc.c
dmapool.c	mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc()	2011-01-13 17:32:48 -08:00
fadvise.c
failslab.c
filemap_xip.c
filemap.c	mm: remove likely() from grab_cache_page_write_begin()	2011-01-13 17:32:36 -08:00
fremap.c	Avoid pgoff overflow in remap_file_pages	2010-09-25 09:34:58 -07:00
highmem.c	mm,x86: fix kmap_atomic_push vs ioremap_32.c	2010-10-27 18:03:05 -07:00
huge_memory.c	thp+memcg-numa: fix BUG at include/linux/mm.h:370!	2011-03-14 08:29:50 -07:00
hugetlb.c	hugetlb: fix handling of parse errors in sysfs	2011-01-13 17:32:49 -08:00
hwpoison-inject.c	HWPOISON, hugetlb: support hwpoison injection for hugepage	2010-08-11 09:23:11 +02:00
init-mm.c	mm: provide init_mm mm_context initializer	2010-08-09 20:44:54 -07:00
internal.h	mm: export __get_user_pages	2011-03-17 13:08:27 -03:00
Kconfig	mm: compaction: don't depend on HUGETLB_PAGE	2011-01-26 10:50:02 +10:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c	kmemleak: remove memset by using kzalloc	2011-01-27 18:31:51 +00:00
kmemleak.c	kmemleak: Allow kmemleak metadata allocations to fail	2011-01-27 18:32:06 +00:00
ksm.c	ksm: drain pagevecs to lru	2011-01-13 17:32:49 -08:00
maccess.c	MN10300: Save frame pointer in thread_info struct rather than global var	2010-10-27 17:29:01 +01:00
madvise.c	thp: khugepaged: make khugepaged aware about madvise	2011-01-13 17:32:47 -08:00
Makefile	bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c	2011-02-24 14:43:05 +01:00
memblock.c	memblock: don't adjust size in memblock_find_base()	2011-02-11 16:12:20 -08:00
memcontrol.c	memcg: fix event counting breakage from recent THP update	2011-02-02 16:03:19 -08:00
memory_hotplug.c	Merge branch 'slub/hotplug' into slab/urgent	2011-01-15 13:28:17 +02:00
memory-failure.c	mm: remove is_hwpoison_address	2011-03-17 13:08:27 -03:00
memory.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-03-18 10:37:40 -07:00
mempolicy.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-03-18 10:37:40 -07:00
mempool.c
migrate.c	mm: grab rcu read lock in move_pages()	2011-02-25 15:07:36 -08:00
mincore.c	thp: mincore transparent hugepage support	2011-01-13 17:32:44 -08:00
mlock.c	mlock: operate on any regions with protection != PROT_NONE	2011-02-02 10:20:50 +11:00
mm_init.c
mmap.c	brk: fix min_brk lower bound computation for COMPAT_BRK	2011-01-13 17:32:48 -08:00
mmu_context.c
mmu_notifier.c	thp: mmu_notifier_test_young	2011-01-13 17:32:46 -08:00
mmzone.c	mm: page allocator: adjust the per-cpu counter threshold when memory is low	2011-01-13 17:32:31 -08:00
mprotect.c	thp: mprotect: transparent huge page support	2011-01-13 17:32:44 -08:00
mremap.c	mm: fix possible cause of a page_mapped BUG	2011-02-23 21:55:06 -08:00
msync.c
nobootmem.c	bootmem: Move __alloc_memory_core_early() to nobootmem.c	2011-02-24 14:43:06 +01:00
nommu.c	mlock: do not hold mmap_sem for extended periods of time	2011-01-13 17:32:36 -08:00
oom_kill.c	oom: avoid deferring oom killer if exiting task is being traced	2011-03-22 17:43:58 -07:00
page_alloc.c	mm: PageBuddy and mapcount robustness	2011-03-17 16:31:13 -07:00
page_cgroup.c
page_io.c
page_isolation.c	mm: page_isolation: codeclean fix comment and rm unneeded val init	2010-10-26 16:52:11 -07:00
page-writeback.c	writeback: avoid unnecessary determine_dirtyable_memory call	2011-01-13 17:32:38 -08:00
pagewalk.c	thp: split_huge_page_mm/vma	2011-01-13 17:32:41 -08:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	mm: remove gfp mask from pcpu_get_vm_areas	2011-01-13 17:32:34 -08:00
percpu.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-01-13 10:05:56 -08:00
pgtable-generic.c	mm/pgtable-generic.c: fix CONFIG_SWAP=n build	2011-01-26 10:49:58 +10:00
prio_tree.c
quicklist.c
readahead.c
rmap.c	thp: fix page_referenced to modify mapcount/vm_flags only if page is found	2011-03-13 15:35:57 -07:00
shmem.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-03-18 10:37:40 -07:00
slab.c	Merge branch 'slab/urgent' into slab/next	2011-03-11 18:11:19 +02:00
slob.c	mm: Remove support for kmem_cache_name()	2011-01-23 21:00:05 +02:00
slub.c	slub: Add statistics for this_cmpxchg_double failures	2011-03-22 20:48:04 +02:00
sparse-vmemmap.c	tree-wide: fix comment/printk typos	2010-11-01 15:38:34 -04:00
sparse.c	thp: remove PG_buddy	2011-01-13 17:32:43 -08:00
swap_state.c	thp: split_huge_page paging	2011-01-13 17:32:41 -08:00
swap.c	Revert "mm: simplify code of swap.c"	2011-01-17 14:42:34 -08:00
swapfile.c	mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles	2011-03-22 17:43:58 -07:00
thrash.c
truncate.c	memcg: more mem_cgroup_uncharge() batching	2011-02-25 15:07:37 -08:00
util.c	kernel: kmem_ptr_validate considered harmful	2011-01-07 17:50:16 +11:00
vmalloc.c	mm: vmap area cache	2011-03-22 17:44:00 -07:00
vmscan.c	mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT	2011-02-25 15:07:36 -08:00
vmstat.c	thp: transparent hugepage vmstat	2011-01-13 17:32:43 -08:00