linux

Don Mullis 835cc0c847 lib: more scalable list_sort()	2010-03-06 11:26:35 -08:00

XFS and UBIFS can pass long lists to list_sort(); this alternative
implementation scales better, reaching ~3x performance gain when list
length exceeds the L2 cache size.

Stand-alone program timings were run on a Core 2 duo L1=32KB L2=4MB,
gcc-4.4, with flags extracted from an Ubuntu kernel build. Object size is
581 bytes compared to 455 for Mark J. Roberts' code.

Worst case for either implementation is a list length just over a power of
two, and to roughly the same degree, so here are timing results for a
range of 2^N+1 lengths. List elements were 16 bytes each including malloc
overhead; initial order was random.

                      time (msec)
                      Tatham-Roberts
                      |       generic-Mullis-v2
loop_count   length   |       |    ratio
   4000000        2   206     294  1.427
   2000000        3   176     227  1.289
   1000000        5   199     172  0.864
    500000        9   235     178  0.757
    250000       17   243     182  0.748
    125000       33   261     196  0.750
     62500       65   277     209  0.754
     31250      129   292     219  0.75
     15625      257   317     235  0.741
      7812      513   340     252  0.741
      3906     1025   362     267  0.737
      1953     2049   388     283  0.729  ~ L1 size
       976     4097   556     323  0.580
       488     8193   678     361  0.532
       244    16385   773     395  0.510
       122    32769   844     418  0.495
        61    65537   917     454  0.495
        30   131073  1128     543  0.481
        15   262145  2355     869  0.369  ~ L2 size
         7   524289  5597    1714  0.306
         3  1048577  6218    2022  0.325

Mark's code does not actually implement the usual or generic mergesort,
but rather a variant from Simon Tatham described here:

    http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html

Simon's algorithm performs O(log N) passes over the entire input list,
doing merges of sublists that double in size on each pass. The generic
algorithm instead merges pairs of equal length lists as early as possible,
in recursive order. For either algorithm, the elements that extend the
list beyond power-of-two length are a special case, handled as nearly as
possible as a "rounding-up" to a full POT.

Some intuition for the locality of reference implications of merge order
may be gotten by watching this animation:

    http://www.sorting-algorithms.com/merge-sort

Simon's algorithm requires only O(1) extra space rather than the generic
algorithm's O(log N), but in my non-recursive implementation the actual
O(log N) data is merely a vector of ~20 pointers, which I've put on the
stack.

Long-running list_sort() calls: If the list passed in may be long, or the
client's cmp() callback function is slow, the client's cmp() may
periodically invoke cond_resched() to voluntarily yield the CPU. All
inner loops of list_sort() call back to cmp().

Stability of the sort: distinct elements that compare equal emerge from
the sort in the same order as with Mark's code, for simple test cases. A
boot-time test is provided to verify this and other correctness
requirements.

A kernel that uses drm.ko appears to run normally with this change; I have
no suitable hardware to similarly test the use by UBIFS.

[akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
Signed-off-by: Don Mullis <don.mullis@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
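The merge order described in the commit message above (merge equal-length sublists as early as possible, keeping only a small vector of pending lists) can be illustrated with a user-space sketch. This is not the code in lib/list_sort.c: struct node, sort_list(), cmp_key() and MAX_PARTS are hypothetical names chosen for illustration, and the list is singly linked rather than a kernel struct list_head.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS 20            /* hypothetical: number of pending-list slots */

struct node {                   /* hypothetical element type, not list_head */
    struct node *next;
    int key;
    int seq;                    /* original position, handy for checking stability */
};

typedef int (*cmp_fn)(const struct node *a, const struct node *b);

/* Example comparator: orders by key only, so equal keys compare equal. */
static int cmp_key(const struct node *a, const struct node *b)
{
    return (a->key > b->key) - (a->key < b->key);
}

/* Merge two sorted lists; on ties the element from 'a' goes first. */
static struct node *merge(cmp_fn cmp, struct node *a, struct node *b)
{
    struct node head, *tail = &head;

    while (a && b) {
        if (cmp(a, b) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

/*
 * Bottom-up merge sort. part[k] is either NULL or a sorted list of 2^k
 * elements (the top slot may grow larger). Each new element is merged
 * upward until it reaches an empty slot, so equal-length lists are
 * merged as soon as both exist.
 */
static struct node *sort_list(struct node *list, cmp_fn cmp)
{
    struct node *part[MAX_PARTS] = { NULL };
    struct node *result = NULL;
    int lev;

    assert(cmp != NULL);

    while (list) {
        struct node *cur = list;

        list = list->next;
        cur->next = NULL;

        lev = 0;
        while (part[lev]) {
            /* part[lev] holds earlier elements, so it is the 'a' argument */
            cur = merge(cmp, part[lev], cur);
            part[lev] = NULL;
            if (lev == MAX_PARTS - 1)
                break;          /* top slot absorbs anything longer */
            lev++;
        }
        part[lev] = cur;
    }

    /* Leftover lists: higher slots hold earlier elements, so they go first. */
    for (lev = 0; lev < MAX_PARTS; lev++)
        if (part[lev])
            result = merge(cmp, part[lev], result);

    return result;
}
```

Passing the older list as the first argument to merge() in both loops is what keeps elements that compare equal in their original order. The slot array is the "vector of ~20 pointers" idea in miniature: with 20 slots the equal-length pairing holds for roughly a million elements, after which the top slot simply keeps absorbing merges.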
lzo	lib: add support for LZO-compressed kernels	2010-01-11 09:34:04 -08:00
reed_solomon	lib: Remove unnecessary inclusions of asm/semaphore.h	2008-04-18 22:17:17 -04:00
zlib_deflate	trivial: fix typo "to to" in multiple files	2009-09-21 15:14:55 +02:00
zlib_inflate	zlib: Fix build of powerpc boot wrapper	2010-01-13 16:13:39 -08:00
.gitignore
argv_split.c	tree-wide: convert open calls to remove spaces to skip_spaces() lib function	2009-12-15 08:53:32 -08:00
atomic64.c	lib: export generic atomic64_t functions	2009-07-29 19:10:35 -07:00
audit.c
bcd.c	rtc: BCD codeshrink	2008-07-24 10:47:33 -07:00
bitmap.c	bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area	2009-12-16 07:20:18 -08:00
bitrev.c	lib: export bitrev16	2008-06-06 11:29:10 -07:00
bug.c	allow bug table entries to use relative pointers (and use it on x86-64)	2008-12-16 18:40:32 +01:00
bust_spinlocks.c	oops handling: ensure that any oops is flushed to the mtdoops console	2009-01-06 15:59:11 -08:00
check_signature.c
checksum.c	lib/checksum: fix one more thinko	2009-11-03 16:06:53 +01:00
cmdline.c	generic, memparse(): constify argument	2008-07-28 15:05:23 +02:00
cpumask.c	x86: remove some alloc_bootmem_cpumask_var calling	2009-06-11 19:27:07 +03:00
crc7.c
crc16.c
crc32.c	crc32: minor optimizations and cleanup	2009-12-15 08:53:35 -08:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c	[SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC	2008-07-12 08:22:32 -05:00
ctype.c	ctype: constify read-only _ctype string	2009-12-15 08:53:32 -08:00
debug_locks.c	rcu: Introduce lockdep-based checking to RCU read-side primitives	2010-02-25 09:40:59 +01:00
debugobjects.c	debugobjects: Convert to raw_spinlocks	2009-12-14 23:55:34 +01:00
dec_and_lock.c	atomic: only take lock when the counter drops to zero on UP as well	2009-06-16 19:47:47 -07:00
decompress_bunzip2.c	bzip2: Add missing checks for malloc returning NULL	2009-12-15 14:04:19 -08:00
decompress_inflate.c	lzma/gzip: fix potential oops when input data is truncated	2009-09-24 07:21:05 -07:00
decompress_unlzma.c	lzma/gzip: fix potential oops when input data is truncated	2009-09-24 07:21:05 -07:00
decompress_unlzo.c	lib: add support for LZO-compressed kernels	2010-01-11 09:34:04 -08:00
decompress.c	Add LZO compression support for initramfs and old-style initrd	2010-01-11 09:34:05 -08:00
devres.c	[POWERPC] devres: Add devm_ioremap_prot()	2008-05-05 16:47:14 +10:00
div64.c	add an inlined version of iter_div_u64_rem	2008-06-12 10:47:58 +02:00
dma-debug.c	Merge branches 'amd-iommu/fixes' and 'dma-debug/fixes' into iommu/fixes	2010-01-22 18:00:41 +01:00
dump_stack.c
dynamic_debug.c	tree-wide: convert open calls to remove spaces to skip_spaces() lib function	2009-12-15 08:53:32 -08:00
extable.c	module: trim exception table on init free.	2009-06-12 21:47:04 +09:30
fault-inject.c	headers: remove sched.h from interrupt.h	2009-10-11 11:20:58 -07:00
find_last_bit.c	bitmap: find_last_bit()	2009-01-01 10:12:19 +10:30
find_next_bit.c	bitops: remove "optimizations"	2008-04-29 08:11:16 -07:00
flex_array.c	flex_array: add missing kerneldoc annotations	2009-09-22 07:17:47 -07:00
gcd.c	lib: add lib/gcd.c	2009-06-18 13:04:05 -07:00
gen_crc32table.c
genalloc.c	genalloc: use bitmap_find_next_zero_area	2009-12-16 07:20:21 -08:00
halfmd4.c
hexdump.c	hexdump: remove the trailing space	2009-06-16 19:47:51 -07:00
hweight.c	x86, core: Optimize hweight32()	2009-12-28 10:41:39 +01:00
idr.c	idr: Apply lockdep-based diagnostics to rcu_dereference() uses	2010-02-25 10:34:51 +01:00
inflate.c	Nicolas Pitre has a new email address	2009-09-15 09:37:12 -07:00
int_sqrt.c
iomap_copy.c
iomap.c	Use WARN() in lib/	2008-07-26 12:00:07 -07:00
iommu-helper.c	iommu-helper: use bitmap library	2009-12-16 07:20:18 -08:00
ioremap.c
irq_regs.c
is_single_threaded.c	kernel: is_current_single_threaded: don't use ->mmap_sem	2009-07-17 09:11:31 +10:00
kasprintf.c
Kconfig	Add LZO compression support for initramfs and old-style initrd	2010-01-11 09:34:05 -08:00
Kconfig.debug	lkdtm: add debugfs access and loosen KPROBE ties	2010-03-06 11:26:32 -08:00
Kconfig.kgdb	kgdb: remove the requirement for CONFIG_FRAME_POINTER	2008-08-01 08:39:34 -05:00
Kconfig.kmemcheck	kmemcheck: depend on HAVE_ARCH_KMEMCHECK	2009-07-01 22:28:44 +02:00
kernel_lock.c	bkl: Fixup core_lock fallout	2009-12-14 23:55:33 +01:00
klist.c	driver core: Remove completion from struct klist_node	2009-01-06 10:44:30 -08:00
kobject_uevent.c	driver core: allow non-root users to listen to uevents	2009-04-16 16:17:09 -07:00
kobject.c	kobject: make kset_create check kobject_set_name return value	2009-06-15 21:30:24 -07:00
kref.c	kref: add kref_set()	2008-01-24 20:40:05 -08:00
libcrc32c.c	libcrc32c: Fix "crc32c undefined" compilation error	2008-12-25 11:01:42 +11:00
list_debug.c	list debugging: use WARN() instead of BUG()	2008-07-25 10:53:29 -07:00
list_sort.c	lib: more scalable list_sort()	2010-03-06 11:26:35 -08:00
lmb.c	lmb: Add lmb_free()	2010-02-03 17:39:50 +11:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c	locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]	2009-03-13 01:32:36 +01:00
lru_cache.c	The DRBD driver	2009-10-01 21:17:49 +02:00
Makefile	lib: Introduce generic list_sort function	2010-01-12 21:02:00 -08:00
nlattr.c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2009-03-26 22:45:23 -07:00
parser.c	parser: remove unnecessary strlen()	2009-12-15 08:53:33 -08:00
percpu_counter.c	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-01-06 17:10:04 -08:00
plist.c	plist: Make plist debugging raw_spinlock aware	2009-12-14 23:55:33 +01:00
prio_heap.c	lib: fix sparse shadowed variable warning	2009-01-06 15:59:11 -08:00
prio_tree.c
proportions.c	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-01-06 17:10:04 -08:00
radix-tree.c	radix-tree: Disable RCU lockdep checking in radix tree	2010-02-25 10:34:50 +01:00
random32.c	random32: seeding improvement	2008-07-30 16:29:19 -07:00
ratelimit.c	ratelimit: Make suppressed output messages more useful	2009-10-23 17:26:37 +02:00
rational.c	lib/rational.c needs module.h	2010-01-11 09:34:05 -08:00
rbtree.c	rb_tree: remove redundant if()-condition in rb_erase()	2009-06-16 19:47:56 -07:00
reciprocal_div.c
rwsem-spinlock.c	rwsem: fix rwsem_is_locked() bugs	2009-12-15 08:53:26 -08:00
rwsem.c	x86: fix UML and -regparm=3	2008-01-30 13:33:00 +01:00
scatterlist.c	lib/scatterlist: add a flags to signalize mapping direction	2009-07-31 12:28:45 +02:00
sha1.c
show_mem.c	mm: use the same log level for show_mem()	2010-03-06 11:26:27 -08:00
smp_processor_id.c	cpumask: convert lib/smp_processor_id to new cpumask ops	2009-01-30 15:47:34 +01:00
sort.c	generic swap(): lib/sort.c: rename swap to swap_func	2009-01-08 08:31:14 -08:00
spinlock_debug.c	locking: Further name space cleanups	2009-12-14 23:55:33 +01:00
string_helpers.c	[SCSI] lib: string_get_size(): don't hang on zero; no decimals on exact	2008-10-23 11:42:20 -05:00
string.c	lib/string.c: simplify strnstr()	2010-03-06 11:26:35 -08:00
swiotlb.c	dma-mapping: fix off-by-one error in dma_capable()	2009-12-16 07:20:12 -08:00
syscall.c	task_current_syscall	2008-07-26 12:00:10 -07:00
textsearch.c	remove CONFIG_KMOD from lib	2008-07-22 19:24:31 +10:00
ts_bm.c	textsearch: ts_bm: support case insensitive searching in Boyer-Moore algorithm	2008-07-08 02:37:54 -07:00
ts_fsm.c	textsearch: ts_fsm: return error on request for case insensitive search	2008-07-08 02:38:27 -07:00
ts_kmp.c	textsearch: ts_kmp: support case insensitive searching in Knuth-Morris-Pratt algorithm	2008-07-08 02:38:09 -07:00
vsprintf.c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-01-23 00:31:06 -08:00