linux/lib
Don Mullis 835cc0c847 lib: more scalable list_sort()
XFS and UBIFS can pass long lists to list_sort(); this alternative
implementation scales better, reaching ~3x performance gain when list
length exceeds the L2 cache size.

Stand-alone program timings were run on a Core 2 duo L1=32KB L2=4MB,
gcc-4.4, with flags extracted from an Ubuntu kernel build.  Object size is
581 bytes compared to 455 for Mark J.  Roberts' code.

Worst case for either implementation is a list length just over a power of
two, and to roughly the same degree, so here are timing results for a
range of 2^N+1 lengths.  List elements were 16 bytes each including malloc
overhead; initial order was random.

                      time (msec)
                      Tatham-Roberts
                      |       generic-Mullis-v2
loop_count  length    |       |    ratio
4000000       2     206     294    1.427
2000000       3     176     227    1.289
1000000       5     199     172    0.864
 500000       9     235     178    0.757
 250000      17     243     182    0.748
 125000      33     261     196    0.750
  62500      65     277     209    0.754
  31250     129     292     219    0.75
  15625     257     317     235    0.741
   7812     513     340     252    0.741
   3906    1025     362     267    0.737
   1953    2049     388     283    0.729  ~ L1 size
    976    4097     556     323    0.580
    488    8193     678     361    0.532
    244   16385     773     395    0.510
    122   32769     844     418    0.495
     61   65537     917     454    0.495
     30  131073    1128     543    0.481
     15  262145    2355     869    0.369  ~ L2 size
      7  524289    5597    1714    0.306
      3 1048577    6218    2022    0.325

Mark's code does not actually implement the usual or generic mergesort,
but rather a variant from Simon Tatham described here:

    http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html

Simon's algorithm performs O(log N) passes over the entire input list,
doing merges of sublists that double in size on each pass.  The generic
algorithm instead merges pairs of equal length lists as early as possible,
in recursive order.  For either algorithm, the elements that extend the
list beyond power-of-two length are a special case, handled as nearly as
possible as a "rounding-up" to a full POT.

Some intuition for the locality of reference implications of merge order
may be gotten by watching this animation:

    http://www.sorting-algorithms.com/merge-sort

Simon's algorithm requires only O(1) extra space rather than the generic
algorithm's O(log N), but in my non-recursive implementation the actual
O(log N) data is merely a vector of ~20 pointers, which I've put on the
stack.

Long-running list_sort() calls: If the list passed in may be long, or the
client's cmp() callback function is slow, the client's cmp() may
periodically invoke cond_resched() to voluntarily yield the CPU.  All
inner loops of list_sort() call back to cmp().

Stability of the sort: distinct elements that compare equal emerge from
the sort in the same order as with Mark's code, for simple test cases.  A
boot-time test is provided to verify this and other correctness
requirements.

A kernel that uses drm.ko appears to run normally with this change; I have
no suitable hardware to similarly test the use by UBIFS.

[akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
Signed-off-by: Don Mullis <don.mullis@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:35 -08:00
..
lzo lib: add support for LZO-compressed kernels 2010-01-11 09:34:04 -08:00
reed_solomon lib: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:17 -04:00
zlib_deflate trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
zlib_inflate zlib: Fix build of powerpc boot wrapper 2010-01-13 16:13:39 -08:00
.gitignore
argv_split.c tree-wide: convert open calls to remove spaces to skip_spaces() lib function 2009-12-15 08:53:32 -08:00
atomic64.c lib: export generic atomic64_t functions 2009-07-29 19:10:35 -07:00
audit.c
bcd.c rtc: BCD codeshrink 2008-07-24 10:47:33 -07:00
bitmap.c bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area 2009-12-16 07:20:18 -08:00
bitrev.c lib: export bitrev16 2008-06-06 11:29:10 -07:00
bug.c allow bug table entries to use relative pointers (and use it on x86-64) 2008-12-16 18:40:32 +01:00
bust_spinlocks.c oops handling: ensure that any oops is flushed to the mtdoops console 2009-01-06 15:59:11 -08:00
check_signature.c
checksum.c lib/checksum: fix one more thinko 2009-11-03 16:06:53 +01:00
cmdline.c generic, memparse(): constify argument 2008-07-28 15:05:23 +02:00
cpumask.c x86: remove some alloc_bootmem_cpumask_var calling 2009-06-11 19:27:07 +03:00
crc7.c
crc16.c
crc32.c crc32: minor optimizations and cleanup 2009-12-15 08:53:35 -08:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC 2008-07-12 08:22:32 -05:00
ctype.c ctype: constify read-only _ctype string 2009-12-15 08:53:32 -08:00
debug_locks.c rcu: Introduce lockdep-based checking to RCU read-side primitives 2010-02-25 09:40:59 +01:00
debugobjects.c debugobjects: Convert to raw_spinlocks 2009-12-14 23:55:34 +01:00
dec_and_lock.c atomic: only take lock when the counter drops to zero on UP as well 2009-06-16 19:47:47 -07:00
decompress_bunzip2.c bzip2: Add missing checks for malloc returning NULL 2009-12-15 14:04:19 -08:00
decompress_inflate.c lzma/gzip: fix potential oops when input data is truncated 2009-09-24 07:21:05 -07:00
decompress_unlzma.c lzma/gzip: fix potential oops when input data is truncated 2009-09-24 07:21:05 -07:00
decompress_unlzo.c lib: add support for LZO-compressed kernels 2010-01-11 09:34:04 -08:00
decompress.c Add LZO compression support for initramfs and old-style initrd 2010-01-11 09:34:05 -08:00
devres.c [POWERPC] devres: Add devm_ioremap_prot() 2008-05-05 16:47:14 +10:00
div64.c add an inlined version of iter_div_u64_rem 2008-06-12 10:47:58 +02:00
dma-debug.c Merge branches 'amd-iommu/fixes' and 'dma-debug/fixes' into iommu/fixes 2010-01-22 18:00:41 +01:00
dump_stack.c
dynamic_debug.c tree-wide: convert open calls to remove spaces to skip_spaces() lib function 2009-12-15 08:53:32 -08:00
extable.c module: trim exception table on init free. 2009-06-12 21:47:04 +09:30
fault-inject.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
find_last_bit.c bitmap: find_last_bit() 2009-01-01 10:12:19 +10:30
find_next_bit.c bitops: remove "optimizations" 2008-04-29 08:11:16 -07:00
flex_array.c flex_array: add missing kerneldoc annotations 2009-09-22 07:17:47 -07:00
gcd.c lib: add lib/gcd.c 2009-06-18 13:04:05 -07:00
gen_crc32table.c
genalloc.c genalloc: use bitmap_find_next_zero_area 2009-12-16 07:20:21 -08:00
halfmd4.c
hexdump.c hexdump: remove the trailing space 2009-06-16 19:47:51 -07:00
hweight.c x86, core: Optimize hweight32() 2009-12-28 10:41:39 +01:00
idr.c idr: Apply lockdep-based diagnostics to rcu_dereference() uses 2010-02-25 10:34:51 +01:00
inflate.c Nicolas Pitre has a new email address 2009-09-15 09:37:12 -07:00
int_sqrt.c
iomap_copy.c
iomap.c Use WARN() in lib/ 2008-07-26 12:00:07 -07:00
iommu-helper.c iommu-helper: use bitmap library 2009-12-16 07:20:18 -08:00
ioremap.c
irq_regs.c
is_single_threaded.c kernel: is_current_single_threaded: don't use ->mmap_sem 2009-07-17 09:11:31 +10:00
kasprintf.c
Kconfig Add LZO compression support for initramfs and old-style initrd 2010-01-11 09:34:05 -08:00
Kconfig.debug lkdtm: add debugfs access and loosen KPROBE ties 2010-03-06 11:26:32 -08:00
Kconfig.kgdb kgdb: remove the requirement for CONFIG_FRAME_POINTER 2008-08-01 08:39:34 -05:00
Kconfig.kmemcheck kmemcheck: depend on HAVE_ARCH_KMEMCHECK 2009-07-01 22:28:44 +02:00
kernel_lock.c bkl: Fixup core_lock fallout 2009-12-14 23:55:33 +01:00
klist.c driver core: Remove completion from struct klist_node 2009-01-06 10:44:30 -08:00
kobject_uevent.c driver core: allow non-root users to listen to uevents 2009-04-16 16:17:09 -07:00
kobject.c kobject: make kset_create check kobject_set_name return value 2009-06-15 21:30:24 -07:00
kref.c kref: add kref_set() 2008-01-24 20:40:05 -08:00
libcrc32c.c libcrc32c: Fix "crc32c undefined" compilation error 2008-12-25 11:01:42 +11:00
list_debug.c list debugging: use WARN() instead of BUG() 2008-07-25 10:53:29 -07:00
list_sort.c lib: more scalable list_sort() 2010-03-06 11:26:35 -08:00
lmb.c lmb: Add lmb_free() 2010-02-03 17:39:50 +11:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit] 2009-03-13 01:32:36 +01:00
lru_cache.c The DRBD driver 2009-10-01 21:17:49 +02:00
Makefile lib: Introduce generic list_sort function 2010-01-12 21:02:00 -08:00
nlattr.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-26 22:45:23 -07:00
parser.c parser: remove unnecessary strlen() 2009-12-15 08:53:33 -08:00
percpu_counter.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
plist.c plist: Make plist debugging raw_spinlock aware 2009-12-14 23:55:33 +01:00
prio_heap.c lib: fix sparse shadowed variable warning 2009-01-06 15:59:11 -08:00
prio_tree.c
proportions.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
radix-tree.c radix-tree: Disable RCU lockdep checking in radix tree 2010-02-25 10:34:50 +01:00
random32.c random32: seeding improvement 2008-07-30 16:29:19 -07:00
ratelimit.c ratelimit: Make suppressed output messages more useful 2009-10-23 17:26:37 +02:00
rational.c lib/rational.c needs module.h 2010-01-11 09:34:05 -08:00
rbtree.c rb_tree: remove redundant if()-condition in rb_erase() 2009-06-16 19:47:56 -07:00
reciprocal_div.c
rwsem-spinlock.c rwsem: fix rwsem_is_locked() bugs 2009-12-15 08:53:26 -08:00
rwsem.c x86: fix UML and -regparm=3 2008-01-30 13:33:00 +01:00
scatterlist.c lib/scatterlist: add a flags to signalize mapping direction 2009-07-31 12:28:45 +02:00
sha1.c
show_mem.c mm: use the same log level for show_mem() 2010-03-06 11:26:27 -08:00
smp_processor_id.c cpumask: convert lib/smp_processor_id to new cpumask ops 2009-01-30 15:47:34 +01:00
sort.c generic swap(): lib/sort.c: rename swap to swap_func 2009-01-08 08:31:14 -08:00
spinlock_debug.c locking: Further name space cleanups 2009-12-14 23:55:33 +01:00
string_helpers.c [SCSI] lib: string_get_size(): don't hang on zero; no decimals on exact 2008-10-23 11:42:20 -05:00
string.c lib/string.c: simplify strnstr() 2010-03-06 11:26:35 -08:00
swiotlb.c dma-mapping: fix off-by-one error in dma_capable() 2009-12-16 07:20:12 -08:00
syscall.c task_current_syscall 2008-07-26 12:00:10 -07:00
textsearch.c remove CONFIG_KMOD from lib 2008-07-22 19:24:31 +10:00
ts_bm.c textsearch: ts_bm: support case insensitive searching in Boyer-Moore algorithm 2008-07-08 02:37:54 -07:00
ts_fsm.c textsearch: ts_fsm: return error on request for case insensitive search 2008-07-08 02:38:27 -07:00
ts_kmp.c textsearch: ts_kmp: support case insensitive searching in Knuth-Morris-Pratt algorithm 2008-07-08 02:38:09 -07:00
vsprintf.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-01-23 00:31:06 -08:00