linux/lib
Rasmus Villemoes 7c43d9a30c lib/vsprintf.c: even faster binary to decimal conversion
The most expensive part of decimal conversion is the divisions by 10
(albeit done using reciprocal multiplication with appropriately chosen
constants).  I decided to see if one could eliminate around half of
these multiplications by emitting two digits at a time, at the cost of a
200 byte lookup table, and it does indeed seem like there is something
to be gained, especially on 64 bits.  Microbenchmarking shows
improvements ranging from -50% (for numbers uniformly distributed in [0,
2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a
more realistic distribution).

On a larger scale, perf shows that top, one of the big consumers of /proc
data, uses 0.5-1.0% fewer cpu cycles.

I had to jump through some hoops to get the 32 bit code to compile and run
on my 64 bit machine, so I'm not sure how relevant these numbers are, but
just for comparison the microbenchmark showed improvements between -30%
and -10%.

The bloat-o-meter costs are around 150 bytes (the generated code is a
little smaller, so it's not the full 200 bytes) on both 32 and 64 bit.
I'm aware that extra cache misses won't show up in a microbenchmark as
used above, but on the other hand decimal conversions often happen in bulk
(for example in the case of top).

I have of course tested that the new code generates the same output as the
old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9
'random' numbers in-between.

Test and verification code on github: https://github.com/Villemoes/dec.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Tested-by: Jeff Epler <jepler@unpythonic.net>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:03:54 -04:00
..
fonts fonts: Add 6x10 font 2014-10-09 11:35:48 +03:00
lz4 lib/lz4: Pull out constant tables 2015-03-25 15:04:57 +01:00
lzo lzo: check for length overrun in variable length encoding. 2014-09-28 11:08:01 +02:00
mpi MPILIB: Fix comparison of negative MPIs 2015-01-14 16:10:12 +00:00
raid6 x86/raid6: correctly check for assembler capabilities 2015-02-04 08:35:51 +11:00
reed_solomon
xz lib/xz: enable all filters by default in Kconfig 2014-06-04 16:54:18 -07:00
zlib_deflate zlib: clean up some dead code 2014-08-06 18:01:24 -07:00
zlib_inflate zlib: clean up some dead code 2014-08-06 18:01:24 -07:00
.gitignore
argv_split.c argv_split(): teach it to handle mutable strings 2013-04-29 18:28:19 -07:00
asn1_decoder.c lib/asn1_decoder.c: kernel-doc warning fix 2014-06-04 16:54:19 -07:00
assoc_array.c assoc_array: Include rcupdate.h for call_rcu() definition 2015-01-07 16:08:41 +00:00
atomic64_test.c lib/atomic64_test.c: convert printk(KERN_INFO to pr_info 2014-06-04 16:54:19 -07:00
atomic64.c locking,arch: Rewrite generic atomic support 2014-08-14 12:48:14 +02:00
audit.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
average.c lib: Ensure EWMA does not store wrong intermediate values 2014-01-16 23:46:06 -08:00
bcd.c
bch.c
bitmap.c bitmap, cpumask, nodemask: remove dedicated formatting functions 2015-02-13 21:21:39 -08:00
bitrev.c ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction 2014-12-22 16:43:06 +00:00
bsearch.c
btree.c lib/btree.c: fix leak of whole btree nodes 2014-06-04 16:54:18 -07:00
bug.c lib/bug: Use RCU list ops for module_bug_list 2014-11-11 17:07:46 +10:30
build_OID_registry X.509: do not emit any informational output 2013-06-19 17:54:06 +02:00
bust_spinlocks.c printk: Provide a wake_up_klogd() off-case 2013-03-22 16:41:20 -07:00
check_signature.c
checksum.c lib/checksum.c: fix build for generic csum_tcpudp_nofold 2015-01-29 11:57:38 -08:00
clz_ctz.c lib/clz_ctz.c: add prototype declarations in lib/clz_ctz.c 2014-04-03 16:21:12 -07:00
clz_tab.c
cmdline.c lib: Add a generic cmdline parse function parse_option_str 2014-10-03 18:40:58 +01:00
compat_audit.c audit: Add generic compat syscall support 2014-03-20 10:11:35 -04:00
cordic.c
cpu_rmap.c Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
cpu-notifier-error-inject.c
cpumask.c lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined 2014-07-02 18:29:23 -07:00
crc7.c lib/crc7: Shift crc7() output left 1 bit 2014-05-16 14:26:52 -04:00
crc8.c
crc16.c
crc32.c lib: crc32: Add some additional __pure annotations 2014-06-25 16:04:00 -07:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c crypto: crct10dif - Add fallback for broken initrds 2013-09-12 15:31:34 +10:00
ctype.c
debug_locks.c mutex: Add support for wound/wait style locks 2013-06-26 12:10:56 +02:00
debugobjects.c lib/debugobjects.c: convert printk(KERN_DEBUG to pr_debug 2014-06-04 16:53:53 -07:00
dec_and_lock.c
decompress_bunzip2.c decompress_bunzip2: off by one in get_next_block() 2014-12-13 12:42:52 -08:00
decompress_inflate.c initramfs: support initramfs that is bigger than 2GiB 2014-08-08 15:57:26 -07:00
decompress_unlz4.c initramfs: support initramfs that is bigger than 2GiB 2014-08-08 15:57:26 -07:00
decompress_unlzma.c initramfs: support initramfs that is bigger than 2GiB 2014-08-08 15:57:26 -07:00
decompress_unlzo.c initramfs: support initramfs that is bigger than 2GiB 2014-08-08 15:57:26 -07:00
decompress_unxz.c initramfs: support initramfs that is bigger than 2GiB 2014-08-08 15:57:26 -07:00
decompress.c lib/decompress.c: consistency of compress formats for kernel image 2014-12-13 12:42:52 -08:00
devres.c devres: support sizes greater than an unsigned long 2014-11-07 10:09:07 -08:00
digsig.c lib/digsig.c: kernel-doc warning fixes 2014-06-04 16:54:19 -07:00
div64.c lib: correct link to the original source for div64_u64 2015-03-06 23:19:27 +01:00
dma-debug.c dma-debug: prevent early callers from crashing 2014-12-10 17:41:02 -08:00
dump_stack.c asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/* 2014-05-05 16:07:46 -07:00
dynamic_debug.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-02-10 18:57:15 -08:00
dynamic_queue_limits.c lib/dynamic_queue_limits.c: simplify includes 2015-02-12 18:54:15 -08:00
earlycpio.c earlycpio.c: Fix the confusing comment of find_cpio_data(). 2013-08-14 23:24:01 +02:00
extable.c
fault-inject.c fault-inject: add ratelimit option 2014-12-13 12:42:52 -08:00
fdt_empty_tree.c lib: add fdt_empty_tree.c 2014-04-30 19:49:37 +01:00
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
fdt.c
find_bit.c lib: rename lib/find_next_bit.c to lib/find_bit.c 2015-04-17 09:03:54 -04:00
find_last_bit.c lib: find_*_bit reimplementation 2015-04-17 09:03:53 -04:00
flex_array.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
flex_proportions.c proportions: add @gfp to init functions 2014-09-08 09:51:30 +09:00
gcd.c
gen_crc32table.c lib: crc32: constify crc32 lookup table 2015-02-13 21:21:35 -08:00
genalloc.c lib/genalloc.c: check result of devres_alloc() 2015-02-13 21:21:36 -08:00
glob.c lib/glob.c: add CONFIG_GLOB_SELFTEST 2014-08-06 18:01:25 -07:00
halfmd4.c lib/halfmd4.c: simplify includes 2015-02-12 18:54:15 -08:00
hexdump.c hexdump: make it return number of bytes placed in buffer 2015-02-12 18:54:15 -08:00
hweight.c Make ARCH_HAS_FAST_MULTIPLIER a real config variable 2014-09-13 11:14:53 -07:00
idr.c lib/idr.c: remove redundant include 2015-02-12 18:54:15 -08:00
inflate.c
int_sqrt.c lib/int_sqrt.c: optimize square root algorithm 2013-04-29 18:28:19 -07:00
interval_tree_test.c lib: Export interval_tree 2014-05-05 09:09:14 +02:00
interval_tree.c lib/interval_tree.c: simplify includes 2015-02-12 18:54:15 -08:00
iomap_copy.c
iomap.c Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
iommu-helper.c
ioremap.c x86, mm: support huge KVA mappings on x86 2015-04-14 16:49:04 -07:00
iov_iter.c Merge branch 'iov_iter' into for-next 2015-04-11 22:26:51 -04:00
irq_regs.c
is_single_threaded.c
jedec_ddr_data.c
kasprintf.c
Kconfig Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2015-02-19 10:36:45 -08:00
Kconfig.debug Merge branch 'akpm' (patches from Andrew) 2015-04-14 16:49:17 -07:00
Kconfig.kasan kasan: enable instrumentation of global variables 2015-02-13 21:21:42 -08:00
Kconfig.kgdb kdb: Allow access to sensitive commands to be restricted by default 2014-11-11 09:31:52 -06:00
Kconfig.kmemcheck
kfifo.c kfifo: use BUG_ON 2014-08-08 15:57:25 -07:00
klist.c klist: use same naming scheme as hlist for klist_add_after() 2014-08-06 18:01:24 -07:00
kobject_uevent.c lib/kobject_uevent.c: remove redundant include 2015-02-12 18:54:15 -08:00
kobject.c kobject: WARN as tip when call kobject_get() to a kobject not initialized 2015-03-25 15:26:49 +01:00
kstrtox.c lib/kstrtox.c: remove redundant cleanup 2014-01-23 16:36:57 -08:00
kstrtox.h
lcm.c block: fix blk_stack_limits() regression due to lcm() change 2015-03-31 09:45:50 -06:00
libcrc32c.c crypto: LLVMLinux: Remove VLAIS usage from libcrc32c.c 2014-10-14 10:51:23 +02:00
list_debug.c
list_sort.c lib/list_sort.c: rearrange includes 2015-02-12 18:54:15 -08:00
llist.c lib/llist.c: remove redundant include 2015-02-12 18:54:15 -08:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c locking/lockdep: Revert qrwlock recusive stuff 2014-10-03 06:09:30 +02:00
lockref.c locking: Remove ACCESS_ONCE() usage 2015-02-24 08:44:16 +01:00
lru_cache.c lru_cache: remove use of seq_printf return value 2015-04-15 16:35:25 -07:00
Makefile lib: rename lib/find_next_bit.c to lib/find_bit.c 2015-04-17 09:03:54 -04:00
md5.c lib/md5.c: simplify include 2015-02-12 18:54:15 -08:00
memory-notifier-error-inject.c
memweight.c
net_utils.c mac_pton: Use bool not int return 2014-06-25 17:45:43 -07:00
nlattr.c netlink: pad nla_memcpy dest buffer with zeroes 2015-03-31 14:07:24 -04:00
notifier-error-inject.c mode_t, whack-a-mole at 11... 2013-04-09 14:13:05 -04:00
notifier-error-inject.h
of-reconfig-notifier-error-inject.c
oid_registry.c Give the OID registry file module info to avoid kernel tainting 2013-05-05 14:38:00 -07:00
parser.c lib/parser.c: put EXPORT_SYMBOLs in the conventional place 2014-01-23 16:36:55 -08:00
pci_iomap.c pci: add pci_iomap_range 2015-01-21 16:28:49 +10:30
percpu_counter.c percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
percpu_ida.c lib/percpu_ida.c: remove redundant includes 2015-02-12 18:54:16 -08:00
percpu_test.c percpu: add test module for various percpu operations 2013-11-13 12:09:11 +09:00
percpu-refcount.c percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky 2014-09-24 13:31:50 -04:00
plist.c lib/plist.c: remove redundant include 2015-02-12 18:54:16 -08:00
pm-notifier-error-inject.c
proportions.c proportions: add @gfp to init functions 2014-09-08 09:51:30 +09:00
radix-tree.c lib/radix-tree.c: change to simpler include 2015-02-12 18:54:16 -08:00
random32.c random32: improvements to prandom_bytes 2014-08-24 18:36:01 -07:00
ratelimit.c
rational.c
rbtree_test.c rbtree/test: test rbtree_postorder_for_each_entry_safe() 2014-01-23 16:37:03 -08:00
rbtree.c lib/rbtree.c: fix typo in comment of __rb_insert() 2014-08-08 15:57:24 -07:00
reciprocal_div.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
rhashtable.c rhashtable: provide len to obj_hashfn 2015-03-25 17:18:33 +01:00
scatterlist.c lib/scatterlist: fix memory leak with scsi-mq 2014-10-28 10:27:10 -06:00
seq_buf.c seq_buf: Fix seq_buf_bprintf() truncation 2015-03-04 23:40:19 -05:00
sha1.c lib: EXPORT_SYMBOL sha_init 2015-03-23 22:12:08 -04:00
show_mem.c lib/show_mem.c: remove redundant include 2015-02-12 18:54:16 -08:00
smp_processor_id.c percpu: add preemption checks to __this_cpu ops 2014-04-07 16:36:14 -07:00
sort.c lib/sort.c: move include inside #if 0 2015-02-12 18:54:16 -08:00
stmp_device.c lib/stmp_device.c: replace module.h include 2015-02-12 18:54:16 -08:00
string_helpers.c lib/string_helpers.c: change semantics of string_escape_mem 2015-04-15 16:35:24 -07:00
string.c lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR 2015-03-20 07:56:12 +11:00
strncpy_from_user.c lib/strncpy_from_user.c: replace module.h include 2015-02-12 18:54:16 -08:00
strnlen_user.c
swiotlb.c swiotlb: don't assume PA 0 is invalid 2014-06-20 16:04:32 -04:00
syscall.c lib/syscall.c: unexport task_current_syscall() 2014-04-03 16:21:06 -07:00
test_bpf.c test: bpf: expand DIV_KX to DIV_MOD_KX 2014-12-08 20:23:22 -05:00
test_firmware.c test: add firmware_class loader test 2014-07-17 18:44:19 -07:00
test_kasan.c lib: add kasan test module 2015-02-13 21:21:41 -08:00
test_module.c test: add minimal module for verification testing 2014-01-23 16:36:57 -08:00
test_rhashtable.c test_rhashtable: Remove bogus max_size setting 2015-04-03 15:09:36 -04:00
test_user_copy.c test: check copy_to/from_user boundary validation 2014-01-23 16:36:57 -08:00
test-hexdump.c lib/test-hexdump.c: fix initconst confusion 2015-04-15 16:35:22 -07:00
test-kstrtox.c lib/test-kstrtox.c: use ARRAY_SIZE instead of sizeof/sizeof[0] 2014-08-06 18:01:25 -07:00
test-string_helpers.c lib/string_helpers.c: change semantics of string_escape_mem 2015-04-15 16:35:24 -07:00
textsearch.c lib/textsearch.c: remove textsearch_put reference from comments 2014-10-14 02:18:14 +02:00
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ucs2_string.c Move utf16 functions to kernel core and rename 2013-04-15 21:23:03 +01:00
usercopy.c Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2013-04-30 17:04:09 -07:00
uuid.c uuid: use prandom_bytes() 2013-04-29 18:28:42 -07:00
vsprintf.c lib/vsprintf.c: even faster binary to decimal conversion 2015-04-17 09:03:54 -04:00