2008-01-14 06:51:16 +03:00
config ARCH
string
option env="ARCH"
config KERNELVERSION
string
option env="KERNELVERSION"
2006-06-09 09:12:45 +04:00
config DEFCONFIG_LIST
string
[PATCH] uml: use DEFCONFIG_LIST to avoid reading host's config
This should make sure that, for UML, host's configuration files are not
considered, which avoids various pains to the user. Our dependency are such
that the obtained Kconfig will be valid and will lead to successful
compilation - however they cannot prevent an user from disabling any boot
device, and if an option is not set in the read .config (say
/boot/config-XXX), with make menuconfig ARCH=um, it is not set. This always
disables UBD and all console I/O channels, which leads to non-working UML
kernels, so this bothers users - especially now, since it will happen on
almost every machine (/boot/config-`uname -r` exists almost on every machine).
It can be workarounded with make defconfig ARCH=um, but it is non-obvious and
can be avoided, so please _do_ merge this patch.
Given the existence of options, it could be interesting to implement
(additionally) "option required" - with it, Kconfig will refuse reading a
.config file (from wherever it comes) if the given option is not set. With
this, one could mark with it the option characteristic of the given
architecture (it was an old proposal of Roman Zippel, when I pointed out our
problem):
config UML
option required
default y
However this should be further discussed:
*) for x86, it must support constructs like:
==arch/i386/Kconfig==
config 64BIT
option required
default n
where Kconfig must require that CONFIG_64BIT is disabled or not present in the
read .config.
*) do we want to do such checks only for the starting defconfig or also for
.config? Which leads to:
*) I may want to port a x86_64 .config to x86 and viceversa, or even among more
different archs. Should that be allowed, and in which measure (the user may
force skipping the check for a .config or it is only given a warning by
default)?
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <kbuild-devel@lists.sourceforge.net>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:28:23 +04:00
depends on !UML
2006-06-09 09:12:45 +04:00
option defconfig_list
default "/lib/modules/$UNAME_RELEASE/.config"
default "/etc/kernel-config"
default "/boot/config-$UNAME_RELEASE"
2008-05-26 01:03:18 +04:00
default "$ARCH_DEFCONFIG"
2006-06-09 09:12:45 +04:00
default "arch/$ARCH/defconfig"
2009-06-18 03:28:03 +04:00
config CONSTRUCTORS
bool
depends on !UML
2010-10-14 10:01:34 +04:00
config IRQ_WORK
bool
2012-04-20 01:59:57 +04:00
config BUILDTIME_EXTABLE_SORT
bool
2007-07-31 11:39:23 +04:00
menu "General setup"
2005-04-17 02:20:36 +04:00
config BROKEN
bool
config BROKEN_ON_SMP
bool
depends on BROKEN || !SMP
default y
config INIT_ENV_ARG_LIMIT
int
2006-06-30 12:55:51 +04:00
default 32 if !UML
default 128 if UML
2005-04-17 02:20:36 +04:00
help
2005-10-31 02:01:46 +03:00
Maximum of each of the number of arguments and environment
variables passed to init from the kernel command line.
2005-04-17 02:20:36 +04:00
2009-12-22 03:24:06 +03:00
config CROSS_COMPILE
string "Cross-compiler tool prefix"
help
Same as running 'make CROSS_COMPILE=prefix-' but stored for
default make runs in this kernel build directory. You don't
need to set this unless you want the configured kernel build
directory to select the cross-compiler automatically.
2013-05-22 12:56:24 +04:00
config COMPILE_TEST
bool "Compile also drivers which will not load"
default n
help
Some drivers can be compiled on a different platform than they are
intended to be run on. Despite they cannot be loaded there (or even
when they load they cannot be used due to missing HW support),
developers still, opposing to distributors, might want to build such
drivers to compile-test them.
If you are a developer and want to build everything available, say Y
here. If you are a user/distributor, say N here to exclude useless
drivers to be distributed.
2005-04-17 02:20:36 +04:00
config LOCALVERSION
string "Local version - append to kernel release"
help
Append an extra string to the end of your kernel version.
This will show up when you type uname, for example.
The string you set here will be appended after the contents of
any files with a filename matching localversion* in your
object and source tree, in that order. Your total string can
be a maximum of 64 characters.
2005-07-31 12:57:49 +04:00
config LOCALVERSION_AUTO
bool "Automatically append version information to the version string"
default y
help
This will try to automatically determine if the current tree is a
2007-05-02 01:08:11 +04:00
release tree by looking for git tags that belong to the current
top of tree revision.
2005-07-31 12:57:49 +04:00
A string of the format -gxxxxxxxx will be added to the localversion
2007-05-02 01:08:11 +04:00
if a git-based tree is found. The string generated by this will be
2005-07-31 12:57:49 +04:00
appended after any matching localversion* files, and after the value
2007-05-02 01:08:11 +04:00
set in CONFIG_LOCALVERSION.
2005-07-31 12:57:49 +04:00
2007-05-02 01:08:11 +04:00
(The actual string used here is the first eight characters produced
by running the command:
$ git rev-parse --verify HEAD
which is done within the script "scripts/setlocalversion".)
2005-07-31 12:57:49 +04:00
2009-01-05 02:41:25 +03:00
config HAVE_KERNEL_GZIP
bool
config HAVE_KERNEL_BZIP2
bool
config HAVE_KERNEL_LZMA
bool
decompressors: add boot-time XZ support
This implements the API defined in <linux/decompress/generic.h> which is
used for kernel, initramfs, and initrd decompression. This patch together
with the first patch is enough for XZ-compressed initramfs and initrd;
XZ-compressed kernel will need arch-specific changes.
The buffering requirements described in decompress_unxz.c are stricter
than with gzip, so the relevant changes should be done to the
arch-specific code when adding support for XZ-compressed kernel.
Similarly, the heap size in arch-specific pre-boot code may need to be
increased (30 KiB is enough).
The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
memzero() (memset(ptr, 0, size)), which aren't available in all
arch-specific pre-boot environments. I'm including simple versions in
decompress_unxz.c, but a cleaner solution would naturally be nicer.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 04:01:23 +03:00
config HAVE_KERNEL_XZ
bool
lib: add support for LZO-compressed kernels
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-09 01:42:42 +03:00
config HAVE_KERNEL_LZO
bool
2013-07-09 03:01:46 +04:00
config HAVE_KERNEL_LZ4
bool
2009-01-05 00:46:17 +03:00
choice
2009-01-05 02:41:25 +03:00
prompt "Kernel compression mode"
default KERNEL_GZIP
2013-11-15 09:43:47 +04:00
depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4
2009-01-05 02:41:25 +03:00
help
2009-01-05 00:46:17 +03:00
The linux kernel is a kind of self-extracting executable.
Several compression algorithms are available, which differ
in efficiency, compression and decompression speed.
Compression speed is only relevant when building a kernel.
Decompression speed is relevant at each boot.
If you have any problems with bzip2 or lzma compressed
kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
version of this functionality (bzip2 only), for 2.4, was
supplied by Christian Ludwig)
High compression options are mostly useful for users, who
are low on disk space (embedded systems), but for whom ram
size matters less.
If in doubt, select 'gzip'
config KERNEL_GZIP
2009-01-05 02:41:25 +03:00
bool "Gzip"
depends on HAVE_KERNEL_GZIP
help
lib: add support for LZO-compressed kernels
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-09 01:42:42 +03:00
The old and tried gzip compression. It provides a good balance
between compression ratio and decompression speed.
2009-01-05 00:46:17 +03:00
config KERNEL_BZIP2
bool "Bzip2"
2009-01-05 02:41:25 +03:00
depends on HAVE_KERNEL_BZIP2
2009-01-05 00:46:17 +03:00
help
Its compression ratio and speed is intermediate.
2012-06-01 03:26:46 +04:00
Decompression speed is slowest among the choices. The kernel
2009-01-05 02:41:25 +03:00
size is about 10% smaller with bzip2, in comparison to gzip.
Bzip2 uses a large amount of memory. For modern kernels you
will need at least 8MB RAM or more for booting.
2009-01-05 00:46:17 +03:00
config KERNEL_LZMA
2009-01-05 02:41:25 +03:00
bool "LZMA"
depends on HAVE_KERNEL_LZMA
help
2012-06-01 03:26:46 +04:00
This compression algorithm's ratio is best. Decompression speed
is between gzip and bzip2. Compression is slowest.
The kernel size is about 33% smaller with LZMA in comparison to gzip.
2009-01-05 00:46:17 +03:00
decompressors: add boot-time XZ support
This implements the API defined in <linux/decompress/generic.h> which is
used for kernel, initramfs, and initrd decompression. This patch together
with the first patch is enough for XZ-compressed initramfs and initrd;
XZ-compressed kernel will need arch-specific changes.
The buffering requirements described in decompress_unxz.c are stricter
than with gzip, so the relevant changes should be done to the
arch-specific code when adding support for XZ-compressed kernel.
Similarly, the heap size in arch-specific pre-boot code may need to be
increased (30 KiB is enough).
The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
memzero() (memset(ptr, 0, size)), which aren't available in all
arch-specific pre-boot environments. I'm including simple versions in
decompress_unxz.c, but a cleaner solution would naturally be nicer.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 04:01:23 +03:00
config KERNEL_XZ
bool "XZ"
depends on HAVE_KERNEL_XZ
help
XZ uses the LZMA2 algorithm and instruction set specific
BCJ filters which can improve compression ratio of executable
code. The size of the kernel is about 30% smaller with XZ in
comparison to gzip. On architectures for which there is a BCJ
filter (i386, x86_64, ARM, IA-64, PowerPC, and SPARC), XZ
will create a few percent smaller kernel than plain LZMA.
The speed is about the same as with LZMA: The decompression
speed of XZ is better than that of bzip2 but worse than gzip
and LZO. Compression is slow.
lib: add support for LZO-compressed kernels
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-09 01:42:42 +03:00
config KERNEL_LZO
bool "LZO"
depends on HAVE_KERNEL_LZO
help
2012-06-01 03:26:46 +04:00
Its compression ratio is the poorest among the choices. The kernel
2010-07-14 13:23:08 +04:00
size is about 10% bigger than gzip; however its speed
lib: add support for LZO-compressed kernels
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-09 01:42:42 +03:00
(both compression and decompression) is the fastest.
2013-07-09 03:01:46 +04:00
config KERNEL_LZ4
bool "LZ4"
depends on HAVE_KERNEL_LZ4
help
LZ4 is an LZ77-type compressor with a fixed, byte-oriented encoding.
A preliminary version of LZ4 de/compression tool is available at
<https://code.google.com/p/lz4/>.
Its compression ratio is worse than LZO. The size of the kernel
is about 8% bigger than LZO. But the decompression speed is
faster than LZO.
2009-01-05 00:46:17 +03:00
endchoice
uts: make default hostname configurable, rather than always using "(none)"
The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist. Distribution init scripts have the same
fallback. However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs. Furthermore, "(none)" doesn't typically resolve to anything
useful.
Make the default hostname configurable. This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration. Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-16 02:08:28 +04:00
config DEFAULT_HOSTNAME
string "Default hostname"
default "(none)"
help
This option determines the default system hostname before userspace
calls sethostname(2). The kernel traditionally uses "(none)" here,
but you may wish to use a different default here to make a minimal
system more usable with less configuration.
2005-04-17 02:20:36 +04:00
config SWAP
bool "Support for paging of anonymous memory (swap)"
[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 22:45:40 +04:00
depends on MMU && BLOCK
2005-04-17 02:20:36 +04:00
default y
help
This option allows you to choose whether you want to have support
2006-01-15 04:40:08 +03:00
for so called swap devices or swap files in your kernel that are
2005-04-17 02:20:36 +04:00
used to provide more virtual memory than the actual RAM present
in your computer. If unsure say Y.
config SYSVIPC
bool "System V IPC"
---help---
Inter Process Communication is a suite of library functions and
system calls which let processes (running programs) synchronize and
exchange information. It is generally considered to be a good thing,
and some programs won't run unless you say Y here. In particular, if
you want to run the DOS emulator dosemu under Linux (read the
DOSEMU-HOWTO, available from <http://www.tldp.org/docs.html#howto>),
you'll need to say Y here.
You can find documentation about IPC with "info ipc" and also in
section 6.4 of the Linux Programmer's Guide, available from
<http://www.tldp.org/guides.html>.
2007-02-14 11:34:06 +03:00
config SYSVIPC_SYSCTL
bool
depends on SYSVIPC
depends on SYSCTL
default y
2005-04-17 02:20:36 +04:00
config POSIX_MQUEUE
bool "POSIX Message Queues"
2012-10-02 22:19:29 +04:00
depends on NET
2005-04-17 02:20:36 +04:00
---help---
POSIX variant of message queues is a part of IPC. In POSIX message
queues every message has a priority which decides about succession
of receiving it by a process. If you want to compile and run
programs written e.g. for Solaris with use of its POSIX message
2007-05-09 09:25:13 +04:00
queues (functions mq_*) say Y here.
2005-04-17 02:20:36 +04:00
POSIX message queues are visible as a filesystem called 'mqueue'
and can be mounted somewhere if you want to do filesystem
operations on message queues.
If unsure, say Y.
2009-04-07 06:01:11 +04:00
config POSIX_MQUEUE_SYSCTL
bool
depends on POSIX_MQUEUE
depends on SYSCTL
default y
2014-06-05 03:10:50 +04:00
config CROSS_MEMORY_ATTACH
bool "Enable process_vm_readv/writev syscalls"
depends on MMU
default y
help
Enabling this option adds the system calls process_vm_readv and
process_vm_writev which allow a process with the correct privileges
2014-08-13 00:46:11 +04:00
to directly read from or write to another process' address space.
2014-06-05 03:10:50 +04:00
See the man page for more details.
2012-09-09 16:22:07 +04:00
config FHANDLE
bool "open by fhandle syscalls"
select EXPORTFS
help
If you say Y here, a user level program will be able to map
file names to handle and then later use the handle for
different file system operations. This is useful in implementing
userspace file servers, which now track files using handles instead
of names. The handle would remain the same even if file names
get renamed. Enables open_by_handle_at(2) and name_to_handle_at(2)
syscalls.
2014-04-04 01:48:27 +04:00
config USELIB
bool "uselib syscall"
default y
help
This option enables the uselib syscall, a system call used in the
dynamic linker from libc5 and earlier. glibc does not use this
system call. If you intend to run programs built on libc5 or
earlier, you may need to enable this syscall. Current systems
running glibc can safely disable this.
2012-09-09 16:22:07 +04:00
config AUDIT
bool "Auditing support"
depends on NET
help
Enable auditing infrastructure that can be used with another
kernel subsystem, such as SELinux (which requires this for
logging of avc messages output). Does not do system-call
auditing without CONFIG_AUDITSYSCALL.
2014-02-25 13:16:24 +04:00
config HAVE_ARCH_AUDITSYSCALL
bool
2012-09-09 16:22:07 +04:00
config AUDITSYSCALL
bool "Enable system-call auditing support"
2014-02-25 13:16:24 +04:00
depends on AUDIT && HAVE_ARCH_AUDITSYSCALL
2012-09-09 16:22:07 +04:00
default y if SECURITY_SELINUX
help
Enable low-overhead system-call auditing infrastructure that
can be used independently or with another kernel subsystem,
such as SELinux.
config AUDIT_WATCH
def_bool y
depends on AUDITSYSCALL
select FSNOTIFY
config AUDIT_TREE
def_bool y
depends on AUDITSYSCALL
select FSNOTIFY
source "kernel/irq/Kconfig"
source "kernel/time/Kconfig"
menu "CPU/Task time and stats accounting"
2012-07-25 09:56:04 +04:00
config VIRT_CPU_ACCOUNTING
bool
2012-09-09 16:56:31 +04:00
choice
prompt "Cputime accounting"
default TICK_CPU_ACCOUNTING if !PPC64
2013-02-08 07:19:38 +04:00
default VIRT_CPU_ACCOUNTING_NATIVE if PPC64
2012-09-09 16:56:31 +04:00
# Kind of a stub config for the pure tick based cputime accounting
config TICK_CPU_ACCOUNTING
bool "Simple tick based cputime accounting"
2013-04-26 17:16:31 +04:00
depends on !S390 && !NO_HZ_FULL
2012-09-09 16:56:31 +04:00
help
This is the basic tick based cputime accounting that maintains
statistics about user, system and idle time spent on per jiffies
granularity.
If unsure, say Y.
2012-07-25 09:56:04 +04:00
config VIRT_CPU_ACCOUNTING_NATIVE
2012-06-16 17:39:34 +04:00
bool "Deterministic task and CPU time accounting"
2013-04-26 17:16:31 +04:00
depends on HAVE_VIRT_CPU_ACCOUNTING && !NO_HZ_FULL
2012-07-25 09:56:04 +04:00
select VIRT_CPU_ACCOUNTING
2012-06-16 17:39:34 +04:00
help
Select this option to enable more accurate task and CPU time
accounting. This is done by reading a CPU counter on each
kernel entry and exit and on transitions within the kernel
between system, softirq and hardirq state, so there is a
small performance impact. In the case of s390 or IBM POWER > 5,
this also enables accounting of stolen time on logically-partitioned
systems.
2012-07-25 09:56:04 +04:00
config VIRT_CPU_ACCOUNTING_GEN
bool "Full dynticks CPU time accounting"
2013-09-17 02:28:19 +04:00
depends on HAVE_CONTEXT_TRACKING
2013-09-17 02:28:21 +04:00
depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
2012-07-25 09:56:04 +04:00
select VIRT_CPU_ACCOUNTING
select CONTEXT_TRACKING
help
Select this option to enable task and CPU time accounting on full
dynticks systems. This accounting is implemented by watching every
kernel-user boundaries using the context tracking subsystem.
The accounting is thus performed at the expense of some significant
overhead.
For now this is only useful if you are working on the full
dynticks subsystem development.
If unsure, say N.
2012-09-09 16:56:31 +04:00
config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
2013-04-26 17:16:31 +04:00
depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
2012-09-09 16:56:31 +04:00
help
Select this option to enable fine granularity task irq time
accounting. This is done by reading a timestamp on each
transitions between softirq and hardirq state, so there can be a
small performance impact.
If in doubt, say N here.
endchoice
2005-04-17 02:20:36 +04:00
config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
help
If you say Y here, a user level program will be able to instruct the
kernel (via a special system call) to write process accounting
information to a file: whenever a process exits, information about
that process will be appended to the file by the kernel. The
information includes things such as creation time, owning user,
command name, memory usage, controlling terminal etc. (the complete
list is in the struct acct in <file:include/linux/acct.h>). It is
up to the user level program to do useful things with this
information. This is generally a good idea, so say Y.
config BSD_PROCESS_ACCT_V3
bool "BSD Process Accounting version 3 file format"
depends on BSD_PROCESS_ACCT
default n
help
If you say Y here, the process accounting information is written
in a new file format that also logs the process IDs of each
process and it's parent. Note that this file format is incompatible
with previous v0/v1/v2 file formats, so you will need updated tools
for processing it. A preliminary version of these tools is available
2008-06-18 12:45:13 +04:00
at <http://www.gnu.org/software/acct/>.
2005-04-17 02:20:36 +04:00
2006-07-14 11:24:40 +04:00
config TASKSTATS
2012-10-02 22:19:29 +04:00
bool "Export task/process statistics through netlink"
2006-07-14 11:24:40 +04:00
depends on NET
default n
help
Export selected statistics for tasks/processes through the
generic netlink interface. Unlike BSD process accounting, the
statistics are available during the lifetime of tasks/processes as
responses to commands. Like BSD accounting, they are sent to user
space on task exit.
Say N if unsure.
2006-07-14 11:24:36 +04:00
config TASK_DELAY_ACCT
2012-10-02 22:19:29 +04:00
bool "Enable per-task delay accounting"
2006-07-14 11:24:41 +04:00
depends on TASKSTATS
2006-07-14 11:24:36 +04:00
help
Collect information on time spent by a task waiting for system
resources like cpu, synchronous block I/O completion and swapping
in pages. Such statistics can help in setting a task's priorities
relative to other tasks for cpu, io, rss limits etc.
Say N if unsure.
2007-02-10 12:46:44 +03:00
config TASK_XACCT
2012-10-02 22:19:29 +04:00
bool "Enable extended accounting over taskstats"
2007-02-10 12:46:44 +03:00
depends on TASKSTATS
help
Collect extended task accounting data and send the data
to userland for processing over the taskstats interface.
Say N if unsure.
config TASK_IO_ACCOUNTING
2012-10-02 22:19:29 +04:00
bool "Enable per-task storage I/O accounting"
2007-02-10 12:46:44 +03:00
depends on TASK_XACCT
help
Collect information on the number of bytes of storage I/O which this
task has caused.
Say N if unsure.
2012-09-09 16:22:07 +04:00
endmenu # "CPU/Task time and stats accounting"
2010-09-27 16:45:59 +04:00
2009-01-15 23:28:29 +03:00
menu "RCU Subsystem"
choice
prompt "RCU Implementation"
2009-04-03 08:06:25 +04:00
default TREE_RCU
2009-01-15 23:28:29 +03:00
config TREE_RCU
bool "Tree-based hierarchical RCU"
2010-07-21 17:52:40 +04:00
depends on !PREEMPT && SMP
rcu: Don't call wakeup() with rcu_node structure ->lock held
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section. Please see below for
the long version of this story.
On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
Notice the above call chain.
rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.
You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).
> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
Notice the above trace.
perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:
rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();
Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.
The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.
CPU0 CPU1 CPU2
---- ---- ----
rcu_nocb_kthread()
lock(rq->lock);
lock(ctx->lock);
lock(rnp->lock);
wake_up();
lock(rq->lock);
rcu_read_unlock();
rcu_read_unlock_special();
lock(rnp->lock);
lock(ctx->lock);
**** DEADLOCK ****
> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499] CPU0 CPU1
> [12573.445599] ---- ----
> [12573.452691] lock(&ctx->lock);
> [12573.459799] lock(&rq->lock);
> [12573.467010] lock(&ctx->lock);
> [12573.474192] lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-05-29 01:32:53 +04:00
select IRQ_WORK
2009-01-15 23:28:29 +03:00
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
2009-06-24 04:12:47 +04:00
thousands of CPUs. It also scales down nicely to
smaller systems.
2009-01-15 23:28:29 +03:00
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
config TREE_PREEMPT_RCU
2010-06-30 03:49:16 +04:00
bool "Preemptible tree-based hierarchical RCU"
2013-01-09 03:48:33 +04:00
depends on PREEMPT
2013-07-25 18:34:25 +04:00
select IRQ_WORK
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
help
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
2009-09-13 20:15:08 +04:00
is also required. It also scales down nicely to
smaller systems.
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
2013-01-09 03:48:33 +04:00
Select this option if you are unsure.
rcu: "Tiny RCU", The Bloatwatch Edition
This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation. In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86. This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):
CONFIG_TREE_RCU:
text data bss dec filename
183 4 0 187 kernel/rcupdate.o
2783 520 36 3339 kernel/rcutree.o
3526 Total (vs 4565 for v7)
CONFIG_TREE_PREEMPT_RCU:
text data bss dec filename
263 4 0 267 kernel/rcupdate.o
4594 776 52 5422 kernel/rcutree.o
5689 Total (6155 for v7)
CONFIG_TINY_RCU:
text data bss dec filename
96 4 0 100 kernel/rcupdate.o
734 24 0 758 kernel/rcutiny.o
858 Total (vs 848 for v7)
The above is for x86. Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.
Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
o Apply Lai Jiangshan's review comments (aside from
might_sleep() in synchronize_sched(), which is covered by SMP builds).
o Fix up expedited primitives.
Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
o Forward ported to put it into the 2.6.33 stream.
o Added lockdep support.
o Make lightweight rcu_barrier.
Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
o Ported to latest pre-2.6.32 merge window kernel.
- Renamed rcu_qsctr_inc() to rcu_sched_qs().
- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
- Provided trivial rcu_cpu_notify().
- Provided trivial exit_rcu().
- Provided trivial rcu_needs_cpu().
- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
o Removed the dependence on EMBEDDED, with a view to making
TINY_RCU default for !SMP at some time in the future.
o Added (trivial) support for expedited grace periods.
Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
o Squeeze the size down a bit further by removing the
->completed field from struct rcu_ctrlblk.
o This permits synchronize_rcu() to become the empty function.
Previous concerns about rcutorture were unfounded, as
rcutorture correctly handles a constant value from
rcu_batches_completed() and rcu_batches_completed_bh().
Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
o Changed rcu_batches_completed(), rcu_batches_completed_bh()
rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
rcu_nmi_exit(), to be static inlines, as suggested by David
Howells. Doing this saves about 100 bytes from rcutiny.o.
(The numbers between v3 and this v4 of the patch are not directly
comparable, since they are against different versions of Linux.)
Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
o Fix whitespace issues.
o Change short-circuit "||" operator to instead be "+" in order
to fix performance bug noted by "kraai" on LWN.
(http://lwn.net/Articles/324348/)
Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
o This version depends on EMBEDDED as well as !SMP, as suggested
by Ingo.
o Updated rcu_needs_cpu() to unconditionally return zero,
permitting the CPU to enter dynticks-idle mode at any time.
This works because callbacks can be invoked upon entry to
dynticks-idle mode.
o Paul is now OK with this being included, based on a poll at
the Kernel Miniconf at linux.conf.au, where about ten people said
that they cared about saving 900 bytes on single-CPU systems.
o Applies to both mainline and tip/core/rcu.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 05:03:50 +03:00
config TINY_RCU
bool "UP-only small-memory-footprint RCU"
2011-06-09 03:31:33 +04:00
depends on !PREEMPT && !SMP
rcu: "Tiny RCU", The Bloatwatch Edition
This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation. In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86. This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):
CONFIG_TREE_RCU:
text data bss dec filename
183 4 0 187 kernel/rcupdate.o
2783 520 36 3339 kernel/rcutree.o
3526 Total (vs 4565 for v7)
CONFIG_TREE_PREEMPT_RCU:
text data bss dec filename
263 4 0 267 kernel/rcupdate.o
4594 776 52 5422 kernel/rcutree.o
5689 Total (6155 for v7)
CONFIG_TINY_RCU:
text data bss dec filename
96 4 0 100 kernel/rcupdate.o
734 24 0 758 kernel/rcutiny.o
858 Total (vs 848 for v7)
The above is for x86. Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.
Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
o Apply Lai Jiangshan's review comments (aside from
might_sleep() in synchronize_sched(), which is covered by SMP builds).
o Fix up expedited primitives.
Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
o Forward ported to put it into the 2.6.33 stream.
o Added lockdep support.
o Make lightweight rcu_barrier.
Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
o Ported to latest pre-2.6.32 merge window kernel.
- Renamed rcu_qsctr_inc() to rcu_sched_qs().
- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
- Provided trivial rcu_cpu_notify().
- Provided trivial exit_rcu().
- Provided trivial rcu_needs_cpu().
- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
o Removed the dependence on EMBEDDED, with a view to making
TINY_RCU default for !SMP at some time in the future.
o Added (trivial) support for expedited grace periods.
Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
o Squeeze the size down a bit further by removing the
->completed field from struct rcu_ctrlblk.
o This permits synchronize_rcu() to become the empty function.
Previous concerns about rcutorture were unfounded, as
rcutorture correctly handles a constant value from
rcu_batches_completed() and rcu_batches_completed_bh().
Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
o Changed rcu_batches_completed(), rcu_batches_completed_bh()
rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
rcu_nmi_exit(), to be static inlines, as suggested by David
Howells. Doing this saves about 100 bytes from rcutiny.o.
(The numbers between v3 and this v4 of the patch are not directly
comparable, since they are against different versions of Linux.)
Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
o Fix whitespace issues.
o Change short-circuit "||" operator to instead be "+" in order
to fix performance bug noted by "kraai" on LWN.
(http://lwn.net/Articles/324348/)
Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
o This version depends on EMBEDDED as well as !SMP, as suggested
by Ingo.
o Updated rcu_needs_cpu() to unconditionally return zero,
permitting the CPU to enter dynticks-idle mode at any time.
This works because callbacks can be invoked upon entry to
dynticks-idle mode.
o Paul is now OK with this being included, based on a poll at
the Kernel Miniconf at linux.conf.au, where about ten people said
that they cared about saving 900 bytes on single-CPU systems.
o Applies to both mainline and tip/core/rcu.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 05:03:50 +03:00
help
This option selects the RCU implementation that is
designed for UP systems from which real-time response
is not required. This option greatly reduces the
memory footprint of RCU.
2009-01-15 23:28:29 +03:00
endchoice
2010-06-30 03:49:16 +04:00
config PREEMPT_RCU
2013-03-27 19:44:00 +04:00
def_bool TREE_PREEMPT_RCU
2010-06-30 03:49:16 +04:00
help
This option enables preemptible-RCU code that is common between
2014-05-05 02:41:21 +04:00
TREE_PREEMPT_RCU and, in the old days, TINY_PREEMPT_RCU.
2010-06-30 03:49:16 +04:00
2014-06-28 00:42:20 +04:00
config TASKS_RCU
bool "Task_based RCU implementation using voluntary context switch"
default n
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states.
If unsure, say N.
2012-10-19 23:49:17 +04:00
config RCU_STALL_COMMON
def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
help
This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow
the tiny variants to disable RCU CPU stall warnings, while
making these warnings mandatory for the tree variants.
2012-11-27 22:33:25 +04:00
config CONTEXT_TRACKING
bool
2012-07-11 22:26:30 +04:00
config RCU_USER_QS
bool "Consider userspace as in RCU extended quiescent state"
2012-11-27 22:33:25 +04:00
depends on HAVE_CONTEXT_TRACKING && SMP
select CONTEXT_TRACKING
2012-07-11 22:26:30 +04:00
help
This option sets hooks on kernel / userspace boundaries and
puts RCU in extended quiescent state when the CPU runs in
userspace. It means that when a CPU runs in userspace, it is
excluded from the global RCU state machine and thus doesn't
2012-10-24 22:07:09 +04:00
try to keep the timer tick on for RCU.
2012-07-11 22:26:30 +04:00
2012-10-11 03:48:28 +04:00
Unless you want to hack and help the development of the full
2012-11-27 22:33:25 +04:00
dynticks mode, you shouldn't enable this option. It also
2012-10-24 22:07:09 +04:00
adds unnecessary overhead.
2012-10-11 03:48:28 +04:00
If unsure say N
2012-11-27 22:33:25 +04:00
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
2013-07-24 23:59:29 +04:00
default y if !NO_HZ_FULL
2012-07-11 22:26:40 +04:00
help
2013-07-24 23:59:29 +04:00
The major pre-requirement for full dynticks to work is to
support the context tracking subsystem. But there are also
other dependencies to provide in order to make the full
dynticks working.
This option stands for testing when an arch implements the
context tracking backend but doesn't yet fullfill all the
requirements to make the full dynticks feature working.
Without the full dynticks, there is no way to test the support
for context tracking and the subsystems that rely on it: RCU
userspace extended quiescent state and tickless cputime
accounting. This option copes with the absence of the full
dynticks subsystem by forcing the context tracking on all
CPUs in the system.
2013-10-24 18:07:47 +04:00
Say Y only if you're working on the development of an
2013-07-24 23:59:29 +04:00
architecture backend for the context tracking.
Say N otherwise, this option brings an overhead that you
don't want in production.
2012-10-11 03:48:28 +04:00
2009-01-15 23:28:29 +03:00
config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
depends on TREE_RCU || TREE_PREEMPT_RCU
2009-01-15 23:28:29 +03:00
default 64 if 64BIT
default 32 if !64BIT
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
2010-08-05 04:31:12 +04:00
large numbers of CPUs. This value must be at least the fourth
root of NR_CPUS, which allows NR_CPUS to be insanely large.
The default value of RCU_FANOUT should be used for production
systems, but if you are stress-testing the RCU implementation
itself, small RCU_FANOUT values allow you to test large-system
code paths on small(er) systems.
2009-01-15 23:28:29 +03:00
Select a specific number if testing RCU itself.
Take the default if unsure.
2012-04-19 23:20:14 +04:00
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 RCU_FANOUT if 64BIT
range 2 RCU_FANOUT if !64BIT
depends on TREE_RCU || TREE_PREEMPT_RCU
default 16
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
against lock contention. Systems that synchronize their
scheduling-clock interrupts for energy-efficiency reasons will
want the default because the smaller leaf-level fanout keeps
lock contention levels acceptably low. Very large systems
(hundreds or thousands of CPUs) will instead want to set this
value to the maximum value possible in order to reduce the
number of cache misses incurred during RCU's grace-period
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems.
Take the default if unsure.
2009-01-15 23:28:29 +03:00
config RCU_FANOUT_EXACT
bool "Disable tree-based hierarchical RCU auto-balancing"
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
depends on TREE_RCU || TREE_PREEMPT_RCU
2009-01-15 23:28:29 +03:00
default n
help
This option forces use of the exact RCU_FANOUT value specified,
regardless of imbalances in the hierarchy. This is useful for
testing RCU itself, and might one day be useful on systems with
strong NUMA behavior.
Without RCU_FANOUT_EXACT, the code will balance the hierarchy.
Say N if unsure.
2010-02-23 04:04:59 +03:00
config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
2011-08-11 01:21:01 +04:00
depends on NO_HZ_COMMON && SMP
2010-02-23 04:04:59 +03:00
default n
help
2012-12-28 23:30:36 +04:00
This option permits CPUs to enter dynticks-idle state even if
they have RCU callbacks queued, and prevents RCU from waking
these CPUs up more than roughly once every four jiffies (by
default, you can adjust this using the rcutree.rcu_idle_gp_delay
parameter), thus improving energy efficiency. On the other
hand, this option increases the duration of RCU grace periods,
for example, slowing down synchronize_rcu().
2012-10-07 20:26:13 +04:00
2012-12-28 23:30:36 +04:00
Say Y if energy efficiency is critically important, and you
don't care about increased grace-period durations.
2010-02-23 04:04:59 +03:00
Say N if you are unsure.
2009-01-15 23:28:29 +03:00
config TREE_RCU_TRACE
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
def_bool RCU_TRACE && ( TREE_RCU || TREE_PREEMPT_RCU )
2009-01-15 23:28:29 +03:00
select DEBUG_FS
help
rcu: Merge preemptable-RCU functionality into hierarchical RCU
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 00:56:52 +04:00
This option provides tracing for the TREE_RCU and
TREE_PREEMPT_RCU implementations, permitting Makefile to
trivially select kernel/rcutree_trace.c.
2009-01-15 23:28:29 +03:00
2010-09-28 04:25:23 +04:00
config RCU_BOOST
bool "Enable RCU priority boosting"
2011-02-07 23:47:15 +03:00
depends on RT_MUTEXES && PREEMPT_RCU
2010-09-28 04:25:23 +04:00
default n
help
This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long.
This option also prevents heavy loads from blocking RCU
callback invocation for all flavors of RCU.
Say Y here if you are working with real-time apps or heavy loads
Say N here if you are unsure.
config RCU_BOOST_PRIO
int "Real-time priority to boost RCU readers to"
range 1 99
depends on RCU_BOOST
default 1
help
2012-04-19 03:20:18 +04:00
This option specifies the real-time priority to which long-term
preempted RCU readers are to be boosted. If you are working
with a real-time application that has one or more CPU-bound
threads running at a real-time priority level, you should set
RCU_BOOST_PRIO to a priority higher then the highest-priority
real-time CPU-bound thread. The default RCU_BOOST_PRIO value
of 1 is appropriate in the common case, which is real-time
applications that do not have any CPU-bound threads.
Some real-time applications might not have a single real-time
thread that saturates a given CPU, but instead might have
multiple real-time threads that, taken together, fully utilize
that CPU. In this case, you should set RCU_BOOST_PRIO to
a priority higher than the lowest-priority thread that is
conspiring to prevent the CPU from running any non-real-time
tasks. For example, if one thread at priority 10 and another
thread at priority 5 are between themselves fully consuming
the CPU time on a given CPU, then RCU_BOOST_PRIO should be
set to priority 6 or higher.
2010-09-28 04:25:23 +04:00
Specify the real-time priority, or take the default if unsure.
config RCU_BOOST_DELAY
int "Milliseconds to delay boosting after RCU grace-period start"
range 0 3000
depends on RCU_BOOST
default 500
help
This option specifies the time to wait after the beginning of
a given grace period before priority-boosting preempted RCU
readers blocking that grace period. Note that any RCU reader
blocking an expedited RCU grace period is boosted immediately.
Accept the default if unsure.
2012-08-20 08:35:53 +04:00
config RCU_NOCB_CPU
2013-03-29 07:48:36 +04:00
bool "Offload RCU callback processing from boot-selected CPUs"
2012-08-20 08:35:53 +04:00
depends on TREE_RCU || TREE_PREEMPT_RCU
default n
help
Use this option to reduce OS jitter for aggressive HPC or
real-time workloads. It can also be used to offload RCU
callback invocation to energy-efficient CPUs in battery-powered
asymmetric multiprocessors.
This option offloads callback invocation from the set of
CPUs specified at boot time by the rcu_nocbs parameter.
rcu: Distinguish "rcuo" kthreads by RCU flavor
Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
the CPU number, for example, "rcuo". This is problematic given that
there are either two or three RCU flavors, each of which gets a per-CPU
kthread with exactly the same name. This commit therefore introduces
a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
"rcuob/0", "rcuop/0", and "rcuos/0".
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2012-12-03 20:16:28 +04:00
For each such CPU, a kthread ("rcuox/N") will be created to
invoke callbacks, where the "N" is the CPU being offloaded,
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
"s" for RCU-sched. Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired.
2012-08-20 08:35:53 +04:00
2013-01-08 01:37:42 +04:00
Say Y here if you want to help to debug reduced OS jitter.
2012-08-20 08:35:53 +04:00
Say N here if you are unsure.
2013-02-11 22:23:27 +04:00
choice
prompt "Build-forced no-CBs CPUs"
default RCU_NOCB_CPU_NONE
help
2013-05-01 01:49:42 +04:00
This option allows no-CBs CPUs (whose RCU callbacks are invoked
from kthreads rather than from softirq context) to be specified
at build time. Additional no-CBs CPUs may be specified by
the rcu_nocbs= boot parameter.
2013-02-11 22:23:27 +04:00
config RCU_NOCB_CPU_NONE
bool "No build_forced no-CBs CPUs"
2014-07-25 22:21:47 +04:00
depends on RCU_NOCB_CPU
2013-02-11 22:23:27 +04:00
help
This option does not force any of the CPUs to be no-CBs CPUs.
Only CPUs designated by the rcu_nocbs= boot parameter will be
2013-05-01 01:49:42 +04:00
no-CBs CPUs, whose RCU callbacks will be invoked by per-CPU
kthreads whose names begin with "rcuo". All other CPUs will
invoke their own RCU callbacks in softirq context.
Select this option if you want to choose no-CBs CPUs at
boot time, for example, to allow testing of different no-CBs
configurations without having to rebuild the kernel each time.
2013-02-11 22:23:27 +04:00
config RCU_NOCB_CPU_ZERO
bool "CPU 0 is a build_forced no-CBs CPU"
2014-07-25 22:21:47 +04:00
depends on RCU_NOCB_CPU
2013-02-11 22:23:27 +04:00
help
2013-05-01 01:49:42 +04:00
This option forces CPU 0 to be a no-CBs CPU, so that its RCU
callbacks are invoked by a per-CPU kthread whose name begins
with "rcuo". Additional CPUs may be designated as no-CBs
CPUs using the rcu_nocbs= boot parameter will be no-CBs CPUs.
All other CPUs will invoke their own RCU callbacks in softirq
context.
2013-02-11 22:23:27 +04:00
Select this if CPU 0 needs to be a no-CBs CPU for real-time
2013-05-01 01:49:42 +04:00
or energy-efficiency reasons, but the real reason it exists
is to ensure that randconfig testing covers mixed systems.
2013-02-11 22:23:27 +04:00
config RCU_NOCB_CPU_ALL
bool "All CPUs are build_forced no-CBs CPUs"
depends on RCU_NOCB_CPU
help
This option forces all CPUs to be no-CBs CPUs. The rcu_nocbs=
2013-05-01 01:49:42 +04:00
boot parameter will be ignored. All CPUs' RCU callbacks will
be executed in the context of per-CPU rcuo kthreads created for
this purpose. Assuming that the kthreads whose names start with
"rcuo" are bound to "housekeeping" CPUs, this reduces OS jitter
on the remaining CPUs, but might decrease memory locality during
RCU-callback invocation, thus potentially degrading throughput.
2013-02-11 22:23:27 +04:00
Select this if all CPUs need to be no-CBs CPUs for real-time
or energy-efficiency reasons.
endchoice
2009-01-15 23:28:29 +03:00
endmenu # "RCU Subsystem"
2014-08-09 01:25:41 +04:00
config BUILD_BIN2C
bool
default n
2005-04-17 02:20:36 +04:00
config IKCONFIG
2006-10-01 10:27:25 +04:00
tristate "Kernel .config support"
2014-08-09 01:25:41 +04:00
select BUILD_BIN2C
2005-04-17 02:20:36 +04:00
---help---
This option enables the complete Linux kernel ".config" file
contents to be saved in the kernel. It provides documentation
of which kernel options are used in a running kernel or in an
on-disk kernel. This information can be extracted from the kernel
image file with the script scripts/extract-ikconfig and used as
input to rebuild the current kernel or to build another kernel.
It can also be extracted from a running kernel by reading
/proc/config.gz if enabled (below).
config IKCONFIG_PROC
bool "Enable access to .config through /proc/config.gz"
depends on IKCONFIG && PROC_FS
---help---
This option enables access to the kernel configuration file
through /proc/config.gz.
2007-05-08 11:31:15 +04:00
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
range 12 21
2008-04-29 11:58:58 +04:00
default 17
2014-10-04 03:00:54 +04:00
depends on PRINTK
2007-05-08 11:31:15 +04:00
help
2014-08-07 03:08:56 +04:00
Select the minimal kernel log buffer size as a power of 2.
The final size is affected by LOG_CPU_MAX_BUF_SHIFT config
parameter, see below. Any higher size also might be forced
by "log_buf_len" boot parameter.
2008-04-29 11:58:58 +04:00
Examples:
2014-08-07 03:08:56 +04:00
17 => 128 KB
2008-04-29 11:58:58 +04:00
16 => 64 KB
2014-08-07 03:08:56 +04:00
15 => 32 KB
14 => 16 KB
2007-05-08 11:31:15 +04:00
13 => 8 KB
12 => 4 KB
2014-08-07 03:08:56 +04:00
config LOG_CPU_MAX_BUF_SHIFT
int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
2014-10-14 02:51:11 +04:00
depends on SMP
2014-08-07 03:08:56 +04:00
range 0 21
default 12 if !BASE_SMALL
default 0 if BASE_SMALL
2014-10-04 03:00:54 +04:00
depends on PRINTK
2014-08-07 03:08:56 +04:00
help
This option allows to increase the default ring buffer size
according to the number of CPUs. The value defines the contribution
of each CPU as a power of 2. The used space is typically only few
lines however it might be much more when problems are reported,
e.g. backtraces.
The increased size means that a new buffer has to be allocated and
the original static one is unused. It makes sense only on systems
with more CPUs. Therefore this value is used only when the sum of
contributions is greater than the half of the default kernel ring
buffer as defined by LOG_BUF_SHIFT. The default values are set
so that more than 64 CPUs are needed to trigger the allocation.
Also this option is ignored when "log_buf_len" kernel parameter is
used as it forces an exact (power of two) size of the ring buffer.
The number of possible CPUs is used for this computation ignoring
hotplugging making the compuation optimal for the the worst case
scenerio while allowing a simple algorithm to be used from bootup.
Examples shift values and their meaning:
17 => 128 KB for each CPU
16 => 64 KB for each CPU
15 => 32 KB for each CPU
14 => 16 KB for each CPU
13 => 8 KB for each CPU
12 => 4 KB for each CPU
2008-05-06 01:19:50 +04:00
#
# Architectures with an unreliable sched_clock() should select this:
#
config HAVE_UNSTABLE_SCHED_CLOCK
bool
2013-06-02 10:39:40 +04:00
config GENERIC_SCHED_CLOCK
bool
2012-10-04 03:50:47 +04:00
#
# For architectures that want to enable the support for NUMA-affine scheduler
# balancing logic:
#
config ARCH_SUPPORTS_NUMA_BALANCING
bool
2013-11-18 21:27:06 +04:00
#
# For architectures that know their GCC __int128 support is sound
#
config ARCH_SUPPORTS_INT128
bool
2012-10-04 03:50:47 +04:00
# For architectures that (ab)use NUMA to represent different memory regions
# all cpu-local but of different latencies, such as SuperH.
#
config ARCH_WANT_NUMA_VARIABLE_LOCALITY
bool
2012-11-22 15:16:36 +04:00
config NUMA_BALANCING_DEFAULT_ENABLED
bool "Automatically enable NUMA aware memory/task placement"
default y
depends on NUMA_BALANCING
help
2013-08-13 19:06:50 +04:00
If set, automatic NUMA balancing will be enabled if running on a NUMA
2012-11-22 15:16:36 +04:00
machine.
2012-10-04 03:50:47 +04:00
config NUMA_BALANCING
bool "Memory placement aware NUMA scheduler"
depends on ARCH_SUPPORTS_NUMA_BALANCING
depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
depends on SMP && NUMA && MIGRATION
help
This option adds support for automatic NUMA aware memory/task placement.
The mechanism is quite primitive and is based on migrating memory when
2013-08-13 19:06:50 +04:00
it has references to the node the task is running on.
2012-10-04 03:50:47 +04:00
This system will be inactive on UMA systems.
2009-01-16 00:50:58 +03:00
menuconfig CGROUPS
boolean "Control Group support"
cgroup: convert to kernfs
cgroup filesystem code was derived from the original sysfs
implementation which was heavily intertwined with vfs objects and
locking with the goal of re-using the existing vfs infrastructure.
That experiment turned out rather disastrous and sysfs switched, a
long time ago, to distributed filesystem model where a separate
representation is maintained which is queried by vfs. Unfortunately,
cgroup stuck with the failed experiment all these years and
accumulated even more problems over time.
Locking and object lifetime management being entangled with vfs is
probably the most egregious. vfs is never designed to be misused like
this and cgroup ends up jumping through various convoluted dancing to
make things work. Even then, operations across multiple cgroups can't
be done safely as it'll deadlock with rename locking.
Recently, kernfs is separated out from sysfs so that it can be used by
users other than sysfs. This patch converts cgroup to use kernfs,
which will bring the following benefits.
* Separation from vfs internals. Locking and object lifetime
management is contained in cgroup proper making things a lot
simpler. This removes significant amount of locking convolutions,
hairy object lifetime rules and the restriction on multi-cgroup
operations.
* Can drop a lot of code to implement filesystem interface as most are
provided by kernfs.
* Proper "severing" semantics, which allows controllers to not worry
about lingering file accesses after offline.
While the preceding patches did as much as possible to make the
transition less painful, large part of the conversion has to be one
discrete step making this patch rather large. The rest of the commit
message lists notable changes in different areas.
Overall
-------
* vfs constructs replaced with kernfs ones. cgroup->dentry w/ ->kn,
cgroupfs_root->sb w/ ->kf_root.
* All dentry accessors are removed. Helpers to map from kernfs
constructs are added.
* All vfs plumbing around dentry, inode and bdi removed.
* cgroup_mount() now directly looks for matching root and then
proceeds to create a new one if not found.
Synchronization and object lifetime
-----------------------------------
* vfs inode locking removed. Among other things, this removes the
need for the convolution in cgroup_cfts_commit(). Future patches
will further simplify it.
* vfs refcnting replaced with cgroup internal ones. cgroup->refcnt,
cgroupfs_root->refcnt added. cgroup_put_root() now directly puts
root->refcnt and when it reaches zero proceeds to destroy it thus
merging cgroup_put_root() and the former cgroup_kill_sb().
Simliarly, cgroup_put() now directly schedules cgroup_free_rcu()
when refcnt reaches zero.
* Unlike before, kernfs objects don't hold onto cgroup objects. When
cgroup destroys a kernfs node, all existing operations are drained
and the association is broken immediately. The same for
cgroupfs_roots and mounts.
* All operations which come through kernfs guarantee that the
associated cgroup is and stays valid for the duration of operation;
however, there are two paths which need to find out the associated
cgroup from dentry without going through kernfs -
css_tryget_from_dir() and cgroupstats_build(). For these two,
kernfs_node->priv is RCU managed so that they can dereference it
under RCU read lock.
File and directory handling
---------------------------
* File and directory operations converted to kernfs_ops and
kernfs_syscall_ops.
* xattrs is implicitly supported by kernfs. No need to worry about it
from cgroup. This means that "xattr" mount option is no longer
necessary. A future patch will add a deprecated warning message
when sane_behavior.
* When cftype->max_write_len > PAGE_SIZE, it's necessary to make a
private copy of one of the kernfs_ops to set its atomic_write_len.
cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated
to handle it.
* cftype->lockdep_key added so that kernfs lockdep annotation can be
per cftype.
* Inidividual file entries and open states are now managed by kernfs.
No need to worry about them from cgroup. cfent, cgroup_open_file
and their friends are removed.
* kernfs_nodes are created deactivated and kernfs_activate()
invocations added to places where creation of new nodes are
committed.
* cgroup_rmdir() uses kernfs_[un]break_active_protection() for
self-removal.
v2: - Li pointed out in an earlier patch that specifying "name="
during mount without subsystem specification should succeed if
there's an existing hierarchy with a matching name although it
should fail with -EINVAL if a new hierarchy should be created.
Prior to the conversion, this used by handled by deferring
failure from NULL return from cgroup_root_from_opts(), which was
necessary because root was being created before checking for
existing ones. Note that cgroup_root_from_opts() returned an
ERR_PTR() value for error conditions which require immediate
mount failure.
As we now have separate search and creation steps, deferring
failure from cgroup_root_from_opts() is no longer necessary.
cgroup_root_from_opts() is updated to always return ERR_PTR()
value on failure.
- The logic to match existing roots is updated so that a mount
attempt with a matching name but different subsys_mask are
rejected. This was handled by a separate matching loop under
the comment "Check for name clashes with existing mounts" but
got lost during conversion. Merge the check into the main
search loop.
- Add __rcu __force casting in RCU_INIT_POINTER() in
cgroup_destroy_locked() to avoid the sparse address space
warning reported by kbuild test bot. Maybe we want an explicit
interface to use kn->priv as RCU protected pointer?
v3: Make CONFIG_CGROUPS select CONFIG_KERNFS.
v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to
cgroup_idr with cgroup_mutex").
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: kbuild test robot fengguang.wu@intel.com>
2014-02-11 20:52:49 +04:00
select KERNFS
2009-01-08 05:07:30 +03:00
help
2009-01-16 00:50:58 +03:00
This option adds support for grouping sets of processes together, for
2009-01-08 05:07:30 +03:00
use with process control subsystems such as Cpusets, CFS, memory
controls or device isolation.
See
- Documentation/scheduler/sched-design-CFS.txt (CFS)
2009-01-16 00:50:59 +03:00
- Documentation/cgroups/ (features for grouping, isolation
and resource control)
2009-01-08 05:07:30 +03:00
Say N if unsure.
2009-01-16 00:50:58 +03:00
if CGROUPS
2009-01-08 05:07:30 +03:00
config CGROUP_DEBUG
bool "Example debug cgroup subsystem"
default n
help
This option enables a simple cgroup subsystem that
exports useful debugging information about the cgroups
2009-01-16 00:50:58 +03:00
framework.
2009-01-08 05:07:30 +03:00
2009-01-16 00:50:58 +03:00
Say N if unsure.
2009-01-08 05:07:30 +03:00
config CGROUP_FREEZER
2009-01-16 00:50:58 +03:00
bool "Freezer cgroup subsystem"
help
Provides a way to freeze and unfreeze all tasks in a
2009-01-08 05:07:30 +03:00
cgroup.
config CGROUP_DEVICE
bool "Device controller for cgroups"
help
Provides a cgroup implementing whitelists for devices which
a process in the cgroup can mknod or open.
config CPUSETS
bool "Cpuset support"
help
This option will let you create and manage CPUSETs which
allow dynamically partitioning a system into sets of CPUs and
Memory Nodes and assigning tasks to run only within those sets.
This is primarily useful on large SMP or NUMA systems.
Say N if unsure.
2009-01-16 00:50:58 +03:00
config PROC_PID_CPUSET
bool "Include legacy /proc/<pid>/cpuset file"
depends on CPUSETS
default y
2007-12-02 22:04:49 +03:00
config CGROUP_CPUACCT
bool "Simple CPU accounting cgroup subsystem"
help
Provides a simple Resource Controller for monitoring the
2009-01-16 00:50:58 +03:00
total CPU consumed by the tasks in a cgroup.
2007-12-02 22:04:49 +03:00
2008-02-07 11:13:49 +03:00
config RESOURCE_COUNTERS
bool "Resource counters"
help
This option enables controller independent resource accounting
2009-01-16 00:50:58 +03:00
infrastructure that works with cgroups.
2008-02-07 11:13:49 +03:00
2012-08-01 03:43:02 +04:00
config MEMCG
2008-03-05 01:28:39 +03:00
bool "Memory Resource Controller for Control Groups"
2010-10-28 02:34:39 +04:00
depends on RESOURCE_COUNTERS
2013-11-23 03:20:42 +04:00
select EVENTFD
2008-03-05 01:28:39 +03:00
help
2008-10-30 00:01:06 +03:00
Provides a memory resource controller that manages both anonymous
2009-02-04 12:12:08 +03:00
memory and page cache. (See Documentation/cgroups/memory.txt)
2008-03-05 01:28:39 +03:00
Note that setting this option increases fixed memory overhead
2008-10-30 00:01:06 +03:00
associated with each page of memory in the system. By this,
2013-07-04 02:03:30 +04:00
8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
2008-10-30 00:01:06 +03:00
usage tracking struct at boot. Total amount of this is printed out
at boot.
2008-03-05 01:28:39 +03:00
Only enable when you're ok with these trade offs and really
2008-10-30 00:01:06 +03:00
sure you need the memory resource controller. Even when you enable
this, you can set "cgroup_disable=memory" at your boot option to
disable memory resource controller and you can avoid overheads.
2009-01-08 05:07:35 +03:00
(and lose benefits of memory resource controller)
2008-03-05 01:28:39 +03:00
2012-08-01 03:43:02 +04:00
config MEMCG_SWAP
2010-08-11 05:02:56 +04:00
bool "Memory Resource Controller Swap Extension"
2012-08-01 03:43:02 +04:00
depends on MEMCG && SWAP
2009-01-08 05:07:57 +03:00
help
Add swap management feature to memory resource controller. When you
enable this, you can limit mem+swap usage per cgroup. In other words,
when you disable this, memory resource controller has no cares to
usage of swap...a process can exhaust all of the swap. This extension
is useful when you want to avoid exhaustion swap but this itself
adds more overheads and consumes memory for remembering information.
Especially if you use 32bit system or small memory system, please
be careful about enabling this. When memory resource controller
is disabled by boot option, this will be automatically disabled and
there will be no overhead from this. Even when you set this config=y,
2011-07-26 04:12:12 +04:00
if boot option "swapaccount=0" is set, swap will not be accounted.
2009-04-03 03:57:47 +04:00
Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page
size is 4096bytes, 512k per 1Gbytes of swap.
2012-08-01 03:43:02 +04:00
config MEMCG_SWAP_ENABLED
2010-11-24 23:57:08 +03:00
bool "Memory Resource Controller Swap Extension enabled by default"
2012-08-01 03:43:02 +04:00
depends on MEMCG_SWAP
2010-11-24 23:57:08 +03:00
default y
help
Memory Resource Controller Swap Extension comes with its price in
a bigger memory consumption. General purpose distribution kernels
2010-12-18 00:32:36 +03:00
which want to enable the feature but keep it disabled by default
2013-08-23 03:35:46 +04:00
and let the user enable it by swapaccount=1 boot command line
2010-11-24 23:57:08 +03:00
parameter should have this option unselected.
For those who want to have the feature enabled by default should
select this option (if, for some reason, they need to disable it
2011-07-26 04:12:12 +04:00
then swapaccount=0 does the trick).
2012-08-01 03:43:02 +04:00
config MEMCG_KMEM
2012-10-02 22:19:29 +04:00
bool "Memory Resource Controller Kernel Memory accounting"
depends on MEMCG
2012-12-19 02:21:47 +04:00
depends on SLUB || SLAB
2011-12-12 01:47:01 +04:00
help
The Kernel Memory extension for Memory Resource Controller can limit
the amount of memory used by kernel objects in the system. Those are
fundamentally different from the entities handled by the standard
Memory Controller, which are page-based, and can be swapped. Users of
the kmem extension can use it to guarantee that no group of processes
will ever exhaust kernel resources alone.
2009-01-08 05:07:57 +03:00
2014-06-05 03:07:28 +04:00
WARNING: Current implementation lacks reclaim support. That means
allocation attempts will fail when close to the limit even if there
are plenty of kmem available for reclaim. That makes this option
unusable in real life so DO NOT SELECT IT unless for development
purposes.
2012-08-01 03:42:12 +04:00
config CGROUP_HUGETLB
bool "HugeTLB Resource Controller for Control Groups"
2012-10-02 22:19:29 +04:00
depends on RESOURCE_COUNTERS && HUGETLB_PAGE
2012-08-01 03:42:12 +04:00
default n
help
Provides a cgroup Resource Controller for HugeTLB pages.
When you enable this, you can put a per cgroup limit on HugeTLB usage.
The limit is enforced during page fault. Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies
that, the application will get SIGBUS signal if it tries to access
HugeTLB pages beyond its limit. This requires the application to know
beforehand how much HugeTLB pages it would require for its use. The
control group is tracked in the third page lru pointer. This means
that we cannot use the controller with huge page less than 3 pages.
2011-02-14 12:20:01 +03:00
config CGROUP_PERF
bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
depends on PERF_EVENTS && CGROUPS
help
This option extends the per-cpu mode to restrict monitoring to
2011-03-03 09:26:20 +03:00
threads which belong to the cgroup specified and run on the
2011-02-14 12:20:01 +03:00
designated cpu.
Say N if unsure.
2010-01-20 15:26:18 +03:00
menuconfig CGROUP_SCHED
bool "Group CPU scheduler"
default n
help
This feature lets CPU scheduler recognize task groups and control CPU
bandwidth allocation to such task groups. It uses cgroups to group
tasks.
if CGROUP_SCHED
config FAIR_GROUP_SCHED
bool "Group scheduling for SCHED_OTHER"
depends on CGROUP_SCHED
default CGROUP_SCHED
2011-07-21 20:43:28 +04:00
config CFS_BANDWIDTH
bool "CPU bandwidth provisioning for FAIR_GROUP_SCHED"
depends on FAIR_GROUP_SCHED
default n
help
This option allows users to define CPU bandwidth rates (limits) for
tasks running within the fair group scheduler. Groups with no limit
set are considered to be unconstrained and will run with no
restriction.
See tip/Documentation/scheduler/sched-bwc.txt for more information.
2010-01-20 15:26:18 +03:00
config RT_GROUP_SCHED
bool "Group scheduling for SCHED_RR/FIFO"
depends on CGROUP_SCHED
default n
help
This feature lets you explicitly allocate real CPU bandwidth
2010-03-24 08:17:19 +03:00
to task groups. If enabled, it will also make it impossible to
2010-01-20 15:26:18 +03:00
schedule realtime tasks for non-root users until you allocate
realtime bandwidth for them.
See Documentation/scheduler/sched-rt-group.txt for more information.
endif #CGROUP_SCHED
2010-04-26 21:27:56 +04:00
config BLK_CGROUP
2012-03-06 01:14:54 +04:00
bool "Block IO controller"
2010-10-28 02:34:39 +04:00
depends on BLOCK
2010-04-26 21:27:56 +04:00
default n
---help---
Generic block IO controller cgroup interface. This is the common
cgroup interface which should be used by various IO controlling
policies.
Currently, CFQ IO scheduler uses it to recognize task groups and
control disk bandwidth allocation (proportional time slice allocation)
2010-09-16 01:06:35 +04:00
to such task groups. It is also used by bio throttling logic in
block layer to implement upper limit in IO rates on a device.
2010-04-26 21:27:56 +04:00
This option only enables generic Block IO controller infrastructure.
2010-09-16 01:06:35 +04:00
One needs to also enable actual IO controlling logic/policy. For
2011-01-17 00:43:10 +03:00
enabling proportional weight division of disk bandwidth in CFQ, set
CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set
2011-01-17 03:08:41 +03:00
CONFIG_BLK_DEV_THROTTLING=y.
2010-04-26 21:27:56 +04:00
See Documentation/cgroups/blkio-controller.txt for more information.
config DEBUG_BLK_CGROUP
bool "Enable Block IO controller debugging"
depends on BLK_CGROUP
default n
---help---
Enable some debugging help. Currently it exports additional stat
files in a cgroup which can be useful for debugging.
2009-01-16 00:50:58 +03:00
endif # CGROUPS
2009-01-08 05:07:57 +03:00
2012-01-13 05:20:49 +04:00
config CHECKPOINT_RESTORE
bool "Checkpoint/restore support" if EXPERT
default n
help
Enables additional kernel features in a sake of checkpoint/restore.
In particular it adds auxiliary prctl codes to setup process text,
data and heap segment sizes, and a few additional /proc filesystem
entries.
If unsure, say N here.
2010-10-28 02:34:38 +04:00
menuconfig NAMESPACES
2011-01-21 01:44:16 +03:00
bool "Namespaces support" if EXPERT
default !EXPERT
2008-02-08 15:18:19 +03:00
help
Provides the way to make tasks work with different objects using
the same id. For example same IPC id may refer to different objects
or same user id or pid may refer to different tasks when used in
different namespaces.
2010-10-28 02:34:38 +04:00
if NAMESPACES
2008-02-08 15:18:21 +03:00
config UTS_NS
bool "UTS namespace"
2010-10-28 02:34:37 +04:00
default y
2008-02-08 15:18:21 +03:00
help
In this namespace tasks see different info provided with the
uname() system call
2008-02-08 15:18:22 +03:00
config IPC_NS
bool "IPC namespace"
2010-10-28 02:34:38 +04:00
depends on (SYSVIPC || POSIX_MQUEUE)
2010-10-28 02:34:37 +04:00
default y
2008-02-08 15:18:22 +03:00
help
In this namespace tasks work with IPC ids which correspond to
2009-04-07 06:01:08 +04:00
different IPC objects in different namespaces.
2008-02-08 15:18:22 +03:00
2008-02-08 15:18:23 +03:00
config USER_NS
2012-10-02 22:19:29 +04:00
bool "User namespace"
2011-11-17 22:23:55 +04:00
default n
2008-02-08 15:18:23 +03:00
help
This allows containers, i.e. vservers, to use user namespaces
to provide different user info for different servers.
2013-01-26 04:48:31 +04:00
When user namespaces are enabled in the kernel it is
recommended that the MEMCG and MEMCG_KMEM options also be
enabled and that user-space use the memory control groups to
limit the amount of memory a memory unprivileged users can
use.
2008-02-08 15:18:23 +03:00
If unsure, say N.
2008-02-08 15:18:24 +03:00
config PID_NS
2010-10-28 02:34:37 +04:00
bool "PID Namespaces"
2010-10-28 02:34:37 +04:00
default y
2008-02-08 15:18:24 +03:00
help
2008-07-06 16:48:02 +04:00
Support process id namespaces. This allows having multiple
2009-01-26 13:12:25 +03:00
processes with the same pid as long as they are in different
2008-02-08 15:18:24 +03:00
pid namespaces. This is a building block of containers.
2009-01-26 23:25:55 +03:00
config NET_NS
bool "Network namespace"
2010-10-28 02:34:38 +04:00
depends on NET
2010-10-28 02:34:37 +04:00
default y
2009-01-26 23:25:55 +03:00
help
Allow user space to create what appear to be multiple instances
of the network stack.
2010-10-28 02:34:38 +04:00
endif # NAMESPACES
sched: Add 'autogroup' scheduling feature: automated per session task groups
A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity. This patch
implements an idea from Linus, to automatically create task
groups. Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.
Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group. When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the preveious task group is dropped. Child
processes inherit this task group thereafter, and increase it's
refcount. When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.
At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.
Autogroup bandwidth is controllable via setting it's nice level
through the proc filesystem:
cat /proc/<pid>/autogroup
Displays the task's group and the group's nice level.
echo <nice level> > /proc/<pid>/autogroup
Sets the task group's shares to the weight of nice <level> task.
Setting nice level is rate limited for !admin users due to the
abuse risk of task group locking.
The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:
echo [01] > /proc/sys/kernel/sched_autogroup_enabled
... which will automatically move tasks to/from the root task group.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-30 16:18:03 +03:00
config SCHED_AUTOGROUP
bool "Automatic process group scheduling"
select CGROUPS
select CGROUP_SCHED
select FAIR_GROUP_SCHED
help
This option optimizes the scheduler for common desktop workloads by
automatically creating and populating task groups. This separation
of workloads isolates aggressive CPU burners (like build jobs) from
desktop applications. Task group autogeneration is currently based
upon task session.
2010-10-28 02:34:41 +04:00
config SYSFS_DEPRECATED
2011-01-10 21:04:22 +03:00
bool "Enable deprecated sysfs features to support old userspace tools"
2010-10-28 02:34:41 +04:00
depends on SYSFS
default n
help
This option adds code that switches the layout of the "block" class
devices, to not show up in /sys/class/block/, but only in
/sys/block/.
This switch is only active when the sysfs.deprecated=1 boot option is
passed or the SYSFS_DEPRECATED_V2 option is set.
This option allows new kernels to run on old distributions and tools,
which might get confused by /sys/class/block/. Since 2007/2008 all
major distributions and tools handle this just fine.
Recent distributions and userspace tools after 2009/2010 depend on
the existence of /sys/class/block/, and will not work with this
option enabled.
Only if you are using a new kernel on an old distribution, you might
need to say Y here.
config SYSFS_DEPRECATED_V2
2011-01-10 21:04:22 +03:00
bool "Enable deprecated sysfs features by default"
2010-10-28 02:34:41 +04:00
default n
depends on SYSFS
depends on SYSFS_DEPRECATED
help
Enable deprecated sysfs by default.
See the CONFIG_SYSFS_DEPRECATED option for more details about this
option.
Only if you are using a new kernel on an old distribution, you might
need to say Y here. Even then, odds are you would not need it
enabled, you can always pass the boot option if absolutely necessary.
config RELAY
bool "Kernel->user space relay support (formerly relayfs)"
help
This option enables support for relay interface support in
certain file systems (such as debugfs).
It is designed to provide an efficient mechanism for tools and
facilities to relay large amounts of data from kernel space to
user space.
If unsure, say N.
2007-03-06 12:42:17 +03:00
config BLK_DEV_INITRD
bool "Initial RAM filesystem and RAM disk (initramfs/initrd) support"
depends on BROKEN || !FRV
help
The initial RAM filesystem is a ramfs which is loaded by the
boot loader (loadlin or lilo) and that is mounted as root
before the normal boot procedure. It is typically used to
load modules needed to mount the "real" root file system,
etc. See <file:Documentation/initrd.txt> for details.
If RAM disk support (BLK_DEV_RAM) is also included, this
also enables initial RAM disk (initrd) support and adds
15 Kbytes (more on some other architectures) to the kernel size.
If unsure say Y.
2007-02-10 12:44:43 +03:00
if BLK_DEV_INITRD
2005-08-10 22:44:50 +04:00
source "usr/Kconfig"
2007-02-10 12:44:43 +03:00
endif
Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL
Also, disable on sparc64 - a number of people report breakage. Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.
It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 05:52:21 +03:00
config CC_OPTIMIZE_FOR_SIZE
2008-04-28 03:39:43 +04:00
bool "Optimize for size"
Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL
Also, disable on sparc64 - a number of people report breakage. Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.
It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 05:52:21 +03:00
help
Enabling this option will pass "-Os" instead of "-O2" to gcc
resulting in a smaller kernel.
2012-11-02 15:41:01 +04:00
If unsure, say N.
Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL
Also, disable on sparc64 - a number of people report breakage. Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.
It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 05:52:21 +03:00
2006-10-01 10:28:13 +04:00
config SYSCTL
bool
2009-03-10 22:55:46 +03:00
config ANON_INODES
bool
2013-05-01 02:28:45 +04:00
config HAVE_UID16
bool
config SYSCTL_EXCEPTION_TRACE
bool
help
Enable support for /proc/sys/debug/exception-trace.
config SYSCTL_ARCH_UNALIGN_NO_WARN
bool
help
Enable support for /proc/sys/kernel/ignore-unaligned-usertrap
Allows arch to define/use @no_unaligned_warning to possibly warn
about unaligned access emulation going on under the hood.
config SYSCTL_ARCH_UNALIGN_ALLOW
bool
help
Enable support for /proc/sys/kernel/unaligned-trap
Allows arches to define/use @unaligned_enabled to runtime toggle
the unaligned access emulation.
see arch/parisc/kernel/unaligned.c for reference
config HAVE_PCSPKR_PLATFORM
bool
2014-10-24 05:41:08 +04:00
# interpreter that classic socket filters depend on
config BPF
bool
2011-01-21 01:44:16 +03:00
menuconfig EXPERT
bool "Configure standard kernel features (expert users)"
2011-06-06 05:23:58 +04:00
# Unhide debug options, to make the on-by-default options visible
select DEBUG_KERNEL
2005-04-17 02:20:36 +04:00
help
This option allows certain base kernel options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" kernel.
Only use this if you really know what you are doing.
2006-09-16 23:15:53 +04:00
config UID16
2011-01-21 01:44:16 +03:00
bool "Enable 16-bit UID system calls" if EXPERT
2012-10-09 03:28:08 +04:00
depends on HAVE_UID16
2006-09-16 23:15:53 +04:00
default y
help
This enables the legacy 16-bit UID syscall wrappers.
2014-06-05 03:11:12 +04:00
config SGETMASK_SYSCALL
bool "sgetmask/ssetmask syscalls support" if EXPERT
def_bool PARISC || MN10300 || BLACKFIN || M68K || PPC || MIPS || X86 || SPARC || CRIS || MICROBLAZE || SUPERH
---help---
sys_sgetmask and sys_ssetmask are obsolete system calls
no longer supported in libc but still enabled by default in some
architectures.
If unsure, leave the default option here.
2014-04-04 01:48:25 +04:00
config SYSFS_SYSCALL
bool "Sysfs syscall support" if EXPERT
default y
---help---
sys_sysfs is an obsolete system call no longer supported in libc.
Note that disabling this option is more secure but might break
compatibility with some systems.
If unsure say Y here.
2006-09-27 12:51:04 +04:00
config SYSCTL_SYSCALL
2011-01-21 01:44:16 +03:00
bool "Sysctl syscall support" if EXPERT
2009-11-05 16:26:41 +03:00
depends on PROC_SYSCTL
2011-11-03 00:39:25 +04:00
default n
2006-09-27 12:51:04 +04:00
select SYSCTL
2006-09-16 23:15:53 +04:00
---help---
2006-11-09 04:44:51 +03:00
sys_sysctl uses binary paths that have been found challenging
to properly maintain and use. The interface in /proc/sys
using paths with ascii names is now the primary path to this
information.
2006-09-27 12:51:04 +04:00
2006-11-09 04:44:51 +03:00
Almost nothing using the binary sysctl interface so if you are
trying to save some space it is probably safe to disable this,
making your kernel marginally smaller.
2006-09-27 12:51:04 +04:00
2011-11-03 00:39:25 +04:00
If unsure say N here.
2006-09-16 23:15:53 +04:00
2005-04-17 02:20:36 +04:00
config KALLSYMS
2011-01-21 01:44:16 +03:00
bool "Load all symbols for debugging/ksymoops" if EXPERT
2005-04-17 02:20:36 +04:00
default y
help
Say Y here to let the kernel print out symbolic crash information and
symbolic stack backtraces. This increases the size of the kernel
somewhat, as all symbols have to be loaded into the kernel image.
config KALLSYMS_ALL
bool "Include all symbols in kallsyms"
depends on DEBUG_KERNEL && KALLSYMS
help
2011-04-05 14:24:57 +04:00
Normally kallsyms only contains the symbols of functions for nicer
OOPS messages and backtraces (i.e., symbols from the text and inittext
sections). This is sufficient for most cases. And only in very rare
cases (e.g., when a debugger is used) all symbols are required (e.g.,
names of variables from the data sections, etc).
This option makes sure that all symbols are loaded into the kernel
image (i.e., symbols from all sections) in cost of increased kernel
size (depending on the kernel configuration, it may be 300KiB or
something like this).
Say N unless you really need all symbols.
2005-05-01 19:59:02 +04:00
config PRINTK
default y
2011-01-21 01:44:16 +03:00
bool "Enable support for printk" if EXPERT
2012-10-12 20:00:23 +04:00
select IRQ_WORK
2005-05-01 19:59:02 +04:00
help
This option enables normal printk support. Removing it
eliminates most of the message strings from the kernel image
and makes the kernel more or less silent. As this makes it
very difficult to diagnose system problems, saying N here is
strongly discouraged.
2005-05-01 19:59:01 +04:00
config BUG
2011-01-21 01:44:16 +03:00
bool "BUG() support" if EXPERT
2005-05-01 19:59:01 +04:00
default y
help
Disabling this option eliminates support for BUG and WARN, reducing
the size of your kernel image and potentially quietly ignoring
numerous fatal conditions. You should only consider disabling this
option for embedded systems with no facilities for reporting errors.
Just say Y.
2006-01-08 12:05:25 +03:00
config ELF_CORE
2012-10-05 04:15:23 +04:00
depends on COREDUMP
2006-01-08 12:05:25 +03:00
default y
2011-01-21 01:44:16 +03:00
bool "Enable ELF core dumps" if EXPERT
2006-01-08 12:05:25 +03:00
help
Enable support for generating core dumps. Disabling saves about 4k.
2011-06-01 22:05:09 +04:00
2008-05-07 14:39:56 +04:00
config PCSPKR_PLATFORM
2011-01-21 01:44:16 +03:00
bool "Enable PC-Speaker support" if EXPERT
2011-06-01 22:05:09 +04:00
depends on HAVE_PCSPKR_PLATFORM
2011-06-01 22:04:59 +04:00
select I8253_LOCK
2008-05-07 14:39:56 +04:00
default y
help
This option allows to disable the internal PC-Speaker
support, saving some memory.
2005-04-17 02:20:36 +04:00
config BASE_FULL
default y
2011-01-21 01:44:16 +03:00
bool "Enable full-sized data structures for core" if EXPERT
2005-04-17 02:20:36 +04:00
help
Disabling this option reduces the size of miscellaneous core
kernel data structures. This saves memory on small machines,
but may reduce performance.
config FUTEX
2011-01-21 01:44:16 +03:00
bool "Enable futex support" if EXPERT
2005-04-17 02:20:36 +04:00
default y
2006-06-27 13:54:53 +04:00
select RT_MUTEXES
2005-04-17 02:20:36 +04:00
help
Disabling this option will cause the kernel to be built without
support for "fast userspace mutexes". The resulting kernel may not
run glibc-based applications correctly.
2014-03-02 16:09:47 +04:00
config HAVE_FUTEX_CMPXCHG
bool
2014-10-04 03:19:24 +04:00
depends on FUTEX
2014-03-02 16:09:47 +04:00
help
Architectures should select this if futex_atomic_cmpxchg_inatomic()
is implemented and always working. This removes a couple of runtime
checks.
2005-04-17 02:20:36 +04:00
config EPOLL
2011-01-21 01:44:16 +03:00
bool "Enable eventpoll support" if EXPERT
2005-04-17 02:20:36 +04:00
default y
2007-07-31 11:39:10 +04:00
select ANON_INODES
2005-04-17 02:20:36 +04:00
help
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.
signal/timer/event: signalfd core
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:
http://www.xmailserver.org/signafd-test.c
The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor if created with the following API:
int signalfd(int ufd, const sigset_t *mask, size_t masksize);
The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.
The "mask" allows to specify the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.
If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:
struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_fd */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};
[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:13 +04:00
config SIGNALFD
2011-01-21 01:44:16 +03:00
bool "Enable signalfd() system call" if EXPERT
2007-07-31 11:39:10 +04:00
select ANON_INODES
signal/timer/event: signalfd core
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:
http://www.xmailserver.org/signafd-test.c
The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor if created with the following API:
int signalfd(int ufd, const sigset_t *mask, size_t masksize);
The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.
The "mask" allows to specify the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.
If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:
struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_fd */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};
[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:13 +04:00
default y
help
Enable the signalfd() system call that allows to receive signals
on a file descriptor.
If unsure, say Y.
signal/timer/event: timerfd core
This patch introduces a new system call for timers events delivered though
file descriptors. This allows timer event to be used with standard POSIX
poll(2), select(2) and read(2). As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.
The system call is defined as:
int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
s new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.
The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.
If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.
If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
tv_nsec == 0), this is the period at which the following ticks should be
generated.
The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.
The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.
As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
returned.
The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.
A quick test program, shows timerfd working correctly on my amd64 box:
http://www.xmailserver.org/timerfd-test.c
[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:16 +04:00
config TIMERFD
2011-01-21 01:44:16 +03:00
bool "Enable timerfd() system call" if EXPERT
2007-07-31 11:39:10 +04:00
select ANON_INODES
signal/timer/event: timerfd core
This patch introduces a new system call for timers events delivered though
file descriptors. This allows timer event to be used with standard POSIX
poll(2), select(2) and read(2). As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.
The system call is defined as:
int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
s new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.
The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.
If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.
If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
tv_nsec == 0), this is the period at which the following ticks should be
generated.
The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.
The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.
As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
returned.
The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.
A quick test program, shows timerfd working correctly on my amd64 box:
http://www.xmailserver.org/timerfd-test.c
[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:16 +04:00
default y
help
Enable the timerfd() system call that allows to receive timer
events on a file descriptor.
If unsure, say Y.
signal/timer/event: eventfd core
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only). It can be used instead of pipe(2) in all cases where those
would simply be used to signal events. Their kernel overhead is much lower
than pipes, and they do not consume two fds. When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
But more in general, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it. The API is:
int eventfd(unsigned int count);
The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
The POLLIN flag is raised when the internal counter is greater than zero.
The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.
The POLLERR flag is raised when an overflow in the counter value is detected.
The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).
But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.
The read(2) function reads the __u64 counter value, and reset the internal
value to zero. If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that has never
been retired - unlickely, but possible).
The write(2) call writes an __u64 count value, and adds it to the current
counter. The eventfd fd supports O_NONBLOCK also.
On the kernel side, we have:
struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);
The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post an event
to userspace. The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:
http://www.xmailserver.org/eventfd-bench.c
This is the eventfd-based version of pipetest-4 (pipe(2) based):
http://www.xmailserver.org/pipetest-4.c
Not that performance matters much in the eventfd case, but eventfd-bench
shows almost as double as performance than pipetest-4.
[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:19 +04:00
config EVENTFD
2011-01-21 01:44:16 +03:00
bool "Enable eventfd() system call" if EXPERT
2007-07-31 11:39:10 +04:00
select ANON_INODES
signal/timer/event: eventfd core
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only). It can be used instead of pipe(2) in all cases where those
would simply be used to signal events. Their kernel overhead is much lower
than pipes, and they do not consume two fds. When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
But more in general, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it. The API is:
int eventfd(unsigned int count);
The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
The POLLIN flag is raised when the internal counter is greater than zero.
The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.
The POLLERR flag is raised when an overflow in the counter value is detected.
The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).
But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.
The read(2) function reads the __u64 counter value, and reset the internal
value to zero. If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that has never
been retired - unlickely, but possible).
The write(2) call writes an __u64 count value, and adds it to the current
counter. The eventfd fd supports O_NONBLOCK also.
On the kernel side, we have:
struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);
The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post an event
to userspace. The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:
http://www.xmailserver.org/eventfd-bench.c
This is the eventfd-based version of pipetest-4 (pipe(2) based):
http://www.xmailserver.org/pipetest-4.c
Not that performance matters much in the eventfd case, but eventfd-bench
shows almost as double as performance than pipetest-4.
[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 09:23:19 +04:00
default y
help
Enable the eventfd() system call that allows to receive both
kernel notification (ie. KAIO) or userspace notifications.
If unsure, say Y.
2014-10-24 05:41:08 +04:00
# syscall, maps, verifier
config BPF_SYSCALL
bool "Enable bpf() system call" if EXPERT
select ANON_INODES
select BPF
default n
help
Enable the bpf() system call that allows to manipulate eBPF
programs and maps via file descriptors.
2005-04-17 02:20:36 +04:00
config SHMEM
2011-01-21 01:44:16 +03:00
bool "Use full shmem filesystem" if EXPERT
2005-04-17 02:20:36 +04:00
default y
depends on MMU
help
The shmem is an internal filesystem used to manage shared memory.
It is backed by swap and manages resource limits. It is also exported
to userspace as tmpfs if TMPFS is enabled. Disabling this
option replaces shmem and tmpfs with the much simpler ramfs code,
which may be appropriate on small systems without swap.
2008-10-16 09:05:12 +04:00
config AIO
2011-01-21 01:44:16 +03:00
bool "Enable AIO support" if EXPERT
2008-10-16 09:05:12 +04:00
default y
help
This option enables POSIX asynchronous I/O which may by used
2013-05-01 02:28:45 +04:00
by some high performance threaded applications. Disabling
this option saves about 7k.
2014-08-18 04:41:09 +04:00
config ADVISE_SYSCALLS
bool "Enable madvise/fadvise syscalls" if EXPERT
default y
help
This option enables the madvise and fadvise syscalls, used by
applications to advise the kernel about their future memory or file
usage, improving performance. If building an embedded system where no
applications use these syscalls, you can disable this option to save
space.
2013-05-01 02:28:45 +04:00
config PCI_QUIRKS
default y
bool "Enable PCI quirk workarounds" if EXPERT
depends on PCI
help
This enables workarounds for various PCI chipset
bugs/quirks. Disable this only if your target machine is
unaffected by PCI quirks.
2008-10-16 09:05:12 +04:00
2011-04-26 23:33:21 +04:00
config EMBEDDED
bool "Embedded system"
2014-04-08 02:39:09 +04:00
option allnoconfig_y
2011-04-26 23:33:21 +04:00
select EXPERT
help
This option should be enabled if compiling the kernel for
an embedded system so certain expert options are available
for configuration.
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:02:48 +04:00
config HAVE_PERF_EVENTS
2008-12-04 22:12:29 +03:00
bool
2009-06-12 21:17:43 +04:00
help
See tools/perf/design.txt for details.
2008-12-04 22:12:29 +03:00
2009-09-21 18:08:49 +04:00
config PERF_USE_VMALLOC
bool
help
See tools/perf/design.txt for details
2009-09-21 14:20:38 +04:00
menu "Kernel Performance Events And Counters"
2008-12-04 22:12:29 +03:00
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:02:48 +04:00
config PERF_EVENTS
2009-09-21 14:20:38 +04:00
bool "Kernel performance events and counters"
2012-04-05 20:24:44 +04:00
default y if PROFILING
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:02:48 +04:00
depends on HAVE_PERF_EVENTS
2008-12-08 21:38:33 +03:00
select ANON_INODES
2010-10-14 10:01:34 +04:00
select IRQ_WORK
2008-12-04 22:12:29 +03:00
help
2009-09-21 14:20:38 +04:00
Enable kernel support for various performance events provided
by software and hardware.
2008-12-04 22:12:29 +03:00
2009-10-31 00:32:25 +03:00
Software events are supported either built-in or via the
2009-09-21 14:20:38 +04:00
use of generic tracepoints.
2008-12-04 22:12:29 +03:00
2009-09-21 14:20:38 +04:00
Most modern CPUs support performance events via performance
counter registers. These registers count the number of certain
2008-12-04 22:12:29 +03:00
types of hw events: such as instructions executed, cachemisses
suffered, or branches mis-predicted - without slowing down the
kernel or applications. These registers can also trigger interrupts
when a threshold number of events have passed - and can thus be
used to profile the code that runs on that CPU.
2009-09-21 14:20:38 +04:00
The Linux Performance Event subsystem provides an abstraction of
2009-10-31 00:32:25 +03:00
these software and hardware event capabilities, available via a
2009-09-21 14:20:38 +04:00
system call and used by the "perf" utility in tools/perf/. It
2008-12-04 22:12:29 +03:00
provides per task and per CPU counters, and it provides event
capabilities on top of those.
Say Y if unsure.
2009-09-21 18:08:49 +04:00
config DEBUG_PERF_USE_VMALLOC
default n
bool "Debug: use vmalloc to back perf mmap() buffers"
depends on PERF_EVENTS && DEBUG_KERNEL
select PERF_USE_VMALLOC
help
Use vmalloc memory to back perf mmap() buffers.
Mostly useful for debugging the vmalloc code on platforms
that don't require it.
Say N if unsure.
2008-12-04 22:12:29 +03:00
endmenu
2006-06-30 12:55:45 +04:00
config VM_EVENT_COUNTERS
default y
2011-01-21 01:44:16 +03:00
bool "Enable VM event counters for /proc/vmstat" if EXPERT
2006-06-30 12:55:45 +04:00
help
2006-12-22 12:06:10 +03:00
VM event counters are needed for event counts to be shown.
This option allows the disabling of the VM event counters
2011-01-21 01:44:16 +03:00
on EXPERT systems. /proc/vmstat will only show page counts
2006-12-22 12:06:10 +03:00
if VM event counters are disabled.
2006-06-30 12:55:45 +04:00
2007-05-09 13:32:44 +04:00
config SLUB_DEBUG
default y
2011-01-21 01:44:16 +03:00
bool "Enable SLUB debugging support" if EXPERT
2008-04-30 03:16:06 +04:00
depends on SLUB && SYSFS
2007-05-09 13:32:44 +04:00
help
SLUB has extensive debug support features. Disabling these can
result in significant savings in code size. This also disables
SLUB sysfs support. /sys/slab will not exist and there will be
no support for cache validation etc.
2009-03-10 22:55:46 +03:00
config COMPAT_BRK
bool "Disable heap randomization"
default y
help
Randomizing heap placement makes heap exploits harder, but it
also breaks ancient binaries (including anything libc5 based).
This option changes the bootup default to heap randomization
2009-01-26 13:12:25 +03:00
disabled, and can be overridden at runtime by setting
2009-03-10 22:55:46 +03:00
/proc/sys/kernel/randomize_va_space to 2.
On non-ancient distros (post-2000 ones) N is usually a safe choice.
2007-05-07 01:49:36 +04:00
choice
prompt "Choose SLAB allocator"
2007-07-17 15:03:32 +04:00
default SLUB
2007-05-07 01:49:36 +04:00
help
This option allows to select a slab allocator.
config SLAB
bool "SLAB"
help
The regular slab allocator that is established and known to work
2007-05-09 13:32:47 +04:00
well in all environments. It organizes cache hot objects in
2008-11-06 01:18:19 +03:00
per cpu and per node queues.
2007-05-07 01:49:36 +04:00
config SLUB
bool "SLUB (Unqueued Allocator)"
help
SLUB is a slab allocator that minimizes cache line usage
instead of managing queues of cached objects (SLAB approach).
Per cpu caching is realized using slabs of objects instead
of queues of objects. SLUB can use memory efficiently
2008-11-06 01:18:19 +03:00
and has enhanced diagnostics. SLUB is the default choice for
a slab allocator.
2007-05-07 01:49:36 +04:00
config SLOB
2011-01-21 01:44:16 +03:00
depends on EXPERT
2007-05-07 01:49:36 +04:00
bool "SLOB (Simple Allocator)"
help
2008-02-05 09:29:38 +03:00
SLOB replaces the stock allocator with a drastically simpler
allocator. SLOB is generally more space efficient but
does not perform as well on large systems.
2007-05-07 01:49:36 +04:00
endchoice
2013-06-19 09:05:52 +04:00
config SLUB_CPU_PARTIAL
default y
2013-07-17 18:54:59 +04:00
depends on SLUB && SMP
2013-06-19 09:05:52 +04:00
bool "SLUB per cpu partial cache"
help
Per cpu partial caches accellerate objects allocation and freeing
that is local to a processor at the price of more indeterminism
in the latency of the free. On overflow these caches will be cleared
which requires the taking of locks that may cause latency spikes.
Typically one would choose no for a realtime system.
2009-12-15 05:00:02 +03:00
config MMAP_ALLOW_UNINITIALIZED
bool "Allow mmapped anonymous memory to be uninitialized"
2011-01-21 01:44:16 +03:00
depends on EXPERT && !MMU
2009-12-15 05:00:02 +03:00
default n
help
Normally, and according to the Linux spec, anonymous memory obtained
from mmap() has it's contents cleared before it is passed to
userspace. Enabling this config option allows you to request that
mmap() skip that if it is given an MAP_UNINITIALIZED flag, thus
providing a huge performance boost. If this option is not enabled,
then the flag will be ignored.
This is taken advantage of by uClibc's malloc(), and also by
ELF-FDPIC binfmt's brk and stack allocator.
Because of the obvious security issues, this option should only be
enabled on embedded devices where you control what is run in
userspace. Since that isn't generally a problem on no-MMU systems,
it is normally safe to say Y here.
See Documentation/nommu-mmap.txt for more information.
2014-04-19 02:07:11 +04:00
config SYSTEM_TRUSTED_KEYRING
bool "Provide system-wide ring of trusted keys"
depends on KEYS
help
Provide a system keyring to which trusted keys can be added. Keys in
the keyring are considered to be trusted. Keys may be added at will
by the kernel from compiled-in data and from hardware key stores, but
userspace may only add extra keys if those keys can be verified by
keys already in the keyring.
Keys in this keyring are used by module signature checking.
2008-02-02 23:10:36 +03:00
config PROFILING
2010-02-26 17:01:23 +03:00
bool "Profiling support"
2008-02-02 23:10:36 +03:00
help
Say Y here to enable the extended profiling support mechanisms used
by profilers such as OProfile.
2008-07-23 16:15:22 +04:00
#
# Place an empty function call at each tracepoint site. Can be
# dynamically changed for a probe function.
#
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 20:16:16 +04:00
config TRACEPOINTS
2008-07-23 16:15:22 +04:00
bool
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 20:16:16 +04:00
2008-02-02 23:10:33 +03:00
source "arch/Kconfig"
2005-04-17 02:20:36 +04:00
endmenu # General setup
2008-06-29 14:18:46 +04:00
config HAVE_GENERIC_DMA_COHERENT
bool
default n
2008-01-03 00:04:48 +03:00
config SLABINFO
bool
depends on PROC_FS
2008-04-14 19:53:02 +04:00
depends on SLAB || SLUB_DEBUG
2008-01-03 00:04:48 +03:00
default y
2006-09-16 23:15:53 +04:00
config RT_MUTEXES
boolean
2005-04-17 02:20:36 +04:00
config BASE_SMALL
int
default 0 if BASE_FULL
default 1 if !BASE_FULL
2007-07-16 10:39:29 +04:00
menuconfig MODULES
2005-04-17 02:20:36 +04:00
bool "Enable loadable module support"
2013-08-11 18:07:50 +04:00
option modules
2005-04-17 02:20:36 +04:00
help
Kernel modules are small pieces of compiled code which can
be inserted in the running kernel, rather than being
permanently built into the kernel. You use the "modprobe"
tool to add (and sometimes remove) them. If you say Y here,
many parts of the kernel can be built as modules (by
answering M instead of Y where indicated): this is most
useful for infrequently used options which are not required
for booting. For more information, see the man pages for
modprobe, lsmod, modinfo, insmod and rmmod.
If you say Y here, you will need to run "make
modules_install" to put the modules under /lib/modules/
where modprobe can find them (you may need to be root to do
this).
If unsure, say Y.
2008-08-04 21:31:32 +04:00
if MODULES
2008-05-05 04:04:16 +04:00
config MODULE_FORCE_LOAD
bool "Forced module loading"
default n
help
2008-05-09 10:25:28 +04:00
Allow loading of modules without version information (ie. modprobe
--force). Forced module loading sets the 'F' (forced) taint flag and
is usually a really bad idea.
2008-05-05 04:04:16 +04:00
2005-04-17 02:20:36 +04:00
config MODULE_UNLOAD
bool "Module unloading"
help
Without this option you will not be able to unload any
modules (note that some modules may not be unloadable
2008-07-23 04:24:26 +04:00
anyway), which makes your kernel smaller, faster
and simpler. If unsure, say Y.
2005-04-17 02:20:36 +04:00
config MODULE_FORCE_UNLOAD
bool "Forced module unloading"
2012-10-02 22:19:29 +04:00
depends on MODULE_UNLOAD
2005-04-17 02:20:36 +04:00
help
This option allows you to force a module to unload, even if the
kernel believes it is unsafe: the kernel will remove the module
without waiting for anyone to stop using it (using the -f option to
rmmod). This is mainly for kernel developers and desperate users.
If unsure, say N.
config MODVERSIONS
2005-12-27 01:04:02 +03:00
bool "Module versioning support"
2005-04-17 02:20:36 +04:00
help
Usually, you have to use modules compiled with your kernel.
Saying Y here makes it sometimes possible to use modules
compiled for different kernels, by adding enough information
to the modules to (hopefully) spot any changes which would
make them incompatible with the kernel you are running. If
unsure, say N.
config MODULE_SRCVERSION_ALL
bool "Source checksum for all modules"
help
Modules which contain a MODULE_VERSION get an extra "srcversion"
field inserted into their modinfo section, which contains a
sum of the source files which made it. This helps maintainers
see exactly which source was used to build a module (since
others sometimes change the module source without updating
the version). With this option, such a "srcversion" field
will be created for all modules. If unsure, say N.
2012-09-26 13:09:40 +04:00
config MODULE_SIG
bool "Module signature verification"
depends on MODULES
2013-08-30 19:07:30 +04:00
select SYSTEM_TRUSTED_KEYRING
2012-09-26 13:11:03 +04:00
select KEYS
select CRYPTO
select ASYMMETRIC_KEY_TYPE
select ASYMMETRIC_PUBLIC_KEY_SUBTYPE
select PUBLIC_KEY_ALGO_RSA
select ASN1
select OID_REGISTRY
select X509_CERTIFICATE_PARSER
2012-09-26 13:09:40 +04:00
help
Check modules for valid signatures upon load: the signature
is simply appended to the module. For more information see
Documentation/module-signing.txt.
2012-09-26 13:09:50 +04:00
!!!WARNING!!! If you enable this option, you MUST make sure that the
module DOES NOT get stripped after being signed. This includes the
debuginfo strip done by some packagers (such as rpmbuild) and
inclusion into an initramfs that wants the module size reduced.
2012-09-26 13:09:40 +04:00
config MODULE_SIG_FORCE
bool "Require modules to be validly signed"
depends on MODULE_SIG
help
Reject unsigned modules or signed modules for which we don't have a
key. Without this, such modules will simply taint the kernel.
2012-09-26 13:09:50 +04:00
2013-01-25 07:11:31 +04:00
config MODULE_SIG_ALL
bool "Automatically sign all modules"
default y
depends on MODULE_SIG
help
Sign all modules during make modules_install. Without this option,
modules must be signed manually, using the scripts/sign-file tool.
comment "Do not forget to sign required modules with scripts/sign-file"
depends on MODULE_SIG_FORCE && !MODULE_SIG_ALL
2012-09-26 13:09:50 +04:00
choice
prompt "Which hash algorithm should modules be signed with?"
depends on MODULE_SIG
help
This determines which sort of hashing algorithm will be used during
signature generation. This algorithm _must_ be built into the kernel
directly so that signature verification can take place. It is not
possible to load a signed module containing the algorithm to check
the signature on that module.
config MODULE_SIG_SHA1
bool "Sign modules with SHA-1"
select CRYPTO_SHA1
config MODULE_SIG_SHA224
bool "Sign modules with SHA-224"
select CRYPTO_SHA256
config MODULE_SIG_SHA256
bool "Sign modules with SHA-256"
select CRYPTO_SHA256
config MODULE_SIG_SHA384
bool "Sign modules with SHA-384"
select CRYPTO_SHA512
config MODULE_SIG_SHA512
bool "Sign modules with SHA-512"
select CRYPTO_SHA512
endchoice
2013-01-25 07:11:00 +04:00
config MODULE_SIG_HASH
string
depends on MODULE_SIG
default "sha1" if MODULE_SIG_SHA1
default "sha224" if MODULE_SIG_SHA224
default "sha256" if MODULE_SIG_SHA256
default "sha384" if MODULE_SIG_SHA384
default "sha512" if MODULE_SIG_SHA512
2014-08-27 15:01:56 +04:00
config MODULE_COMPRESS
bool "Compress modules on installation"
depends on MODULES
help
This option compresses the kernel modules when 'make
modules_install' is run.
The modules will be compressed either using gzip or xz depend on the
choice made in "Compression algorithm".
module-init-tools has support for gzip format while kmod handle gzip
and xz compressed modules.
When a kernel module is installed from outside of the main kernel
source and uses the Kbuild system for installing modules then that
kernel module will also be compressed when it is installed.
This option provides little benefit when the modules are to be used inside
an initrd or initramfs, it generally is more efficient to compress the whole
initrd or initramfs instead.
This is fully compatible with signed modules while the signed module is
compressed. module-init-tools or kmod handles decompression and provide to
other layer the uncompressed but signed payload.
choice
prompt "Compression algorithm"
depends on MODULE_COMPRESS
default MODULE_COMPRESS_GZIP
help
This determines which sort of compression will be used during
'make modules_install'.
GZIP (default) and XZ are supported.
config MODULE_COMPRESS_GZIP
bool "GZIP"
config MODULE_COMPRESS_XZ
bool "XZ"
endchoice
2008-08-04 21:31:32 +04:00
endif # MODULES
2008-12-13 13:49:41 +03:00
config INIT_ALL_POSSIBLE
bool
help
2012-03-29 09:08:31 +04:00
Back when each arch used to define their own cpu_online_mask and
cpu_possible_mask, some of them chose to initialize cpu_possible_mask
2008-12-13 13:49:41 +03:00
with all 1s, and others with all 0s. When they were centralised,
it was better to provide this option than to break all the archs
2009-01-26 13:12:25 +03:00
and have several arch maintainers pursuing me down dark alleys.
2008-12-13 13:49:41 +03:00
2005-04-17 02:20:36 +04:00
config STOP_MACHINE
bool
default y
depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
help
Need stop_machine() primitive.
2005-11-04 10:43:35 +03:00
source "block/Kconfig"
2007-10-17 10:27:31 +04:00
config PREEMPT_NOTIFIERS
bool
2008-01-25 23:08:24 +03:00
2010-01-06 11:47:10 +03:00
config PADATA
depends on SMP
bool
2012-10-05 04:11:27 +04:00
# Can be selected by architectures with broken toolchains
# that get confused by correct const<->read_only section
# mappings
config BROKEN_RODATA
bool
2012-09-22 02:31:13 +04:00
config ASN1
tristate
help
Build a simple ASN.1 grammar compiler that produces a bytecode output
that can be interpreted by the ASN.1 stream decoder and used to
inform it as to what tags are to be expected in a stream and what
functions to call on what tags.
2009-11-09 18:21:34 +03:00
source "kernel/Kconfig.locks"