linux/Documentation
David Rientjes a63d83f427 oom: badness heuristic rewrite
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions.  The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.

Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead.  This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits.  This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.

The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory.  "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit.  The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.

The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.

Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs.  In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.

Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it.  It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability.  Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000.  It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered.  The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.

/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity.  This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:02 -07:00
..
ABI Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2010-08-06 11:44:36 -07:00
accounting Documentation/: fix warnings from -Wmissing-prototypes in HOSTCFLAGS 2009-09-23 07:39:28 -07:00
acpi ACPI, APEI, EINJ injection parameters support 2010-05-19 22:42:08 -04:00
aoe Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
arm Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
auxdisplay includecheck fix: Documentation, cfag12864b-example.c 2009-09-24 07:20:57 -07:00
blackfin Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
block nick piggin: change email address 2010-08-05 08:45:20 -07:00
blockdev Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
cdrom Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
cgroups Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
connector Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
console doc: fix console doc typo 2010-02-24 13:51:32 +01:00
cpu-freq [CPUFREQ] Processor Clocking Control interface driver 2010-01-13 10:55:16 -05:00
cpuidle
cris
crypto async_tx: add support for asynchronous RAID6 recovery operations 2009-08-29 19:09:27 -07:00
development-process Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
device-mapper Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
DocBook Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2010-08-06 11:36:30 -07:00
driver-model Fix spelling of 'platform' in comments and doc 2010-02-05 12:22:34 +01:00
dvb Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
early-userspace
fault-injection lkdtm: add debugfs access and loosen KPROBE ties 2010-03-06 11:26:32 -08:00
fb Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
filesystems oom: badness heuristic rewrite 2010-08-09 20:45:02 -07:00
firmware_class firmware: Update hotplug script 2010-08-05 13:53:34 -07:00
frv
hwmon Merge branch 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 10:02:58 -07:00
i2c Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
i2o
ia64 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
ide ide: preserve Host Protected Area by default (v2) 2009-06-07 13:52:52 +02:00
infiniband Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
input Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
ioctl Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 2010-08-07 17:09:24 -07:00
isdn Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
ja_JP Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kbuild Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2010-08-05 14:12:07 -07:00
kdump trivial: Miscellaneous documentation typo fixes 2009-06-12 18:01:47 +02:00
ko_KR Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kvm KVM: Document KVM_GET_SUPPORTED_CPUID2 ioctl 2010-08-02 06:40:50 +03:00
laptops Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
lguest Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
m68k
make
mips ide: remove unused CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ 2009-01-14 19:19:03 +01:00
misc-devices Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
mn10300 trivial: Miscellaneous documentation typo fixes 2009-06-12 18:01:47 +02:00
mtd Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
namespaces
netlabel Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
networking DNS: Fixes for the DNS query module 2010-08-06 02:26:27 +00:00
parisc
PCI Documentation: pci.txt: fix typo 2010-07-11 22:17:45 +02:00
pcmcia pcmcia: do not use io_req_t when calling pcmcia_request_io() 2010-08-03 09:04:11 +02:00
power Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
powerpc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2010-08-05 09:03:46 -07:00
pps LinuxPPS: core support 2009-06-18 13:04:04 -07:00
prctl
RCU Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
s390 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
scheduler sched: Remove USER_SCHED from documentation 2010-04-02 20:12:01 +02:00
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
serial Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
sh sh: Kill off remaining CONFIG_SH_KGDB bits. 2008-12-22 18:44:05 +09:00
sound Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 2010-08-07 17:07:31 -07:00
sparc
spi spi/ep93xx: implemented driver for Cirrus EP93xx SPI controller 2010-05-25 00:23:16 -06:00
sysctl oom: enable oom tasklist dump by default 2010-08-09 20:44:56 -07:00
telephony Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
thermal thermal: add sanity check for the passive attribute 2009-11-05 18:18:10 -05:00
timers Documentation: Add timers/timers-howto.txt 2010-08-04 11:00:45 +02:00
trace vmscan: tracing: add a postprocessing script for reclaim-related ftrace events 2010-08-09 20:45:00 -07:00
uml Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
usb Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
video4linux Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
vm Documentation/vm: fix spelling in page-types.c 2010-08-05 13:21:23 -07:00
w1 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
watchdog watchdog: docs: add an entry for imx2_wdt 2010-07-01 16:02:55 +00:00
wimax i2400m: documentation and instructions for usage 2009-01-07 10:00:18 -08:00
x86 x86, olpc: Add support for calling into OpenFirmware 2010-06-18 14:54:36 -07:00
zh_CN Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
.gitignore add random binaries to .gitignore 2010-04-08 11:34:34 +02:00
00-INDEX documentation: fix almost duplicate filenames (IO/io-mapping.txt) 2010-07-20 17:49:30 +00:00
apparmor.txt AppArmor: update Maintainer and Documentation 2010-08-02 15:35:15 +10:00
applying-patches.txt
atomic_ops.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
bad_memory.txt
basic_profiling.txt
binfmt_misc.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
braille-console.txt trivial: Miscellaneous documentation typo fixes 2009-06-12 18:01:47 +02:00
bt8xxgpio.txt
btmrvl.txt Bluetooth: Add documentation for Marvell Bluetooth driver 2009-08-22 14:25:32 -07:00
BUG-HUNTING
bus-virt-phys-mapping.txt documentation: fix almost duplicate filenames (IO/io-mapping.txt) 2010-07-20 17:49:30 +00:00
cachetlb.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
Changes Documentation update broken web addresses 2010-07-11 21:55:42 +02:00
circular-buffers.txt Document Linux's circular buffering capabilities 2010-03-24 16:31:22 -07:00
coccinelle.txt Documentation: fix ubuntu distro name 2010-06-29 15:27:00 +02:00
CodingStyle trivial: fix typo milisecond/millisecond for documentation and source comments. 2009-06-12 18:01:46 +02:00
cpu-hotplug.txt cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt 2009-12-17 11:43:29 +10:30
cpu-load.txt
cputopology.txt Documentation: ABI: /sys/devices/system/cpu/cpu#/ topology files 2009-10-30 14:59:52 -07:00
credentials.txt CRED: Fix __task_cred()'s lockdep check and banner comment 2010-07-29 15:16:18 -07:00
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt ieee1394: update URLs in debugging-via-ohci1394.txt 2009-10-03 09:28:11 +02:00
dell_rbu.txt trivial: Documentation/dell_rbu.txt: fix typos 2009-06-12 18:01:50 +02:00
devices.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
DMA-API-HOWTO.txt Documentation/DMA-API-HOWTO: add ARCH_KMALLOC_MINALIGN description 2010-05-27 09:12:53 -07:00
DMA-API.txt Documentation: rename PCI-DMA-mapping.txt to DMA-API-HOWTO.txt 2010-03-12 15:52:43 -08:00
DMA-attributes.txt
DMA-ISA-LPC.txt
dmaengine.txt async_tx, dmaengine: document channel allocation and api rework 2009-01-05 18:10:19 -07:00
dontdiff Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig 2010-02-25 14:43:57 -08:00
dynamic-debug-howto.txt Dynamic debug: allow simple quoting of words 2009-03-24 16:38:27 -07:00
edac.txt Documentation/edac.txt: Reflect the sysfs changes at the document 2010-05-10 11:49:30 -03:00
eisa.txt doc: fix Defaultd -> Defaults typo in EISA doc 2010-02-05 12:22:39 +01:00
email-clients.txt Documentation/email-clients.txt: update gmail information 2010-03-12 15:52:35 -08:00
feature-removal-schedule.txt Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 13:18:29 -07:00
flexible-arrays.txt Update flex_arrays.txt 2009-10-15 07:25:20 -06:00
futex-requeue-pi.txt futex: add requeue-pi documentation 2009-05-09 07:12:50 +02:00
gcov.txt trivial: fix typo in CONFIG_DEBUG_FS in gcov doc 2009-09-21 15:14:56 +02:00
gpio.txt gpio: introduce gpio_request_one() and friends 2010-03-06 11:26:48 -08:00
highuid.txt
HOWTO Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
hw_random.txt
init.txt init/main.c: improve usability in case of init binary failure 2010-03-06 11:26:29 -08:00
initrd.txt
intel_txt.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
Intel-IOMMU.txt intel-iommu: Kill DMAR_BROKEN_GFX_WA option. 2009-09-19 09:37:23 -07:00
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt ipmi: add parameter to limit CPU usage in kipmid 2010-03-12 15:52:39 -08:00
IRQ-affinity.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt documentation: update kernel-doc-nano-HOWTO information 2010-01-11 09:34:07 -08:00
kernel-docs.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kernel-parameters.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 2010-08-07 17:07:31 -07:00
keys-request-key.txt
keys.txt KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] 2009-09-02 21:29:22 +10:00
kmemcheck.txt kmemcheck: update documentation 2009-07-01 22:36:22 +02:00
kmemleak.txt kmemleak: add clear command support 2009-09-08 16:36:08 +01:00
kobject.txt kobject: documentation: Update to refer to kset-example.c. 2010-03-19 07:12:20 -07:00
kprobes.txt Documentation: Mention that KProbes is supported on MIPS 2010-08-05 13:26:30 +01:00
kref.txt kref: double kref_put() in my_data_handler() 2009-09-18 09:48:52 -07:00
ldm.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
leds-class.txt led: document sysfs interface 2009-08-28 15:21:12 -04:00
leds-lp3944.txt leds: LED driver for National Semiconductor LP3944 Funlight Chip 2009-06-23 20:21:38 +01:00
local_ops.txt trivial: Miscellaneous documentation typo fixes 2009-06-12 18:01:47 +02:00
lockdep-design.txt lockdep: Fix typos in documentation 2009-08-07 12:03:46 +02:00
lockstat.txt lockstat: Add usage info to Documentation/lockstat.txt 2009-12-06 13:20:02 +01:00
logo.gif Revert "linux.conf.au 2009: Tuz" 2009-04-27 12:00:27 -07:00
logo.txt Revert "linux.conf.au 2009: Tuz" 2009-04-27 12:00:27 -07:00
magic-number.txt documentation: update header file paths 2009-01-06 15:59:28 -08:00
Makefile Documentation/fs/: split txt and source files 2010-03-12 15:52:35 -08:00
ManagementStyle
mca.txt
md.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
memory-barriers.txt Document Linux's circular buffering capabilities 2010-03-24 16:31:22 -07:00
memory-hotplug.txt mm: add numa node symlink for memory section in sysfs 2009-12-15 08:53:17 -08:00
memory.txt Documentation/memory.txt: remove some very outdated recommendations 2009-09-22 07:17:26 -07:00
mono.txt
mutex-design.txt Rename .text.lock to .text..lock. 2010-03-03 11:26:00 +01:00
nmi_watchdog.txt
nommu-mmap.txt nommu: fix malloc performance by adding uninitialized flag 2009-12-15 08:53:24 -08:00
numastat.txt mm: fix NUMA accounting in numastat.txt 2009-09-22 07:17:39 -07:00
oops-tracing.txt panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I') 2010-05-19 08:37:43 +01:00
padata.txt padata: update API documentation 2010-07-31 19:53:07 +08:00
parport-lowlevel.txt
parport.txt
pi-futex.txt
pnp.txt doc: capitalization and other minor fixes in pnp doc 2010-02-05 12:22:44 +01:00
preempt-locking.txt
printk-formats.txt
prio_tree.txt
rbtree.txt rbtree: Add support for augmented rbtrees 2010-02-18 15:40:56 -08:00
rfkill.txt Document the rfkill sysfs ABI 2010-03-10 17:09:33 -05:00
robust-futex-ABI.txt futex: documentation: fix inconsistent description of futex list_op_pending 2009-06-18 13:03:56 -07:00
robust-futexes.txt
rt-mutex-design.txt variable name fix to Documentation/rt-mutex-design.txt 2010-06-05 17:39:09 +02:00
rt-mutex.txt
rtc.txt rtc: add boot_timesource sysfs attribute 2009-09-23 07:39:46 -07:00
SAK.txt
SecurityBugs
SELinux.txt
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
SM501.txt trivial: Miscellaneous documentation typo fixes 2009-06-12 18:01:47 +02:00
Smack.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
sparse.txt update email address 2010-07-19 10:56:54 +02:00
spinlocks.txt Documentation: rw_lock lessons learned 2009-12-14 09:46:56 -08:00
stable_api_nonsense.txt
stable_kernel_rules.txt Documentation: -stable rules: upstream commit ID requirement reworded 2010-04-22 15:24:56 -07:00
SubmitChecklist Documentation: update SubmitChecklist for O=objdir and kconfig testing 2010-05-24 07:31:20 -07:00
SubmittingDrivers Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
SubmittingPatches docs: update patch size in SubmittingPatches 2009-10-01 16:11:12 -07:00
svga.txt
sysfs-rules.txt Fix typos in comments 2010-03-16 11:47:56 +01:00
sysrq.txt Input: Documentation/sysrq.txt - update KEY_SYSRQ info 2010-05-19 10:14:15 -07:00
tomoyo.txt TOMOYO: Update version to 2.3.0 2010-08-02 15:35:10 +10:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
VGA-softcursor.txt
vgaarbiter.txt vgaarbiter: fix a typo in the vgaarbiter Documentation 2009-12-16 11:28:58 -08:00
video-output.txt
volatile-considered-harmful.txt Documentation/volatile-considered-harmful.txt: correct cpu_relax() documentation 2010-03-24 16:31:20 -07:00
zorro.txt