linux

iv/linux

Author	SHA1	Message	Date
Christoph Lameter	e94b176609	[PATCH] slab: remove SLAB_KERNEL SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Peter Zijlstra	a866374aec	[PATCH] mm: pagefault_{disable,enable}() Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Ashwin Chaugule	7602bdf2fd	[PATCH] new scheme to preempt swap token The new swap token patches replace the current token traversal algo. The old algo had a crude timeout parameter that was used to handover the token from one task to another. This algo, transfers the token to the tasks that are in need of the token. The urgency for the token is based on the number of times a task is required to swap-in pages. Accordingly, the priority of a task is incremented if it has been badly affected due to swap-outs. To ensure that the token doesnt bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented, if their rate of swap-in's keeps reducing. This way, the condition to check whether to pre-empt the swap token, is a matter of comparing two task's priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Joy Latten	161a09e737	audit: Add auditing to ipsec An audit message occurs when an ipsec SA or ipsec policy is created/deleted. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:14:22 -08:00
Jan Beulich	b65780e123	[PATCH] unwinder: move .eh_frame to RODATA The .eh_frame section contents is never written to, so it can as well benefit from CONFIG_DEBUG_RODATA. Diff-ed against firstfloor tree. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:19 +01:00
Jan Beulich	c65f38d911	[PATCH] unwinder: fully support linker generated .eh_frame_hdr section Now that binutils' ld is able to properly populate .eh_frame_hdr in the Linux kernel case, here's a patch to add some functionality to the Dwarf2 unwinder to actually be able to make use of this (applies on firstfloor tree with the previously sent patch to add debug output, but not on plain 2.6.19). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:19 +01:00
Jan Beulich	6d0185ea61	[PATCH] unwinder: Add debugging output to the Dwarf2 unwinder Add debugging printks to the unwinder to allow easier debugging when something goes wrong with it. This can be controlled with the new unwinder_debug=N option Most output is given by N=1 AK: Added documentation of unwinder_debug= Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Jan Beulich	359ad0d401	[PATCH] unwinder: more sanity checks in Dwarf2 unwinder Tighten the requirements on both input to and output from the Dwarf2 unwinder. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Andi Kleen	eef5e0d185	[PATCH] unwinder: Remove lockdep disabling of nested locks for unwinder Shouldn't be needed anymore since __kernel_text_address is used unconditionally on x86-64 Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:12 +01:00
Andi Kleen	e2124bb8d3	[PATCH] unwinder: Use probe_kernel_address instead of __get_user in kernel/unwind.c This avoids trouble with the page fault handler if the fault happens inside an interrupt context. Suggested by Linus Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:12 +01:00
Chuck Ebbert	0741f4d207	[PATCH] x86: add sysctl for kstack_depth_to_print Add sysctl for kstack_depth_to_print. This lets users change the amount of raw stack data printed in dump_stack() without having to reboot. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Jeremy Fitzhardinge	f95d47caae	[PATCH] i386: Use %gs as the PDA base-segment in the kernel This patch is the meat of the PDA change. This patch makes several related changes: 1: Most significantly, %gs is now used in the kernel. This means that on entry, the old value of %gs is saved away, and it is reloaded with __KERNEL_PDA. 2: entry.S constructs the stack in the shape of struct pt_regs, and this is passed around the kernel so that the process's saved register state can be accessed. Unfortunately struct pt_regs doesn't currently have space for %gs (or %fs). This patch extends pt_regs to add space for gs (no space is allocated for %fs, since it won't be used, and it would just complicate the code in entry.S to work around the space). 3: Because %gs is now saved on the stack like %ds, %es and the integer registers, there are a number of places where it no longer needs to be handled specially; namely context switch, and saving/restoring the register state in a signal context. 4: And since kernel threads run in kernel space and call normal kernel code, they need to be created with their %gs == __KERNEL_PDA. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Chuck Ebbert <76306.1226@compuserve.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:02 +01:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Al Viro	a1f8e7f7fb	[PATCH] severing skbuff.h -> highmem.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:29 -05:00
Al Viro	f6a570333e	[PATCH] severing module.h->sched.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:22 -05:00
Thomas Graf	17c157c889	[GENL]: Add genlmsg_put_reply() to simplify building reply headers By modyfing genlmsg_put() to take a genl_family and by adding genlmsg_put_reply() the process of constructing the netlink and generic netlink headers is simplified. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:42 -08:00
Thomas Graf	3dabc71578	[GENL]: Add genlmsg_new() to allocate generic netlink messages Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:40 -08:00
Thomas Graf	339bf98ffc	[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:11 -08:00
Kay Sievers	e17e0f51ae	Driver core: show drivers in /sys/module/ Show the drivers, which belong to the module: $ ls -l /sys/module/usbcore/drivers/ hub -> ../../../bus/usb/drivers/hub usb -> ../../../bus/usb/drivers/usb usbfs -> ../../../bus/usb/drivers/usbfs Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-12-01 14:52:02 -08:00
Linus Torvalds	707badb80b	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: [PATCH] x86-64: Use stricter in process stack check for unwinder [PATCH] i386: Fix compilation with UP genericarch [PATCH] x86-64: Fix warning in io_apic.c [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind [PATCH] x86_64: Align data segment to PAGE_SIZE boundary	2006-11-28 17:28:41 -08:00
Akinobu Mita	3cce4856ff	[PATCH] fix create_write_pipe() error check The return value of create_write_pipe()/create_read_pipe() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-28 17:26:50 -08:00
Jan Beulich	ff0a538d8b	[PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind This fixes a problem with gcc4 mis-compiling the stack unwind code under -Os, which resulted in 'stuck' messages whenever an assembly routine was encountered. (The second hunk is trivial cleanup.) Signed-off-by: Jan Beulich <jbeulich@novell.com>	2006-11-28 20:12:59 +01:00
Arjan van de Ven	cfd3ef2346	[PATCH] lockdep: spin_lock_irqsave_nested() Introduce spin_lock_irqsave_nested(); implementation from: http://lkml.org/lkml/2006/6/1/122 Patch from: http://lkml.org/lkml/2006/9/13/258 [akpm@osdl.org: two compile fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jiri Kosina <jikos@jikos.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-25 13:28:34 -08:00
Akinobu Mita	753ca4f312	[PATCH] fix copy_process() error check The return value of copy_process() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-25 13:28:34 -08:00
Linus Torvalds	b42172fc7b	Don't call "note_interrupt()" with irq descriptor lock held This reverts commit `f72fa70760`, and solves the problem that it tried to fix by simply making "__do_IRQ()" call the note_interrupt() function without the lock held, the way everybody else does. It should be noted that all interrupt handling code must never allow the descriptor actors to be entered "recursively" (that's why we do all the magic IRQ_PENDING stuff in the first place), so there actually is exclusion at that much higher level, even in the absense of locking. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Acked-by:Pavel Emelianov <xemul@openvz.org> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-22 09:32:06 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00
David Howells	365970a1ea	WorkStruct: Merge the pending bit into the wq_data pointer Reclaim a word from the size of the work_struct by folding the pending bit and the wq_data pointer together. This shouldn't cause misalignment problems as all pointers should be at least 4-byte aligned. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:49 +00:00
David Howells	6bb49e5965	WorkStruct: Typedef the work function prototype Define a type for the work function prototype. It's not only kept in the work_struct struct, it's also passed as an argument to several functions. This makes it easier to change it. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:45 +00:00
David Howells	52bad64d95	WorkStruct: Separate delayable and non-delayable events. Separate delayable work items from non-delayable work items be splitting them into a separate structure (delayed_work), which incorporates a work_struct and the timer_list removed from work_struct. The work_struct struct is huge, and this limits it's usefulness. On a 64-bit architecture it's nearly 100 bytes in size. This reduces that by half for the non-delayable type of event. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:01 +00:00
Ingo Molnar	1ff5683043	[PATCH] lockdep: fix static keys in module-allocated percpu areas lockdep got confused by certain locks in modules: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Call Trace: [<ffffffff8026f40d>] dump_trace+0xaa/0x3f2 [<ffffffff8026f78f>] show_trace+0x3a/0x60 [<ffffffff8026f9d1>] dump_stack+0x15/0x17 [<ffffffff802abfe8>] __lock_acquire+0x724/0x9bb [<ffffffff802ac52b>] lock_acquire+0x4d/0x67 [<ffffffff80267139>] rt_spin_lock+0x3d/0x41 [<ffffffff8839ed3f>] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174 [<ffffffff883a1334>] :ip_conntrack:udp_packet+0xbf/0xcf [<ffffffff8839f9af>] :ip_conntrack:ip_conntrack_in+0x394/0x4a7 [<ffffffff8023551f>] nf_iterate+0x41/0x7f [<ffffffff8025946a>] nf_hook_slow+0x64/0xd5 [<ffffffff802369a2>] ip_rcv+0x24e/0x506 [...] Steven Rostedt found the bug: static_obj() check did not take PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks were triggering this message. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-17 11:10:37 -08:00
Zhang, Yanmin	b86432b42e	[PATCH] some irq_chip variables point to NULL I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine. Below is the log. Oops 11012296146944 [1] Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor f an container button sg eepro100 e100 mii Pid: 0, CPU 0, comm: swapper psr : 0000121008022038 ifs : 800000000000040b ip : [<a0000001000e1411>] Not tainted ip is at __do_IRQ+0x371/0x3e0 unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003 rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00 f6 : 000000000000000000000 f7 : 0ffe69090000000000000 f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000 f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909 r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001 r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8 r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000 r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8 r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300 r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0 r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000 r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0 Call Trace: [<a000000100014600>] show_stack+0x40/0xa0 sp=a000000100bcb760 bsp=a000000100bc4f40 [<a000000100014f00>] show_regs+0x840/0x880 sp=a000000100bcb930 bsp=a000000100bc4ee8 [<a000000100037fb0>] die+0x250/0x320 sp=a000000100bcb930 bsp=a000000100bc4ea0 [<a00000010005e5f0>] ia64_do_page_fault+0x8d0/0xa20 sp=a000000100bcb950 bsp=a000000100bc4e50 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcb9e0 bsp=a000000100bc4e50 [<a0000001000e1410>] __do_IRQ+0x370/0x3e0 sp=a000000100bcbbb0 bsp=a000000100bc4df0 [<a000000100011f50>] ia64_handle_irq+0x170/0x220 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a000000100012390>] ia64_pal_call_static+0x90/0xc0 sp=a000000100bcbd80 bsp=a000000100bc4d78 [<a000000100015630>] default_idle+0x90/0x160 sp=a000000100bcbd80 bsp=a000000100bc4d58 [<a000000100014290>] cpu_idle+0x1f0/0x440 sp=a000000100bcbe20 bsp=a000000100bc4d18 [<a000000100009980>] rest_init+0xc0/0xe0 sp=a000000100bcbe20 bsp=a000000100bc4d00 [<a0000001009f8ea0>] start_kernel+0x6a0/0x6c0 sp=a000000100bcbe20 bsp=a000000100bc4ca0 [<a0000001000089f0>] __end_ivt_text+0x6d0/0x6f0 sp=a000000100bcbe30 bsp=a000000100bc4c00 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! The root cause is that some irq_chip variables, especially ia64_msi_chip, initiate their memeber end to point to NULL. __do_IRQ doesn't check if irq_chip->end is null and just calls it after processing the interrupt. As irq_chip->end is called at many places, so I fix it by reinitiating irq_chip->end to dummy_irq_chip.end, e.g., a noop function. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-16 11:43:37 -08:00
Linus Torvalds	9a3a04ac38	Revert "[PATCH] fix Data Acess error in dup_fd" This reverts commit `0130b0b32e`. Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was supposed to fix must be some unrelated memory corruption, and the "fix" actually causes more problems: "However, the new code does not look safe in all cases. If some other task has opened more files while dup_fd() released oldf->file_lock, the new code will update open_files to the new larger value. But newf was allocated with the old smaller value of open_files, therefore subsequent accesses to newf may try to write into unallocated memory." so revert it. Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Sergey Vlasov <vsu@altlinux.ru> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-14 15:20:51 -08:00
Andrew Morton	8b126b7753	[PATCH] setup_irq(): better mismatch debugging When we get a mismatch between handlers on the same IRQ, all we get is "IRQ handler type mismatch for IRQ n". Let's print the name of the presently-registered handler with which we got the mismatch. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-14 09:09:26 -08:00
Pavel Emelianov	f72fa70760	[PATCH] Fix misrouted interrupts deadlocks While testing kernel on machine with "irqpoll" option I've caught such a lockup: __do_IRQ() spin_lock(&desc->lock); desc->chip->ack(); /* IRQ is ACKed / note_interrupt() misrouted_irq() handle_IRQ_event() if (...) local_irq_enable_in_hardirq(); / interrupts are enabled from now / ... __do_IRQ() / same IRQ we've started from / spin_lock(&desc->lock); / LOCKUP / Looking at misrouted_irq() code I've found that a potential deadlock like this can also take place: 1CPU: __do_IRQ() spin_lock(&desc->lock); / irq = A / misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); / irq = B / if (desc->status & IRQ_INPROGRESS) { 2CPU: __do_IRQ() spin_lock(&desc->lock); / irq = B / misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); / irq = A */ if (desc->status & IRQ_INPROGRESS) { As the second lock on both CPUs is taken before checking that this irq is being handled in another processor this may cause a deadlock. This issue is only theoretical. I propose the attached patch to fix booth problems: when trying to handle misrouted IRQ active desc->lock may be unlocked. Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 07:40:43 -08:00
Sharyathi Nagesh	0130b0b32e	[PATCH] fix Data Acess error in dup_fd On running the Stress Test on machine for more than 72 hours following error message was observed. 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0] pc: c000000000060d90: .dup_fd+0x240/0x39c lr: c000000000060d6c: .dup_fd+0x21c/0x39c sp: c00000007ce2fa70 msr: 800000000000b032 dar: ffffffff00000028 dsisr: 40000000 current = 0xc000000074950980 paca = 0xc000000000454500 pid = 27330, comm = bash 0:mon> t [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable) [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88 [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520 [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4 [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74 [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc The problem is because of race window. When if(expand) block is executed in dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be modified. So actual open_files in oldf may not match with open_files variable. Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 07:40:43 -08:00
Eric W. Biederman	d99f160ac5	[PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl table Since it is becoming clear that there are just enough users of the binary sysctl interface that completely removing the binary interface from the kernel will not be an option for foreseeable future, we need to find a way to address the sysctl maintenance issues. The basic problem is that sysctl requires one central authority to allocate sysctl numbers, or else conflicts and ABI breakage occur. The proc interface to sysctl does not have that problem, as names are not densely allocated. By not terminating a sysctl table until I have neither a ctl_name nor a procname, it becomes simple to add sysctl entries that don't show up in the binary sysctl interface. Which allows people to avoid allocating a binary sysctl value when not needed. I have audited the kernel code and in my reading I have not found a single sysctl table that wasn't terminated by a completely zero filled entry. So this change in behavior should not affect anything. I think this mechanism eases the pain enough that combined with a little disciple we can solve the reoccurring sysctl ABI breakage. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Eric W. Biederman	0e009be8a0	[PATCH] Improve the removed sysctl warnings Don't warn about libpthread's access to kernel.version. When it receives -ENOSYS it will read /proc/sys/kernel/version. If anything else shows up print the sysctl number string. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cal Peake <cp@absolutedigital.net> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Peter Zijlstra	64efade11c	[PATCH] lockdep: fix delayacct locking bug Make the delayacct lock irqsave; this avoids the possible deadlock where an interrupt is taken while holding the delayacct lock which needs to take the delayacct lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Gautham R Shenoy	4b96b1a10c	[PATCH] Fix the spurious unlock_cpu_hotplug false warnings Cpu-hotplug locking has a minor race case caused because of setting the variable "recursive" to NULL after releasing the cpu_bitmask_lock in the function unlock_cpu_hotplug,instead of doing so before releasing the cpu_bitmask_lock. This was the cause of most of the recent false spurious lock_cpu_unlock warnings. This should fix the problem reported by Martin Lorenz reported in http://lkml.org/lkml/2006/10/29/127. Thanks to Srinivasa DS for pointing it out. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:22 -08:00
Linus Torvalds	10b1fbdb0a	Make sure "user->sigpending" count is in sync The previous commit (`45c18b0bb5`, aka "Fix unlikely (but possible) race condition on task->user access") fixed a potential oops due to __sigqueue_alloc() getting its "user" pointer out of sync with switch_user(), and accessing a user pointer that had been de-allocated on another CPU. It still left another (much less serious) problem, where a concurrent __sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal pending reference counting for a _different_ user than the one it then actually ended up using. No oops, but we'd end up with the wrong signal accounting. Another case of Oleg's eagle-eyes picking up the problem. This is trivially fixed by just making sure we load whichever "user" structure we decide to use (it doesn't matter _which_ one we pick, we just need to pick one) just once. Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-04 13:03:00 -08:00
Linus Torvalds	45c18b0bb5	Fix unlikely (but possible) race condition on task->user access There's a possible race condition when doing a "switch_uid()" from one user to another, which could race with another thread doing a signal allocation and looking at the old thread ->user pointer as it is freed. This explains an oops reported by Lukasz Trabinski: http://permalink.gmane.org/gmane.linux.kernel/462241 We fix this by delaying the (reference-counted) freeing of the user structure until the thread signal handler lock has been released, so that we know that the signal allocation has either seen the new value or has properly incremented the reference count of the old one. Race identified by Oleg Nesterov. Cc: Lukasz Trabinski <lukasz@wsisiz.edu.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-04 10:06:02 -08:00
Stephen Rothwell	3fd5939798	[PATCH] Create compat_sys_migrate_pages This is needed on bigendian 64bit architectures. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:59 -08:00
Rafael J. Wysocki	b918f6e62c	[PATCH] swsusp: debugging Add a swsusp debugging mode. This does everything that's needed for a suspend except for actually suspending. So we can look in the log messages and work out a) what code is being slow and b) which drivers are misbehaving. (1) # echo testproc > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, wait for 5 seconds and then thaw the processes and the CPU. (2) # echo test > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, shrink memory, suspend all devices, wait for 5 seconds, resume the devices etc. Cc: Pavel Machek <pavel@ucw.cz> Cc: Stefan Seyfried <seife@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Andrew Morton	19c6b6ed3f	[PATCH] schedule removal of FUTEX_FD Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it shouldn't). Add a warning printk, give any remaining users six months to migrate off it. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Andrew Morton	f46c483357	[PATCH] Add printk_timed_ratelimit() printk_ratelimit() has global state which makes it not useful for callers which wish to perform ratelimiting at a particular frequency. Add a printk_timed_ratelimit() which utilises caller-provided state storage to permit more flexibility. This function can in fact be used for things other than printk ratelimiting and is perhaps poorly named. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Oleg Nesterov	4a279ff1ea	[PATCH] taskstats: fix sub-threads accounting If there are no listeners, taskstats_exit_send() just returns because taskstats_exit_alloc() didn't allocate *tidstats. This is wrong, each sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is not recorded in ->signal->stats and lost. Q: We don't send TASKSTATS_TYPE_AGGR_TGID when single-threaded process exits. Is it good? How can the listener figure out that it was actually a process exit, not sub-thread? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:07:00 -08:00
Oleg Nesterov	f0ec1aaf54	[PATCH] xacct_add_tsk: fix pure theoretical ->mm use-after-free Paranoid fix. The task can free its ->mm after the 'if (p->mm)' check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:08:41 -08:00
Randy Dunlap	0e2d57fc6e	[PATCH] ndiswrapper: don't set the module->taints flags For ndiswrapper, don't set the module->taints flags, just set the kernel global tainted flag. This should allow ndiswrapper to continue to use GPL symbols. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Florin Malita <fmalita@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:08:40 -08:00
Oleg Nesterov	3d8334def5	[PATCH] taskstats: fix sk_buff size calculation prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()), but then it does genlmsg_put()->nlmsg_put(). This means we forget to reserve a room for 'struct nlmsghdr'. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-29 12:07:37 -08:00
Oleg Nesterov	d46a3d0d07	[PATCH] taskstats: fix sk_buff leak 'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer we own sk_buff. This means we should always do kfree_skb() on failure. [ Thomas acked and pointed out missing return value in original version ] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-29 12:07:37 -08:00
Alan Stern	057647fc47	[PATCH] workqueue: update kerneldoc This patch (as812) changes the kerneldoc comments explaining the return values from queue_work(), queue_delayed_work(), and queue_delayed_work_on(). The updated comments explain more accurately the meaning of the return code and avoid suggesting that a 0 value means the routine was unsuccessful. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Satoru Takeuchi	8fa1d7d3b2	[PATCH] cpu-hotplug: release `workqueue_mutex' properly on CPU hot-remove _cpu_down() acquires `workqueue_mutex' on its process, but doen't release it if __cpu_disable() fails. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Jim Houston	bb1d860551	[PATCH] time_adjust cleared before use I notice that the code which implements adjtime clears the time_adjust value before using it. The attached patch makes the obvious fix. Acked-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Jim Houston <jim.houston@ccur.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Oleg Nesterov	d7c3f5f231	[PATCH] fill_tgid: cleanup delays accounting fill_tgid() should skip not only an already exited group leader. If the task has ->exit_state != 0 it already did exit_notify(), so it also did fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it to avoid a double accounting. This patch doesn't close the race completely, but it cleanups the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Oleg Nesterov	a98b609426	[PATCH] taskstats: don't use tasklist_lock Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe. ->siglock allows us to traverse subthread without tasklist. Q: delay accounting looks wrong to me. If sub-thread has already called taskstats_exit_send() but didn't call release_task(self) yet it will be accounted twice. The window is big. No? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	b8534d7bd8	[PATCH] taskstats: kill ->taskstats_lock in favor of ->siglock signal_struct is (mostly) protected by ->sighand->siglock, I think we don't need ->taskstats_lock to protect ->stats. This also allows us to simplify the locking in fill_tgid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	093a8e8aec	[PATCH] taskstats_tgid_free: fix usage taskstats_tgid_free() is called on copy_process's error path. This is wrong. IF (clone_flags & CLONE_THREAD) We should not clear ->signal->taskstats, current uses it, it probably has a valid accumulated info. ELSE taskstats_tgid_init() set ->signal->taskstats = NULL, there is nothing to free. Move the callsite to __exit_signal(). We don't need any locking, entire thread group is exiting, nobody should have a reference to soon to be released ->signal. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	05d5bcd60e	[PATCH] bacct_add_tsk: fix unsafe and wrong parent/group_leader dereference 1. ts = timespec_sub(uptime, current->group_leader->start_time); It is possible that current != tsk. Probably it was supposed to be 'tsk->group_leader->start_time. But why we are reading group_leader's start_time ? This accounting is per thread, not per procees, I changed this to 'tsk->start_time. Please corect me. 2. stats->ac_ppid = (tsk->parent) ? tsk->parent->pid : 0; tsk->parent never == NULL, and it is unsafe to dereference it. Both the task and it's parent may exit after the caller unlocks tasklist_lock, the memory could be unmapped (DEBUG_SLAB). (And we should use ->real_parent->tgid in fact). Q: I don't understand the 'if (thread_group_leader(tsk))' check. Why it is needed ? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	fca178c0c6	[PATCH] fill_tgid: fix task_struct leak and possible oops 1. fill_tgid() forgets to do put_task_struct(first). 2. release_task(first) can happen after fill_tgid() drops tasklist_lock, it is unsafe to dereference first->signal. This is a temporary fix, imho the locking should be reworked. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Stephen Rothwell	5fa3839a64	[PATCH] Constify compat_get_bitmap argument This means we can call it when the bitmap we want to fetch is declared const. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Jan Dittmer	1d4d262769	[PATCH] Add missing space in module.c for taintskernel Obvious fix. Signed-off-by: Jan Dittmer <jdi@l4x.org> Acked-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:53 -07:00
Jan Beulich	690a973f48	[PATCH] x86-64: Speed up dwarf2 unwinder This changes the dwarf2 unwinder to do a binary search for CIEs instead of a linear work. The linker is unfortunately not able to build a proper lookup table at link time, instead it creates one at runtime as soon as the bootmem allocator is usable (so you'll continue using the linear lookup for the first [hopefully] few calls). The code should be ready to utilize a build-time created table once a fixed linker becomes available. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-10-21 18:37:01 +02:00
Alexey Dobriyan	e05d722e45	[PATCH] kernel/nsproxy.c: use kmemdup() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:44 -07:00
Randy Dunlap	d6f8ff7381	[PATCH] cad_pid sysctl with PROC_FS=n If CONFIG_PROC_FS=n: kernel/sysctl.c:148: warning: 'proc_do_cad_pid' used but never defined kernel/built-in.o:(.data+0x1228): undefined reference to `proc_do_cad_pid' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:38 -07:00
Borislav Petkov	91fcdd4e03	[PATCH] readjust comments of task_timeslice for kernel doc Signed-off-by: Borislav Petkov <petkov@math.uni-muenster.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:37 -07:00
Linus Torvalds	43f82216f0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fm801-gp - handle errors from pci_enable_device() Input: gameport core - handle errors returned by device_bind_driver() Input: serio core - handle errors returned by device_bind_driver() Lockdep: fix compile error in drivers/input/serio/serio.c Input: serio - add lockdep annotations Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() Input: atkbd - supress "too many keys" error message Input: i8042 - supress ACK/NAKs when blinking during panic Input: add missing exports to fix modular build	2006-10-17 08:56:43 -07:00
Neil Brown	bd5349cfd2	[PATCH] Convert cpu hotplug notifiers to use raw_notifier instead of blocking_notifier The use of blocking notifier by _cpu_up and _cpu_down in cpu.c has two problem. 1/ An interaction with the workqueue notifier causes lockdep to spit a warning. 2/ A notifier could conceivable be added or removed while _cpu_up or _cpu_down are in process. As each notifier is called twice (prepare then commit/abort) this could be unhealthy. To fix to we simply take cpu_add_remove_lock while adding or removing notifiers to/from the list. This makes the 'blocking' usage unnecessary as all accesses to cpu_chain are now protected by cpu_add_remove_lock. So change "blocking" to "raw" in all relevant places. This fixes 1. Credit: Andrew Morton Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> (reporter) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:48 -07:00
Peter Zijlstra	bea493a031	[PATCH] rt-mutex: fixup rt-mutex debug code BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted) [<c04051e3>] show_trace_log_lvl+0x58/0x16a [<c04057f0>] show_trace+0xd/0x10 [<c0405900>] dump_stack+0x19/0x1b [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a [<c04224c0>] free_task+0x15/0x24 [<c042378c>] copy_process+0x12bd/0x1324 [<c0423835>] do_fork+0x42/0x113 [<c04021dd>] sys_fork+0x19/0x1b [<c0403fb7>] syscall_call+0x7/0xb In copy_process(), dup_task_struct() also duplicates the ->pi_lock, ->pi_waiters and ->pi_blocked_on members. rt_mutex_debug_task_free() called from free_task() validates these members. However free_task() can be invoked before these members are reset for the new task. Move the initialization code before the first bail that can hit free_task(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:48 -07:00
Ingo Molnar	a460e745e8	[PATCH] genirq: clean up irq-flow-type naming Introduce desc->name and eliminate the handle_irq_name() hack. Add set_irq_chip_and_handler_name() to set the flow type and name at once. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Matthew Wilcox <willy@debian.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:45 -07:00
Andrew Morton	c60099bfe3	[PATCH] swsusp: fix memory leaks My fancy new swsusp IO code had a big memory leak. It's somewhat invisible because the whole mem_map[] gets overwritten after resume, but it can cause us to get low on memory during the actual suspend process. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:44 -07:00
Thomas Gleixner	ac08c26492	[PATCH] posix-cpu-timers: prevent signal delivery starvation The integer divisions in the timer accounting code can round the result down to 0. Adding 0 is without effect and the signal delivery stops. Clamp the division result to minimum 1 to avoid this. Problem was reported by Seongbae Park <spark@google.com>, who provided also an inital patch. Roland sayeth: I have had some more time to think about the problem, and to reproduce it using Toyo's test case. For the record, if my understanding of the problem is correct, this happens only in one very particular case. First, the expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks) it's < nthreads so the division yields zero. Second, it only affects each thread that is so new that its CPU time accumulation is zero so now+0 is still zero and ->it__expires winds up staying zero. For the VIRT and PROF clocks when cputime_t is tick granularity (or the SCHED clock on configurations where sched_clock's value only advances on clock ticks), this is not hard to arrange with new threads starting up and blocking before they accumulate a whole tick of CPU time. That's what happens in Toyo's test case. Note that in general it is fine for that division to round down to zero, and set each thread's expiry time to its "now" time. The problem only arises with thread's whose "now" value is still zero, so that now+0 winds up 0 and is interpreted as "not set" instead of ">= now". So it would be a sufficient and more precise fix to just use max(ticks, 1) inside the loop when setting each it__expires value. But, it does no harm to round the division up to one and always advance every thread's expiry time. If the thread didn't already fire timers for the expiry time of "now", there is no expectation that it will do so before the next tick anyway. So I followed Thomas's patch in lifting the max out of the loops. This patch also covers the reload cases, which are harder to write a test for (and I didn't try). I've tested it with Toyo's case and it fixes that. [toyoa@mvista.com: fix: min_t -> max_t] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Daniel Walker <dwalker@mvista.com> Cc: Toyo Abe <toyoa@mvista.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Seongbae Park <spark@google.com> Cc: Peter Mattis <pmattis@google.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:43 -07:00
john stultz	3f4a0b917c	[PATCH] i386 Time: Avoid PIT SMP lockups Avoid possible PIT livelock issues seen on SMP systems (and reported by Andi), by not allowing it as a clocksource on SMP boxes. However, since the PIT may no longer be present, we have to properly handle the cases where SMP systems have TSC skew and fall back from the TSC. Since the PIT isn't there, it would "fall back" to the TSC again. So this changes the jiffies rating to 1, and the TSC-bad rating value to 0. Thus you will get the following behavior priority on i386 systems: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if UP] jiffies Rather then the current more complicated: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if cpus < 4] tsc [if present & unstable] jiffies Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:42 -07:00
Ingo Molnar	ca268c691d	[PATCH] lockdep: increase max allowed recursion depth In general, lockdep warnings are intended to be non-fatal, so I have put in various practical limits on internal data structure failure modes. We haven't had a /single/ lockdep-internal crash ever since lockdep went upstream [the unwinder crashes are outside of lockdep], and that's largely due to the good internal checks it does. Recursion within the dependency graph is currently limited to 20, that's probably not enough on some many-CPU boxes - this patch doubles it to 40. I have written the lockdep functions to have as small stackframes as possible, so 40 should be OK too. (The practical recursion limit should be somewhere between 100 and 200 entries. If we hit that then I'll change the algorithm to be iteration-based. Graph walking logic is so easy to program via recursion, so i'd like to keep recursion as long as possible.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:42 -07:00
Randy Dunlap	39af114377	[PATCH] fix epoll_pwait when EPOLL=n Fixes http://bugzilla.kernel.org/show_bug.cgi?id=7371 sys_epoll_pwait needs to be listed as a conditional (weak) entry point for CONFIG_EPOLL=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-16 09:14:05 -07:00
Ingo Molnar	256a6b4136	[PATCH] lockdep: fix printk recursion logic Bug reported and fixed by Tilman Schmidt <tilman@imap.cc>: if lockdep is enabled then log messages make it to /var/log/messages belatedly. The reason is a missed wakeup of klogd. Initially there was only a lockdep_internal() protection against lockdep recursion within vprintk() - it grew the 'outer' lockdep_off()/on() protection only later on. But that lockdep_off() made the release_console_sem() within vprintk() always happen under the lockdep_internal() condition, causing the bug. The right solution to remove the inner protection against recursion here - the outer one is enough. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:24 -07:00
Alexey Dobriyan	3dc3099a9b	[PATCH] lockdep: use BUILD_BUG_ON Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:24 -07:00
Reinette Chatre	01a3ee2b20	[PATCH] bitmap: parse input from kernel and user buffers lib/bitmap.c:bitmap_parse() is a library function that received as input a user buffer. This seemed to have originated from the way the write_proc function of the /proc filesystem operates. This has been reworked to not use kmalloc and eliminates a lot of get_user() overhead by performing one access_ok before using __get_user(). We need to test if we are in kernel or user space (is_user) and access the buffer differently. We cannot use __get_user() to access kernel addresses in all cases, for example in architectures with separate address space for kernel and user. This function will be useful for other uses as well; for example, taking input for /sysfs instead of /proc, so it was changed to accept kernel buffers. We have this use for the Linux UWB project, as part as the upcoming bandwidth allocator code. Only a few routines used this function and they were changed too. Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Cc: Paul Jackson <pj@sgi.com> Cc: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:22 -07:00
Nick Piggin	beed33a816	[PATCH] sched: likely profiling This likely profiling is pretty fun. I found a few possible problems in sched.c. This patch may be not measurable, but when I did measure long ago, nooping (un)likely cost a couple of % on scheduler heavy benchmarks, so it all adds up. Tweak some branch hints: - the 2nd 64 bits in the bitmask is likely to be populated, because it contains the first 28 bits (nearly 3/4) of the normal priorities. (ratio of 669669:691 ~= 1000:1). - it isn't unlikely that context switching switches to another process. it might be very rapidly switching to and from the idle process (ratio of 475815:419004 and 471330:423544). Let the branch predictor decide. - preempt_enable seems to be very often called in a nested preempt_disable or with interrupts disabled (ratio of 3567760:87965 ~= 40:1) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Daniel Walker <dwalker@mvista.com> Cc: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:22 -07:00
Florin Malita	fa3ba2e81e	[PATCH] fix Module taint flags listing in Oops/panic Module taint flags listing in Oops/panic has a couple of issues: * taint_flags() doesn't null-terminate the buffer after printing the flags * per-module taints are only set if the kernel is not already tainted (with that particular flag) => only the first offending module gets its taint info correctly updated Some additional changes: * 'license_gplok' is no longer needed - equivalent to !(taints & TAINT_PROPRIETARY_MODULE) - so we can drop it from struct module * exporting module taint info via /proc/module: pwc 88576 0 - Live 0xf8c32000 evilmod 6784 1 pwc, Live 0xf8bbf000 (PF) Signed-off-by: Florin Malita <fmalita@gmail.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:21 -07:00
Christoph Lameter	469340236a	[PATCH] mm: kevent threads: use MPOL_DEFAULT Switch the memory policy of the kevent threads to MPOL_DEFAULT while leaving the kzalloc of the workqueue structure on interleave. This means that all code executed in the context of the kevent thread is allocating node local. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alok Kataria <alok.kataria@calsoftinc.com> Cc: Andi Kleen <ak@suse.de> Cc: <pj@sgi.com> Cc: <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:19 -07:00
Rafael J. Wysocki	97c7801cd5	[PATCH] swsusp: Use suspend_console Add suspend_console() and resume_console() to the suspend-to-disk code paths so that the users of netconsole can use swsusp with it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:14 -07:00
Peter Zijlstra	4dfbb9d8c6	Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() This annotation makes it possible to assign a subclass on lock init. This annotation is meant to reduce the _nested() annotations by assigning a default subclass. One could do without this annotation and rely on lockdep_set_class() exclusively, but that would require a manual stack of struct lock_class_key objects. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2006-10-11 01:45:14 -04:00
Al Viro	1af9892811	[PATCH] cpuset ANSI prototype Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:23 -07:00
Al Viro	ba2397efe1	[PATCH] make kernel/relay.c __user-clean Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:22 -07:00
Al Viro	ba46df984b	[PATCH] __user annotations: futex Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:22 -07:00
Rafael J. Wysocki	5c339d4541	[PATCH] swsusp: Make userland suspend work on SMP again Unfortunately one of the recent changes in swsusp has broken the userland suspend on SMP. Fix it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-07 10:51:14 -07:00
Frederik Deweerdt	e317c8ccaa	[PATCH] ixp4xxdefconfig arm fixes With the following patch, the ixp4xxdefconfig builds correctly. I'll test some more configs if I get some time. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-06 12:11:08 -07:00
Andrew Morton	4899b8b16b	[PATCH] kauditd_thread warning fix Squash this warning: kernel/audit.c: In function 'kauditd_thread': kernel/audit.c:367: warning: no return statement in function returning non-void We might as test kthread_should_stop(), although it's not very pointful at present. The code which starts this thread looks racy - the kernel could start multiple threads. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-06 08:53:39 -07:00
David Howells	7d12e780e0	IRQ: Maintain regs pointer globally rather than passing to IRQ handlers Maintain a per-CPU global "struct pt_regs " variable which can be used instead of passing regs around manually through all ~1800 interrupt handlers in the Linux kernel. The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()). Where appropriate, an arch may override the generic storage facility and do something different with the variable. On FRV, for instance, the address is maintained in GR28 at all times inside the kernel as part of general exception handling. Having looked over the code, it appears that the parameter may be handed down through up to twenty or so layers of functions. Consider a USB character device attached to a USB hub, attached to a USB controller that posts its interrupts through a cascaded auxiliary interrupt controller. A character device driver may want to pass regs to the sysrq handler through the input layer which adds another few layers of parameter passing. I've build this code with allyesconfig for x86_64 and i386. I've runtested the main part of the code on FRV and i386, though I can't test most of the drivers. I've also done partial conversion for powerpc and MIPS - these at least compile with minimal configurations. This will affect all archs. Mostly the changes should be relatively easy. Take do_IRQ(), store the regs pointer at the beginning, saving the old one: struct pt_regs old_regs = set_irq_regs(regs); And put the old one back at the end: set_irq_regs(old_regs); Don't pass regs through to generic_handle_irq() or __do_IRQ(). In timer_interrupt(), this sort of change will be necessary: - update_process_times(user_mode(regs)); - profile_tick(CPU_PROFILING, regs); + update_process_times(user_mode(get_irq_regs())); + profile_tick(CPU_PROFILING); I'd like to move update_process_times()'s use of get_irq_regs() into itself, except that i386, alone of the archs, uses something other than user_mode(). Some notes on the interrupt handling in the drivers: () input_dev() is now gone entirely. The regs pointer is no longer stored in the input_dev struct. () finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does something different depending on whether it's been supplied with a regs pointer or not. (*) Various IRQ handler function pointers have been moved to type irq_handler_t. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)	2006-10-05 15:10:12 +01:00
David Howells	da482792a6	IRQ: Typedef the IRQ handler function type Typedef the IRQ handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)	2006-10-05 13:28:27 +01:00
David Howells	57a58a9435	IRQ: Typedef the IRQ flow handler function type Typedef the IRQ flow handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)	2006-10-05 13:28:06 +01:00
Linus Torvalds	fefd26b3b8	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh * master.kernel.org:/pub/scm/linux/kernel/git/davej/configh: Remove all inclusions of <linux/config.h> Manually resolved trivial path conflicts due to removed files in the sound/oss/ subdirectory.	2006-10-04 09:59:57 -07:00
Linus Torvalds	18e6756a6b	Merge branch 'audit.b32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] message types updated [PATCH] name_count array overrun [PATCH] PPID filtering fix [PATCH] arch filter lists with < or > should not be accepted	2006-10-04 08:15:55 -07:00
Oleg Nesterov	20e9751bd9	[PATCH] rcu: simplify/improve batch tuning Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled, which records the fact that one of CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we gone through a quiescent state. On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	4b6c2cca6e	[PATCH] rcu: add sched torture type to rcutorture Implement torture testing for the "sched" variant of RCU, which uses preempt_disable, preempt_enable, and synchronize_sched. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	11a147013e	[PATCH] rcu: add rcu_bh_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for rcu_bh using synchronize_rcu_bh rather than the asynchronous call_rcu_bh. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	20d2e4283a	[PATCH] rcu: add rcu_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for RCU using synchronize_rcu rather than the asynchronous call_rcu. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	e303373658	[PATCH] rcu: refactor srcu_torture_deferred_free to work for any implementation Make srcu_torture_deferred_free use cur_ops->sync() so it will work for any implementation. Move and rename it in preparation for use in the ops of other implementations. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	b772e1dd4b	[PATCH] RCU: add fake writers to rcutorture rcutorture currently has one writer and an arbitrary number of readers. To better exercise some of the code paths in RCU implementations, add fake writer threads which call the synchronize function for the RCU variant in a loop, with a delay between calls to arrange for different numbers of writers running in parallel. [bunk@stusta.de: cleanup] Acked-by: Paul McKenney <paulmck@us.ibm.com> Cc: Dipkanar Sarma <dipankar@in.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	75cfef32f2	[PATCH] rcu: Fix sign bug making rcu_random always return the same sequence rcu_random uses a counter rrs_count to occasionally mix data from get_random_bytes into the state of its pseudorandom generator. However, the rrs_counter gets declared as an unsigned long, and rcu_random checks for --rrs_count < 0, so this code will never mix any real random data into the state, and will thus always return the same sequence of random numbers. Also, change the return value of rcu_random from long to unsigned long, to avoid potential issues caused by the use of the % operator, which can return negative values for negative left operands. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	2860aaba4d	[PATCH] rcu: Avoid kthread_stop on invalid pointer if rcutorture reader startup fails rcu_torture_init kmallocs the array of reader threads, then creates each one with kthread_run, cleaning up with rcu_torture_cleanup if this fails. rcu_torture_cleanup calls kthread_stop on any non-NULL pointer in the array; however, any readers after the one that failed to start up will have invalid pointers, not null pointers. Avoid this by using kzalloc instead. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Josh Triplett	3c29e03d91	[PATCH] rcu: Mention rcu_bh in description of rcutorture's torture_type parameter The comment for rcutorture's torture_type parameter only lists the RCU variants rcu and srcu, but not rcu_bh; add rcu_bh to the list. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Josh Triplett	ff2c93a537	[PATCH] rcu: Add MODULE_AUTHOR to rcutorture module Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Alan Stern	e6a92013ba	[PATCH] SRCU: report out-of-memory errors Currently the init_srcu_struct() routine has no way to report out-of-memory errors. This patch (as761) makes it return -ENOMEM when the per-cpu data allocation fails. The patch also makes srcu_init_notifier_head() report a BUG if a notifier head can't be initialized. Perhaps it should return -ENOMEM instead, but in the most likely cases where this might occur I don't think any recovery is possible. Notifier chains generally are not created dynamically. [akpm@osdl.org: avoid statement-with-side-effect in macro] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Alan Stern	eabc069401	[PATCH] Add SRCU-based notifier chains This patch (as751) adds a new type of notifier chain, based on the SRCU (Sleepable Read-Copy Update) primitives recently added to the kernel. An SRCU notifier chain is much like a blocking notifier chain, in that it must be called in process context and its callout routines are allowed to sleep. The difference is that the chain's links are protected by the SRCU mechanism rather than by an rw-semaphore, so calling the chain has extremely low overhead: no memory barriers and no cache-line bouncing. On the other hand, unregistering from the chain is expensive and the chain head requires special runtime initialization (plus cleanup if it is to be deallocated). SRCU notifiers are appropriate for notifiers that will be called very frequently and for which unregistration occurs very seldom. The proposed "task notifier" scheme qualifies, as may some of the network notifiers. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Paul E. McKenney	b2896d2e75	[PATCH] srcu-3: add SRCU operations to rcutorture Adds SRCU operations to rcutorture and updates rcutorture documentation. Also increases the stress imposed by the rcutorture test. [bunk@stusta.de: make needlessly global code static] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Paul E. McKenney	621934ee7e	[PATCH] srcu-3: RCU variant permitting read-side blocking Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. (Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... [josht@us.ibm.com: sparse notation] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Eric W. Biederman	1f80025e62	[PATCH] msi: simplify msi sanity checks by adding with generic irq code Currently msi.c is doing sanity checks that make certain before an irq is destroyed it has no more users. By adding irq_has_action I can perform the test is a generic way, instead of relying on a msi specific data structure. By performing the core check in dynamic_irq_cleanup I ensure every user of dynamic irqs has a test present and we don't free resources that are in use. In msi.c this allows me to kill the attrib.state member of msi_desc and all of the assciated code to maintain it. To keep from freeing data structures when irq cleanup code is called to soon changing dyanamic_irq_cleanup is insufficient because there are msi specific data structures that are also not safe to free. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:29 -07:00
Eric W. Biederman	3a16d71362	[PATCH] genirq: irq: add a dynamic irq creation API With the msi support comes a new concept in irq handling, irqs that are created dynamically at run time. Currently the msi code allocates irqs backwards. First it allocates a platform dependent routing value for an interrupt the ``vector'' and then it figures out from the vector which irq you are on. This msi backwards allocator suffers from two basic problems. The allocator suffers because it is trying to do something that is architecture specific in a generic way making it brittle, inflexible, and tied to tightly to the architecture implementation. The alloctor also suffers from it's very backwards nature as it has tied things together that should have no dependencies. To solve the basic dynamic irq allocation problem two new architecture specific functions are added: create_irq and destroy_irq. create_irq takes no input and returns an unused irq number, that won't be reused until it is returned to the free poll with destroy_irq. The irq then can be used for any purpose although the only initial consumer is the msi code. destroy_irq takes an irq number allocated with create_irq and returns it to the free pool. Making this functionality per architecture increases the simplicity of the irq allocation code and increases it's flexibility. dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the irq_desc initializtion that should happen for dynamic irqs. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:27 -07:00
Eric W. Biederman	e7b946e98a	[PATCH] genirq: irq: add moved_masked_irq Currently move_native_irq disables and renables the irq we are migrating to ensure we don't take that irq when we are actually doing the migration operation. Disabling the irq needs to happen but sometimes doing the work is move_native_irq is too late. On x86 with ioapics the irq move sequences needs to be: edge_triggered: mask irq. move irq. unmask irq. ack irq. level_triggered: mask irq. ack irq. move irq. unmask irq. We can easily perform the edge triggered sequence, with the current defintion of move_native_irq. However the level triggered case does not map well. For that I have added move_masked_irq, to allow me to disable the irqs around both the ack and the move. Q: Why have we not seen this problem earlier? A: The only symptom I have been able to reproduce is that if we change the vector before acknowleding an irq the wrong irq is acknowledged. Since we currently are not reprogramming the irq vector during migration no problems show up. We have to mask the irq before we acknowledge the irq or else we could hit a window where an irq is asserted just before we acknowledge it. Edge triggered irqs do not have this problem because acknowledgements do not propogate in the same way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:26 -07:00
Eric W. Biederman	a24ceab4f4	[PATCH] genirq: irq: convert the move_irq flag from a 32bit word to a single bit The primary aim of this patchset is to remove maintenances problems caused by the irq infrastructure. The two big issues I address are an artificially small cap on the number of irqs, and that MSI assumes vector == irq. My primary focus is on x86_64 but I have touched other architectures where necessary to keep them from breaking. - To increase the number of irqs I modify the code to look at the (cpu, vector) pair instead of just looking at the vector. With a large number of irqs available systems with a large irq count no longer need to compress their irq numbers to fit. Removing a lot of brittle special cases. For acpi guys the result is that irq == gsi. - Addressing the fact that MSI assumes irq == vector takes a few more patches. But suffice it to say when I am done none of the generic irq code even knows what a vector is. In quick testing on a large Unisys x86_64 machine we stumbled over at least one driver that assumed that NR_IRQS could always fit into an 8 bit number. This driver is clearly buggy today. But this has become a class of bugs that it is now much easier to hit. This patch: This is a minor space optimization. In practice I don't think this has any affect because of our alignment constraints and the other fields but there is not point in chewing up an uncessary word and since we already read the flag field this should improve the cache hit ratio of the irq handler. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:26 -07:00
Steve Grubb	ac9910ce01	[PATCH] name_count array overrun Hi, This patch removes the rdev logging from the previous patch The below patch closes an unbounded use of name_count. This can lead to oopses in some new file systems. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:21 -04:00
Alexander Viro	419c58f11f	[PATCH] PPID filtering fix On Thu, Sep 28, 2006 at 04:03:06PM -0400, Eric Paris wrote: > After some looking I did not see a way to get into audit_log_exit > without having set the ppid. So I am dropping the set from there and > only doing it at the beginning. > > Please comment/ack/nak as soon as possible. Ehh... That's one hell of an overhead to be had ;-/ Let's be lazy. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:19 -04:00
Eric Paris	4b8a311bb1	[PATCH] arch filter lists with < or > should not be accepted Currently the kernel audit system represents arch's as numbers and will gladly accept comparisons between archs using >, <, >=, <= when the only thing that makes sense is = or !=. I'm told that the next revision of auditctl will do this checking but this will provide enforcement in the kernel even for old userspace. A simple command to show the issue would be to run auditctl -d entry,always -F arch>i686 -S chmod with this patch the kernel will reject this with -EINVAL Please comment/ack/nak as soon as possible. -Eric kernel/auditfilter.c \| 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:16 -04:00
Dave Jones	038b0a6d8d	Remove all inclusions of <linux/config.h> kbuild explicitly includes this at build time. Signed-off-by: Dave Jones <davej@redhat.com>	2006-10-04 03:38:54 -04:00
Josh Triplett	4802211cfd	rcutorture: Fix incorrect description of default for nreaders parameter The comment for the nreaders parameter of rcutorture gives the default as 4ncpus, but the value actually defaults to 2ncpus; fix the comment. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:26:16 +02:00
Rolf Eike Beer	9f5d785e93	remove duplicate "until" from kernel/workqueue.c s/until until/until/ Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:07:31 +02:00
Uwe Zeisberger	f30c226954	fix file specification in comments Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:01:26 +02:00
Christoph Lameter	ce164428c4	[PATCH] scheduler: NUMA aware placement of sched_group_allnodes When the per cpu sched domains are build then they also need to be placed on the node where the cpu resides otherwise we will have frequent off node accesses which will slow down the system. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:07 -07:00
Satoru Takeuchi	0feaece977	[PATCH] sched: fixing wrong comment for find_idlest_cpu() Fixing wrong comment for find_idlest_cpu(). Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:07 -07:00
Siddha, Suresh B	89c4710ee9	[PATCH] sched: cleanup sched_group cpu_power setup Up to now sched group's cpu_power for each sched domain is initialized independently. This made the setup code ugly as the new sched domains are getting added. Make the sched group cpu_power setup code generic, by using domain child field and new domain flag in sched_domain. For most of the sched domains(except NUMA), sched group's cpu_power is now computed generically using the domain properties of itself and of the child domain. sched groups in NUMA domains are setup little differently and hence they don't use this generic mechanism. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Siddha, Suresh B	1a84887080	[PATCH] sched: introduce child field in sched_domain Introduce the child field in sched_domain struct and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup(done in a different patch) code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Dave Jones	7473264643	[PATCH] sched: don't print migration cost when only 1 CPU If only a single CPU is present, printing this doesn't make much sense. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Siddha, Suresh B	a616058b78	[PATCH] sched: remove unnecessary sched group allocations Remove dynamic sched group allocations for MC and SMP domains. These allocations can easily fail on big systems(1024 or so CPUs) and we can live with out these dynamic allocations. [akpm@osdl.org: build fix] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Nick Piggin	5c1e176781	[PATCH] sched: force /sbin/init off isolated cpus Force /sbin/init off isolated cpus (unless every CPU is specified as an isolcpu). Users seem to think that the isolated CPUs shouldn't have much running on them to begin with. That's fair enough: intuitive, I guess. It also means that the cpu affinity masks of tasks will not include isolcpus by default, which is also more intuitive, perhaps. /sbin/init is spawned from the boot CPU's idle thread, and /sbin/init starts the rest of userspace. So if the boot CPU is specified to be an isolcpu, then prior to this patch, all of userspace will be run there. (throw in a couple of plausible devinit -> cpuinit conversions I spotted while we're here). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Randy Dunlap	e1ca66d1b9	[PATCH] kernel-doc for kernel/resource.c Add kernel-doc function headers in kernel/resource.c and use them in DocBook. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
Randy Dunlap	eed34d0fc5	[PATCH] kernel-doc for kernel/dma.c Add kernel-doc function headers in kernel/dma.c and use it in DocBook. Clean up kernel-doc in mca_dma.h (the colon (':') represents a section header). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
Franck Bui-Huu	ffc5089196	[PATCH] Create kallsyms_lookup_size_offset() Some uses of kallsyms_lookup() do not need to find out the name of a symbol and its module's name it belongs. This is specially true in arch specific code, which needs to unwind the stack to show the back trace during oops (mips is an example). In this specific case, we just need to retreive the function's size and the offset of the active intruction inside it. Adds a new entry "kallsyms_lookup_size_offset()" This new entry does exactly the same as kallsyms_lookup() but does not require any buffers to store any names. It returns 0 if it fails otherwise 1. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
David Howells	3f2e05e90e	[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h Revert Andrew Morton's patch to temporarily hack around the lack of a declaration of sigset_t in linux/compat.h to make the block-disablement patches build on IA64. This got accidentally pushed to Linus and should be fixed in a different manner. Also make linux/compat.h #include asm/signal.h to gain a definition of sigset_t so that it can externally declare sigset_from_compat(). This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and ppc64 and run-tested on frv. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 08:03:31 -07:00
Cedric Le Goater	9ec52099e4	[PATCH] replace cad_pid by a struct pid There are a few places in the kernel where the init task is signaled. The ctrl+alt+del sequence is one them. It kills a task, usually init, using a cached pid (cad_pid). This patch replaces the pid_t by a struct pid to avoid pid wrap around problem. The struct pid is initialized at boot time in init() and can be modified through systctl with /proc/sys/kernel/cad_pid [ I haven't found any distro using it ? ] It also introduces a small helper routine kill_cad_pid() which is used where it seemed ok to use cad_pid instead of pid 1. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Oleg Nesterov	1a657f78dc	[PATCH] introduce get_task_pid() to fix unsafe get_pid() proc_pid_make_inode: ei->pid = get_pid(task_pid(task)); I think this is not safe. get_pid() can be preempted after checking "pid != NULL". Then the task exits, does detach_pid(), and RCU frees the pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Arnd Bergmann	6760856791	[PATCH] introduce kernel_execve The use of execve() in the kernel is dubious, since it relies on the __KERNEL_SYSCALLS__ mechanism that stores the result in a global errno variable. As a first step of getting rid of this, change all users to a global kernel_execve function that returns a proper error code. This function is a terrible hack, and a later patch removes it again after the kernel syscalls are gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:23 -07:00
Pavel	5d124e99c2	[PATCH] nsproxy cloning error path fix This patch fixes copy_namespaces()'s error path. when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are copied from the old nsproxy. Later in copy_utsname, copy_ipcs, etc. according namespaces are get-ed. On error path needed namespaces are put-ed, so there's no need to put new nsproxy itelf as it woud cause putting namespaces for the second time. Found when incorporating namespaces into OpenVZ kernel. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	fcfbd547b1	[PATCH] IPC namespace - sysctls Sysctl tweaks for IPC namespace Signed-off-by: Pavel Emelianiov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	73ea41302b	[PATCH] IPC namespace - utils This patch adds basic IPC namespace functionality to IPC utils: - init_ipc_ns - copy/clone/unshare/free IPC ns - /proc preparations Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	25b21cb2f6	[PATCH] IPC namespace core This patch set allows to unshare IPCs and have a private set of IPC objects (sem, shm, msg) inside namespace. Basically, it is another building block of containers functionality. This patch implements core IPC namespace changes: - ipc_namespace structure - new config option CONFIG_IPC_NS - adds CLONE_NEWIPC flag - unshare support [clg@fr.ibm.com: small fix for unshare of ipc namespace] [akpm@osdl.org: build fix] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge Hallyn	c0b2fc3165	[PATCH] uts: copy nsproxy only when needed The nsproxy was being copied in unshare() when anything was being unshared, even if it was something not referenced from nsproxy. This should end up in some cases with far more memory usage than necessary. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge E. Hallyn	071df104f8	[PATCH] namespaces: utsname: implement CLONE_NEWUTS flag Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare. [clg@fr.ibm.com: IPC unshare fix] [bunk@stusta.de: cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge E. Hallyn	8218c74c02	[PATCH] namespaces: utsname: sysctl Sysctl uts patch. This will need to be done another way, but since sysctl itself needs to be container aware, 'the right thing' is a separate patchset. [akpm@osdl.org: ia64 build fix] [sam.vilain@catalyst.net.nz: cleanup] [sam.vilain@catalyst.net.nz: add proc_do_utsns_string] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	4865ecf131	[PATCH] namespaces: utsname: implement utsname namespaces This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	96b644bdec	[PATCH] namespaces: utsname: use init_utsname when appropriate In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with a the init_utsname helper. Changes: Removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7) [akpm@osdl.org: CIFS fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	e9ff3990f0	[PATCH] namespaces: utsname: switch to using uts namespaces Replace references to system_utsname to the per-process uts namespace where appropriate. This includes things like uname. Changes: Per Eric Biederman's comments, use the per-process uts namespace for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c [jdike@addtoit.com: UML fix] [clg@fr.ibm.com: cleanup] [akpm@osdl.org: build fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Cedric Le Goater	fab413a334	[PATCH] namespaces: exit_task_namespaces() invalidates nsproxy exit_task_namespaces() has replaced the former exit_namespace(). It invalidates task->nsproxy and associated namespaces. This is an issue for the (futur) pid namespace which is required to be valid in exit_notify(). This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy valid. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	1651e14e28	[PATCH] namespaces: incorporate fs namespace into nsproxy This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies point to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Serge E. Hallyn	0437eb594e	[PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This avoids all arches having to be updated. Compiles and boots on s390. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Serge E. Hallyn	ab516013ad	[PATCH] namespaces: add nsproxy This patch adds a nsproxy structure to the task struct. Later patches will move the fs namespace pointer into this structure, and introduce a new utsname namespace into the nsproxy. The vserver and openvz functionality, then, would be implemented in large part by virtualizing/isolating more and more resources into namespaces, each contained in the nsproxy. [akpm@osdl.org: build fix] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Adrian Bunk	b1ba4ddde0	[PATCH] make kernel/sysctl.c:_proc_do_string() static This patch makes the needlessly global _proc_do_string() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Sam Vilain	f5dd3d6fad	[PATCH] proc: sysctl: add _proc_do_string helper The logic in proc_do_string is worth re-using without passing in a ctl_table structure (say, we want to calculate a pointer and pass that in instead); pass in the two fields it uses from that structure as explicit arguments. Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00

1 2 3 4 5 ...

1694 Commits