9411 Commits

Author SHA1 Message Date
Alexey Dobriyan
b54452b07a const: struct nla_policy
Make remaining netlink policies as const.
Fixup coding style where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 14:30:18 -08:00
Yinghai Lu
b5eb78f76d sparseirq: Use radix_tree instead of ptrs array
Use radix_tree irq_desc_tree instead of irq_desc_ptrs.

-v2: according to Eric and cyrill to use radix_tree_lookup_slot and
     radix_tree_replace_slot

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-32-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17 17:27:20 -08:00
Yinghai Lu
99558f0bbe sparseirq: Change irq_desc_ptrs to static
Add replace_irq_desc() instead of poking at the array directly.

-v2: remove unneeded boundary check in replace_irq_desc

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-31-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17 17:27:03 -08:00
Yinghai Lu
febcb0c59a irq: Remove unnecessary bootmem code
mem_init is moved early already.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-29-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17 17:23:59 -08:00
Thomas Gleixner
b7e56edba4 Merge branch 'linus' into x86/mm
x86/mm is on 32-rc4 and missing the spinlock namespace changes which
are needed for further commits into this topic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-17 18:28:05 +01:00
Heiko Carstens
f850c30c8b tracing/kprobes: Make Kconfig dependencies generic
KPROBES_EVENT actually depends on the regs and stack access API
(b1cf540f) and not on x86.
So introduce a new config option which architectures can select if
they have the API implemented and switch x86.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <20100210162517.GB6933@osiris.boeblingen.de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-17 13:13:08 +01:00
Mike Frysinger
e7b8e675d9 tracing: Unify arch_syscall_addr() implementations
Most implementations of arch_syscall_addr() are the same, so create a
default version in common code and move the one piece that differs (the
syscall table) to asm/syscall.h.  New arch ports don't have to waste
time copying & pasting this simple function.

The s390/sparc versions need to be different, so document why.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-17 13:07:21 +01:00
Thomas Gleixner
83ab0aa0d5 sched: Don't use possibly stale sched_class
setscheduler() saves task->sched_class outside of the rq->lock held
region for a check after the setscheduler changes have become
effective. That might result in checking a stale value.

rtmutex_setprio() has the same problem, though it is protected by
p->pi_lock against setscheduler(), but for correctness sake (and to
avoid bad examples) it needs to be fixed as well.

Retrieve task->sched_class inside of the rq->lock held region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
2010-02-17 11:58:18 +01:00
Yinghai Lu
580e0ad21d core: Move early_res from arch/x86 to kernel/
This makes the range reservation feature available to other
architectures.

-v2: add get_max_mapped, max_pfn_mapped only defined in x86...
     to fix PPC compiling
-v3: according to hpa, add CONFIG_HAVE_EARLY_RES
-v4: fix typo about EARLY_RES in config

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B7B5723.4070009@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-16 21:43:39 -08:00
Tejun Heo
43cf38eb5c percpu: add __percpu sparse annotations to core kernel subsystems
Add __percpu sparse annotations to core subsystems.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-mm@kvack.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biederman <ebiederm@xmission.com>
2010-02-17 11:17:38 +09:00
Anton Vorontsov
5a5e0f4c70 kfifo: Don't use integer as NULL pointer
This patch fixes following sparse warnings:

include/linux/kfifo.h:127:25: warning: Using plain integer as NULL pointer
kernel/kfifo.c:83:21: warning: Using plain integer as NULL pointer

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-16 15:11:08 -08:00
Anton Vorontsov
1a02d59aba kfifo: Make kfifo_initialized work after kfifo_free
After kfifo rework it's no longer possible to reliably know if kfifo is
usable, since after kfifo_free(), kfifo_initialized() would still return
true. The correct behaviour is needed for at least FHCI USB driver.

This patch fixes the issue by resetting the kfifo to zero values (the
same approach is used in kfifo_alloc() if allocation failed).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-16 15:11:06 -08:00
Thomas Gleixner
6e40f5bbbc Merge branch 'sched/urgent' into sched/core
Conflicts: kernel/sched.c

Necessary due to the urgent fixes which conflict with the code move
from sched.c to sched_fair.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 16:48:56 +01:00
Peter Zijlstra
0970d2992d sched: Fix race between ttwu() and task_rq_lock()
Thomas found that due to ttwu() changing a task's cpu without holding
the rq->lock, task_rq_lock() might end up locking the wrong rq.

Avoid this by serializing against TASK_WAKING.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 15:13:59 +01:00
Suresh Siddha
9000f05c6d sched: Fix SMT scheduler regression in find_busiest_queue()
Fix a SMT scheduler performance regression that is leading to a scenario
where SMT threads in one core are completely idle while both the SMT threads
in another core (on the same socket) are busy.

This is caused by this commit (with the problematic code highlighted)

   commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
   Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
   Date:   Tue Sep 1 10:34:38 2009 +0200

   sched: Try to deal with low capacity

   @@ -4203,15 +4223,18 @@ find_busiest_queue()
   ...
	for_each_cpu(i, sched_group_cpus(group)) {
   +	unsigned long power = power_of(i);

   ...

   -	wl = weighted_cpuload(i);
   +	wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
   +	wl /= power;

   -	if (rq->nr_running == 1 && wl > imbalance)
   +	if (capacity && rq->nr_running == 1 && wl > imbalance)
		continue;

On a SMT system, power of the HT logical cpu will be 589 and
the scheduler load imbalance (for scenarios like the one mentioned above)
can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
the weighted load with the power will result in "wl > imbalance" and
ultimately resulting in find_busiest_queue() return NULL, causing
load_balance() to think that the load is well balanced. But infact
one of the tasks can be moved to the idle core for optimal performance.

We don't need to use the weighted load (wl) scaled by the cpu power to
compare with  imabalance. In that condition, we already know there is only a
single task "rq->nr_running == 1" and the comparison between imbalance,
wl is to make sure that we select the correct priority thread which matches
imbalance. So we really need to compare the imabalnce with the original
weighted load of the cpu and not the scaled load.

But in other conditions where we want the most hammered(busiest) cpu, we can
use scaled load to ensure that we consider the cpu power in addition to the
actual load on that cpu, so that we can move the load away from the
guy that is getting most hammered with respect to the actual capacity,
as compared with the rest of the cpu's in that busiest group.

Fix it.

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 15:13:59 +01:00
Linus Torvalds
7d0bab9dfe Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer, softirq: Fix hrtimer->softirq trampoline
2010-02-15 19:52:12 -08:00
Linus Torvalds
627a9a194d Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/kprobes: Fix probe parsing
  tracing: Fix circular dead lock in stack trace
2010-02-15 19:47:59 -08:00
Linus Torvalds
3d8b4bdef7 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf top: Fix help text alignment
  perf: Fix hypervisor sample reporting
  perf: Make bp_len type to u64 generic across the arch
2010-02-15 19:47:48 -08:00
Uwe Kleine-König
dfff0615d2 tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-15 15:38:10 +01:00
Masami Hiramatsu
8b833c506c kprobes: Add mcount to the kprobes blacklist
Since mcount function can be called from everywhere,
it should be blacklisted. Moreover, the "mcount" symbol
is a special symbol name. So, it is better to put it in
the generic blacklist.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100205062433.3745.36726.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-15 05:45:49 +01:00
Peter Zijlstra
6f93d0a7c8 perf_events: Fix FORK events
Commit 22e19085 ("Honour event state for aux stream data")
introduced a bug where we would drop FORK events.

The thing is that we deliver FORK events to the child process'
event, which at that time will be PERF_EVENT_STATE_INACTIVE
because the child won't be scheduled in (we're in the middle of
fork).

Solve this twice, change the event state filter to exclude only
disabled (STATE_OFF) or worse, and deliver FORK events to the
current (parent).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <1266142324.5273.411.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-14 18:10:39 +01:00
Heiko Carstens
a9bb18f36c tracing/kprobes: Fix probe parsing
Trying to add a probe like:

  echo p:myprobe 0x10000 > /sys/kernel/debug/tracing/kprobe_events

will fail since the wrong pointer is passed to strict_strtoul
when trying to convert the address to an unsigned long.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100210162346.GA6933@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-14 09:43:58 +01:00
Suresh Siddha
2225a122ae ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET
Generic support for PTRACE_GETREGSET/PTRACE_SETREGSET commands which
export the regsets supported by each architecture using the correponding
NT_* types. These NT_* types are already part of the userland ABI, used
in representing the architecture specific register sets as different NOTES
in an ELF core file.

'addr' parameter for the ptrace system call encode the REGSET type (using
the corresppnding NT_* type) and the 'data' parameter points to the
struct iovec having the user buffer and the length of that buffer.

	struct iovec iov = { buf, len};
	ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);

On successful completion, iov.len will be updated by the kernel specifying
how much the kernel has written/read to/from the user's iov.buf.

x86 extended state registers are primarily exported using this interface.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100211195614.886724710@sbs-t61.sc.intel.com>
Acked-by: Hongjiu Lu <hjl.tools@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-11 15:08:33 -08:00
Li Zefan
c7c6b1fe9f ftrace: Allow to remove a single function from function graph filter
I don't see why we can only clear all functions from the filter.

After patching:

  # echo sys_open > set_graph_function
  # echo sys_close >> set_graph_function
  # cat set_graph_function
  sys_open
  sys_close
  # echo '!sys_close' >> set_graph_function
  # cat set_graph_function
  sys_open

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B726388.2000408@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-11 14:32:38 -05:00
Jean Delvare
5c42dc7070 devres/irq: Fix devm_irq_match comment
Fix the reference (in comment).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-11 16:01:02 +01:00
Yinghai Lu
e9a0064ad0 x86: Change range end to start+size
So make interface more consistent with early_res.
Later we can share some code with early_res.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-10-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 17:47:17 -08:00
Yinghai Lu
27811d8cab x86: Move range related operation to one file
We have almost the same code for mtrr cleanup and amd_bus checkup, and
this code  will also be used in replacing bootmem with early_res,
so try to move them together and reuse it from different parts.

Also rename update_range to subtract_range as that is what the
function is actually doing.

-v2: update comments as Christoph requested

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-4-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 17:47:17 -08:00
H. Peter Anvin
84abd88a70 Merge remote branch 'linus/master' into x86/bootmem 2010-02-10 16:55:28 -08:00
Brandon Phiilps
ced5b697a7 x86: Avoid race condition in pci_enable_msix()
Keep chip_data in create_irq_nr and destroy_irq.

When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race.  See this dmesg excerpt:

[   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[   85.170611]   alloc irq_desc for 99 on node -1
[   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[   85.170614]   alloc kstat_irqs on node -1
[   85.170616] alloc irq_2_iommu on node -1
[   85.170617]   alloc irq_desc for 100 on node -1
[   85.170619]   alloc kstat_irqs on node -1
[   85.170621] alloc irq_2_iommu on node -1
[   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[   85.170626]   alloc irq_desc for 101 on node -1
[   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[   85.170630]   alloc kstat_irqs on node -1
[   85.170631] alloc irq_2_iommu on node -1
[   85.170635]   alloc irq_desc for 102 on node -1
[   85.170636]   alloc kstat_irqs on node -1
[   85.170639] alloc irq_2_iommu on node -1
[   85.170646] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000088

As you can see igb and ixgbe are both alternating on create_irq_nr()
via pci_enable_msix() in their probe function.

ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
NULL via dynamic_irq_init().

igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

	cfg_new = irq_desc_ptrs[102]->chip_data;
	if (cfg_new->vector != 0)
		continue;

This hits the NULL deref.

Another possible race exists via pci_disable_msix() in a driver or in
the number of error paths that call free_msi_irqs():

destroy_irq()
dynamic_irq_cleanup() which sets desc->chip_data = NULL
...race window...
desc->chip_data = cfg;

Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.

Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Brandon Phililps <bphilips@suse.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 14:27:28 -08:00
Steven Rostedt
ede55c9d78 tracing: Add correct/incorrect to sort keys for branch annotation output
The branch annotation is a bit difficult to see the worst offenders
because it only sorts by percentage:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
       0      163 100 qdisc_restart                  sch_generic.c        179
       0      163 100 pfifo_fast_dequeue             sch_generic.c        447
       0        4 100 pskb_trim_rcsum                skbuff.h             1689
       0        4 100 llc_rcv                        llc_input.c          170
       0       18 100 psmouse_interrupt              psmouse-base.c       304
       0        3 100 atkbd_interrupt                atkbd.c              389
       0        5 100 usb_alloc_dev                  usb.c                437
       0       11 100 vsscanf                        vsprintf.c           1897
       0        2 100 IS_ERR                         err.h                34
       0       23 100 __rmqueue_fallback             page_alloc.c         865
       0        4 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 142
       0        3 100 move_masked_irq                migration.c          11

Adding the incorrect and correct values as sort keys makes this file a
bit more informative:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
       0   366541 100 audit_syscall_entry            auditsc.c            1637
       0   366538 100 audit_syscall_exit             auditsc.c            1685
       0   115839 100 sched_info_switch              sched_stats.h        269
       0    74567 100 sched_info_queued              sched_stats.h        222
       0    66578 100 sched_info_dequeued            sched_stats.h        177
       0    15113 100 trace_workqueue_insertion      workqueue.h          38
       0    15107 100 trace_workqueue_execution      workqueue.h          45
       0     3622 100 syscall_trace_leave            ptrace.c             1772
       0     2750 100 sched_move_task                sched.c              10100
       0     2750 100 sched_move_task                sched.c              10110
       0     1815 100 pre_schedule_rt                sched_rt.c           1462
       0      837 100 audit_alloc                    auditsc.c            879
       0      814 100 tcp_mss_split_point            tcp_output.c         1302

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-09 21:35:05 -05:00
Jason Wang
c93d89f3db Export the symbol of getboottime and mmonotonic_to_bootbased
Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.

Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09 19:20:15 +02:00
Anton Blanchard
301ba0457f kthread, sched: Remove reference to kthread_create_on_cpu
kthread_create_on_cpu doesn't exist so update a comment in
kthread.c to reflect this.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100209040740.GB3702@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-09 11:47:39 +01:00
Anton Blanchard
b80109e256 Remove reference to kthread_create_on_cpu
kthread_create_on_cpu doesn't exist so update a comment in kthread.c to reflect
this.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-09 11:28:48 +01:00
Al Viro
cccc6bba3f Lose the first argument of audit_inode_child()
it's always equal to ->d_name.name of the second argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-08 14:38:36 -05:00
Anton Blanchard
fa535a77bd sched: cpuacct: Use bigger percpu counter batch values for stats counters
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch.  This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.

This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations.  We cap it at INT_MAX in case of
overflow.

For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC.  So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:

 CONFIG_CGROUP_CPUACCT disabled:   16906698 ctx switches/sec
 CONFIG_CGROUP_CPUACCT enabled:       61720 ctx switches/sec
 CONFIG_CGROUP_CPUACCT + patch:	   16663217 ctx switches/sec

Tested with:

 wget http://ozlabs.org/~anton/junkcode/context_switch.c
 make context_switch
 for i in `seq 0 63`; do taskset -c $i ./context_switch & done
 vmstat 1

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:57:37 +01:00
Ingo Molnar
6d3e0907b8 Merge branch 'sched/urgent' into sched/core
Merge reason: Merge dependent fix, update to latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:55:46 +01:00
Andrew Morton
50200df462 kernel/sched.c: Suppress unused var warning
On UP:

 kernel/sched.c: In function 'wake_up_new_task':
 kernel/sched.c:2631: warning: unused variable 'cpu'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:53:19 +01:00
H Hartley Sweeten
6622e670b2 posix-timers.c: Don't export local functions
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-05 14:54:10 +01:00
Magnus Damm
c54a42b19f clocksource: add suspend callback
Add a clocksource suspend callback.  This callback can be used by the
clocksource driver to shutdown and perform any kind of late suspend
activities even though the clocksource driver itself is a non-sysdev
driver.

One example where this is useful is to fix the sh_cmt.c platform driver
that today suspends using the platform bus and shuts down the clocksource
too early.

With this callback in place the sh_cmt driver will suspend using the
clocksource and clockevent hooks and leave the platform device pm
callbacks unused.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-05 14:54:10 +01:00
Magnus Damm
17622339af clocksource: add argument to resume callback
Pass the clocksource as an argument to the clocksource resume callback. 
Needed so we can point out which CMT channel the sh_cmt.c driver shall
resume.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-05 14:54:10 +01:00
Daniel Mack
1537a3638c tree-wide: fix 'lenght' typo in comments and code
Some misspelled occurences of 'octet' and some comments were also fixed
as I was on it.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:45 +01:00
Baruch Siach
9ce8e498ee devres: typo fix s/dev/devm/
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:43 +01:00
Edward Z. Yang
350f82586b Remove redundant trailing semicolons from macros
Signed-off-by: Edward Z. Yang <ezyang@ksplice.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:43 +01:00
Thadeu Lima de Souza Cascardo
af66585270 fix comment typo boo -> boot in ksysfs.c
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:37 +01:00
Adam Buchbinder
c9404c9c39 Fix misspelling of "should" and "shouldn't" in comments.
Some comments misspell "should" or "shouldn't"; this fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05 12:22:30 +01:00
Masami Hiramatsu
5ecaafdbf4 kprobes: Add mcount to the kprobes blacklist
Since mcount function can be called from everywhere,
it should be blacklisted. Moreover, the "mcount" symbol
is a special symbol name. So, it is better to put it in
the generic blacklist.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100205062433.3745.36726.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-05 08:13:57 +01:00
Linus Torvalds
aa16cd8d12 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Handle futex value corruption gracefully
  futex: Handle user space corruption gracefully
  futex_lock_pi() key refcnt fix
  softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume
2010-02-04 16:07:41 -08:00
Adam Buchbinder
2a61aa4016 Fix misspellings of "invocation" in comments.
Some comments misspell "invocation"; this fixes them. No code
changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-04 11:55:45 +01:00
Adam Buchbinder
c41b20e721 Fix misspellings of "truly" in comments.
Some comments misspell "truly"; this fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-04 11:55:45 +01:00
Peter Zijlstra
9717e6cd3d perf_events: Optimize perf_event_task_tick()
Pretty much all of the calls do perf_disable/perf_enable cycles, pull
that out to cut back on hardware programming.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:59:49 +01:00