18340 Commits

Author SHA1 Message Date
Thomas Gleixner
d8179bc0db genirq: Remove dynamic_irq mess
No more users. Get rid of the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140507154341.012847637@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:22 +02:00
Thomas Gleixner
c940e01c94 genirq: Replace dynamic_irq_init/cleanup
Create a new interface and confine it with a config switch which makes
clear that this is just legacy support and not to be used for new code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140507154340.574437049@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:22 +02:00
Thomas Gleixner
1d008353ba genirq: Remove irq_reserve_irq[s]
No more users. And it's not going to come back. If you need
hotpluggable irq chips, use irq domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-and-acked-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140507154340.302183048@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:22 +02:00
Thomas Gleixner
f63b6a05f2 genirq: Replace reserve_irqs in core code
We want to get rid of the public interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140507154340.061990194@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:22 +02:00
Thomas Gleixner
7b6ef12625 genirq: Provide generic hwirq allocation facility
Not really the solution to the problem, but at least it confines the
mess in the core code and allows us to get rid of the create/destroy_irq
variants from hell, i.e. 3 implementations with different semantics
plus the x86 specific variants __create_irqs and create_irq_nr
which have been invented in another circle of hell.

x86 : x86 should be converted to irq domains and I'm deliberately
      making it impossible to do the multi-vector MSI support by
      adding more crap to the current mess. It's not that hard to do
      and I'm really tired of the trainwrecks which have been invented
      by band-aid engineering so far. Any attempt to do multi-vector
      MSI or ioapic hotplug without converting to irq domains is NAKed
      hereby.

tile: Might use irq domains as well, but it has a very limited
      interrupt space, so handling it via this functionality might be
      the right thing to do even in the long run.

ia64: That's a hopeless case, as I doubt that anyone has the stomach
      to rewrite the homebrewed dynamic allocation facilities. I stared
      at it for a couple of hours and gave up. The create/destroy_irq
      mess could be made private to itanic right away if the
      iommu/dmar driver were not shared with x86. So to do that I'm
      going to add a separate ia64 specific implementation later in
      order not to deep-six itanic right away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.208629358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:18 +02:00
Thomas Gleixner
67bb90fd74 Merge branches 'linus' and 'irq/urgent' into irq/core
Reason: Get the upstream and urgent fixes before applying more complex
changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:04:17 +02:00
Rafael J. Wysocki
1f0b63866f ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state
The "freeze" sleep state suffers from the same issue that was
addressed by commit ad07277e82de (ACPI / PM: Hold acpi_scan_lock over
system PM transitions) for ACPI sleep states, that is, things break
if ->remove() is called for devices whose system resume callbacks
haven't been executed yet.

It also can be addressed in the same way, by holding the ACPI scan
lock over the "freeze" sleep state and PM transitions to and from
that state, but ->begin() and ->end() platform operations for the
"freeze" sleep state are needed for this purpose.

This change has been tested on Acer Aspire S5 with Thunderbolt.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-16 12:18:27 +02:00
Alexei Starovoitov
8f577cadf7 seccomp: JIT compile seccomp filter
Take advantage of the internal BPF JIT.

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

 seccomp_rule_add_exact(ctx,...
 seccomp_rule_add_exact(ctx,...

 rc = seccomp_load(ctx);

 for (i = 0; i < 10000000; i++)
    syscall(...);

$ sudo sysctl net.core.bpf_jit_enable=1
$ time ./bench
real	0m2.769s
user	0m1.136s
sys	0m1.624s

$ sudo sysctl net.core.bpf_jit_enable=0
$ time ./bench
real	0m5.825s
user	0m1.268s
sys	0m4.548s

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 16:31:30 -04:00
Steven Rostedt (Red Hat)
4449bf927b tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.

The following macros were added for use with the TRACE_EVENT() macro:

  __bitmask()
  __assign_bitmask()
  __get_bitmask()

To test this, I added this to the sched_migrate_task event, which
looked like this:

TRACE_EVENT(sched_migrate_task,

	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

	TP_ARGS(p, dest_cpu, cpus),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	orig_cpu		)
		__field(	int,	dest_cpu		)
		__bitmask(	cpumask, num_possible_cpus()	)
	),

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio;
		__entry->orig_cpu	= task_cpu(p);
		__entry->dest_cpu	= dest_cpu;
		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
	),

	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
		  __entry->comm, __entry->pid, __entry->prio,
		  __entry->orig_cpu, __entry->dest_cpu,
		  __get_bitmask(cpumask))
);

With the output of:

        ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
     migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
             awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
     migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
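
A rough userspace sketch of the print-time half of this idea: the event
stores only the raw mask words, and a formatter turns them into the
comma-separated hex seen in the output above. The helper name
format_bitmask() and the word layout (32-bit words, least significant
word first) are assumptions made for illustration, not the kernel's
implementation.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a raw bitmask as comma-separated 32-bit hex words, most
 * significant word first, e.g. "00000000,0000000f".
 * Hypothetical helper for illustration only.
 * Returns the number of characters written (excluding the NUL), or -1
 * if the buffer is too small.
 */
static int format_bitmask(const uint32_t *words, size_t nwords,
			  char *buf, size_t len)
{
	size_t pos = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	/* Walk from the most significant word down to word 0. */
	for (size_t i = nwords; i-- > 0; ) {
		int n = snprintf(buf + pos, len - pos, "%08x%s",
				 (unsigned int)words[i], i ? "," : "");
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

The point of the split is that the hot path only copies words; the
string conversion cost is paid when the trace is read.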

Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home

Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-15 11:29:37 -04:00
WANG Cong
122ff243f5 ipv4: make ip_local_reserved_ports per netns
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port().

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:31:45 -04:00
Uma Sharma
e534165bbf rcu: Variable name changed in tree_plugin.h and used in tree.c
The variable and struct both having the name "rcu_state" confuses
sparse in some situations, so this commit changes the variable to
"rcu_state_p" in order to avoid this confusion.  This also makes
things easier for human readers.

Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
[ paulmck: Changed the declaration and several additional uses. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14 11:41:04 -07:00
Paul E. McKenney
f5d2a0450d Merge branches 'doc.2014.04.29a', 'fixes.2014.04.29a' and 'torture.2014.05.14a' into HEAD
doc.2014.04.29a:  Documentation updates.
fixes.2014.04.29a:  Miscellaneous fixes.
torture.2014.05.14a:  RCU/Lock torture tests.
2014-05-14 10:57:31 -07:00
Pranith Kumar
2b3f8ffe46 torture: Remove __init from torture_init_begin/end
Loading rcutorture as a module (as opposed to building it directly into
the kernel) results in the following splat:

[Wed Apr 16 15:29:33 2014] BUG: unable to handle kernel paging request at ffffffffa0003000
[Wed Apr 16 15:29:33 2014] IP: [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014] PGD 1c0f067 PUD 1c10063 PMD 378a6067 PTE 0
[Wed Apr 16 15:29:33 2014] Oops: 0010 [#1] SMP
[Wed Apr 16 15:29:33 2014] Modules linked in: rcutorture(+) torture
[Wed Apr 16 15:29:33 2014] CPU: 0 PID: 4257 Comm: modprobe Not tainted 3.15.0-rc1 #10
[Wed Apr 16 15:29:33 2014] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[Wed Apr 16 15:29:33 2014] task: ffff8800db1e88d0 ti: ffff8800db25c000 task.ti: ffff8800db25c000
[Wed Apr 16 15:29:33 2014] RIP: 0010:[<ffffffffa0003000>]  [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014] RSP: 0018:ffff8800db25dca0  EFLAGS: 00010282
[Wed Apr 16 15:29:33 2014] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Wed Apr 16 15:29:33 2014] RDX: ffffffffa00090a8 RSI: 0000000000000001 RDI: ffffffffa0008337
[Wed Apr 16 15:29:33 2014] RBP: ffff8800db25dd50 R08: 0000000000000000 R09: 0000000000000000
[Wed Apr 16 15:29:33 2014] R10: ffffea000357b680 R11: ffffffff8113257a R12: ffffffffa000d000
[Wed Apr 16 15:29:33 2014] R13: ffffffffa00094c0 R14: ffffffffa0009510 R15: 0000000000000001
[Wed Apr 16 15:29:33 2014] FS:  00007fee30ce5700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[Wed Apr 16 15:29:33 2014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000 CR3: 00000000d5eb1000 CR4: 00000000000006f0
[Wed Apr 16 15:29:33 2014] Stack:
[Wed Apr 16 15:29:33 2014]  ffffffffa000d02c 0000000000000000 ffff88021700d400 0000000000000000
[Wed Apr 16 15:29:33 2014]  ffff8800db25dd40 ffffffff81647951 ffff8802162bd000 ffff88021541846c
[Wed Apr 16 15:29:33 2014]  0000000000000000 ffffffff817dbe2d ffffffff817dbe2d 0000000000000001
[Wed Apr 16 15:29:33 2014] Call Trace:
[Wed Apr 16 15:29:33 2014]  [<ffffffffa000d02c>] ? rcu_torture_init+0x2c/0x8b4 [rcutorture]
[Wed Apr 16 15:29:33 2014]  [<ffffffff81647951>] ? netlink_broadcast_filtered+0x121/0x3a0
[Wed Apr 16 15:29:33 2014]  [<ffffffff817dbe2d>] ? mutex_lock+0xd/0x2a
[Wed Apr 16 15:29:33 2014]  [<ffffffff817dbe2d>] ? mutex_lock+0xd/0x2a
[Wed Apr 16 15:29:33 2014]  [<ffffffff810e7022>] ? trace_module_notify+0x62/0x1d0
[Wed Apr 16 15:29:33 2014]  [<ffffffffa000d000>] ? 0xffffffffa000cfff
[Wed Apr 16 15:29:33 2014]  [<ffffffff8100034a>] do_one_initcall+0xfa/0x140
[Wed Apr 16 15:29:33 2014]  [<ffffffff8106b4ce>] ? __blocking_notifier_call_chain+0x5e/0x80
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b3481>] load_module+0x1931/0x21b0
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b0330>] ? show_initstate+0x50/0x50
[Wed Apr 16 15:29:33 2014]  [<ffffffff810b3d9e>] SyS_init_module+0x9e/0xc0
[Wed Apr 16 15:29:33 2014]  [<ffffffff817e4c22>] system_call_fastpath+0x16/0x1b
[Wed Apr 16 15:29:33 2014] Code:  Bad RIP value.
[Wed Apr 16 15:29:33 2014] RIP  [<ffffffffa0003000>] 0xffffffffa0003000
[Wed Apr 16 15:29:33 2014]  RSP <ffff8800db25dca0>
[Wed Apr 16 15:29:33 2014] CR2: ffffffffa0003000
[Wed Apr 16 15:29:33 2014] ---[ end trace 3e88c173037af84b ]---

This splat is due to the fact that torture_init_begin() and
torture_init_end() are both marked with __init, despite their use
at runtime.  This commit therefore removes __init from both functions.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:30 -07:00
Paul E. McKenney
5228084eed torture: Check for multiple concurrent torture tests
The torture tests are designed to run in isolation, but do not enforce
this isolation.  This commit therefore checks for concurrent torture
tests, and refuses to start new tests while old tests are running.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:29 -07:00
Paul E. McKenney
d065eacfdb locktorture: Remove reference to nonexistent Kconfig parameter
The locktorture module references CONFIG_LOCK_TORTURE_TEST_RUNNABLE,
which does not exist.  Which is a good thing, because otherwise
randconfig testing could enable both rcutorture and locktorture
concurrently, which the torture tests are not set up for.  This
commit therefore removes the reference, so that test is runnable
immediately only when inserted as a module.

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:28 -07:00
Paul E. McKenney
48d684fdad rcutorture: Run rcu_torture_writer at normal priority
There are usually lots of readers and only one writer, so if there has
to be a choice, we would want rcu_torture_writer to win.  This commit
therefore removes the set_user_nice() from rcu_torture_writer().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:26 -07:00
Thomas Gleixner
424c1b6820 rcutorture: Add missing destroy_timer_on_stack()
The rcu_torture_reader() function uses an on-stack timer_list structure
which it initializes with setup_timer_on_stack().  However, it fails to
use destroy_timer_on_stack() before exiting, which results in leaking a
tracking object if DEBUG_OBJECTS is enabled.  This commit therefore
invokes destroy_timer_on_stack() to avoid this leakage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14 09:46:24 -07:00
Paul E. McKenney
f0bf8fab4f rcutorture: Explicitly test synchronous grace-period primitives
The original rcu_torture_writer() avoided testing the synchronous
grace-period primitives because they were simply wrappers around
call_rcu() invocations.  The testing of these synchronous primitives
was delegated to the fake writers.  However, there really is no excuse
not to test them, especially in the case of SRCU, where the wrappering
is somewhat more elaborate.  This commit therefore makes the default
rcutorture parameters cause rcu_torture_writer() to include synchronous
grace-period primitives in its testing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:22 -07:00
Paul E. McKenney
a48f3fad4f rcutorture: Add tests for get_state_synchronize_rcu()
This commit adds rcutorture testing for get_state_synchronize_rcu()
and cond_synchronize_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:21 -07:00
Paul E. McKenney
d0d0606e2c rcutorture: Check for rcu_torture_fqs creation errors
The return value from torture_create_kthread() is currently ignored
when creating the rcu_torture_fqs kthread.  This commit therefore
captures the return value so that it can be tested for errors.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:17 -07:00
Iulia Manda
5ed63b199c torture: Notice if an all-zero cpumask is passed inside a critical section
In torture_shuffle_tasks function, the check if an all-zero mask can
be passed to set_cpus_allowed_ptr() is redundant after clearing the
shuffle_idle_cpu bit. If the mask had more than one bit set, after
clearing a bit it has at least one bit set. If the mask had only
one bit set, a check is made at the beginning, where the function
returns, as there is no need to shuffle only one cpu.
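
The argument above can be checked with a small userspace sketch. It
assumes the mask fits in one unsigned long and that idle_cpu is a valid
bit index; shuffle_mask() is a hypothetical stand-in, not the actual
torture_shuffle_tasks() code.

```c
/*
 * Returns 0 when no shuffle is needed (zero or one CPU in the mask),
 * otherwise the mask with idle_cpu cleared. Hypothetical helper.
 */
static unsigned long shuffle_mask(unsigned long online, unsigned int idle_cpu)
{
	/* online & (online - 1) drops the lowest set bit, so a zero
	 * result means at most one bit was set: nothing to shuffle. */
	if ((online & (online - 1)) == 0)
		return 0;

	/* At least two bits were set, so clearing one leaves at least one. */
	return online & ~(1UL << idle_cpu);
}
```

Whenever the early return is not taken, the result is non-zero for any
choice of idle_cpu, which is why a later all-zero check adds nothing.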

Also, this code is executed inside a critical section, delimited by
get_online_cpus(), and put_online_cpus(), preventing CPUs from leaving between
the check of num_online_cpus and the calls to set_cpus_allowed_ptr() function.

Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:14 -07:00
Paul E. McKenney
64e4b43ae0 rcutorture: Make rcu_torture_reader() use cond_resched()
The rcu_torture_reader() function currently uses schedule().  This commit
therefore speeds things up a bit by substituting cond_resched().
This change makes rcu_torture_reader() more CPU-bound, so this commit
also adjusts the number of readers (the "nreaders" module parameter,
which feeds into the "nrealreaders" variable) to allow one CPU to be
free of readers on SMP systems.  The point of this is to increase the
probability that readers will be watching while an updater makes a change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:13 -07:00
Paul E. McKenney
ac1bea8578 sched,rcu: Make cond_resched() report RCU quiescent states
Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.
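
A minimal sketch of the rate limiting described here, written for
userspace. A single global stands in for the per-CPU counter, and the
names RESCHED_QS_MASK, resched_count and should_report_qs() are
invented for illustration; the kernel code differs in detail.

```c
#include <stdbool.h>

/* Assumed mask: report a quiescent state once per 256 invocations. */
#define RESCHED_QS_MASK 0xffU

/* Stand-in for the per-CPU counter; one global is used in this sketch. */
static unsigned int resched_count;

/*
 * Returns true when this call should also take the heavier path that
 * reports a quiescent state, i.e. on every 256th invocation.
 */
static bool should_report_qs(void)
{
	return (++resched_count & RESCHED_QS_MASK) == 0;
}
```

The common case is then a single increment and test, with the
heavyweight call amortized over 256 invocations.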

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:11 -07:00
Paul E. McKenney
afea227fd4 rcutorture: Export RCU grace-period kthread wait state to rcutorture
This commit allows rcutorture to print additional state for the
RCU grace-period kthreads in cases where RCU seems reluctant to
start a new grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:09 -07:00
Paul E. McKenney
945fa9c631 torture: Dump ftrace buffer when the RCU grace period stalls
This commit adds a call to rcutorture_trace_dump() to dump the ftrace
buffer when the RCU grace period stalls in order to help debug the
stall.  Note that this is different than the RCU CPU stall warning,
as it is rcutorture detecting the stall rather than the underlying RCU
implementation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:07 -07:00
Paul E. McKenney
ab7d45053f torture: Increase stutter-end intensity
Currently, all stuttered kthreads block a jiffy at a time, which can
result in them starting at different times.  (Note: This is not an
energy-efficiency problem unless you run torture tests in production,
in which case you have other problems!)  This commit increases the
intensity of the restart event by causing kthreads to spin through the
last jiffy, restarting when they see the variable change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:02 -07:00
Paul E. McKenney
0d6821d5f7 torture: Include "Stopping" string to torture_kthread_stopping()
Currently, torture_kthread_stopping() prints only the name of the
kthread that is stopping, which can be unedifying.  This commit therefore
adds "Stopping" to make things more evident.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14 09:45:59 -07:00
Paul E. McKenney
589a8f5950 rcutorture: Print negatives for SRCU counter wraparound
The srcu_torture_stats() function prints SRCU's per-CPU c[] array with
an unsigned format, which means that the number one less than zero is
a very large number.  This commit therefore prints this array with a
signed format in order to improve readability of the rcutorture output.
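
To illustrate the readability difference, here is a small userspace
sketch that formats the same counter value both ways. format_counter()
is a hypothetical helper, and the cast to long assumes the usual
two's-complement behavior.

```c
#include <stdio.h>

/*
 * Format one counter value with an unsigned and a signed conversion.
 * Both output buffers are provided by the caller.
 */
static void format_counter(unsigned long c,
			   char *as_unsigned, size_t ulen,
			   char *as_signed, size_t slen)
{
	snprintf(as_unsigned, ulen, "%lu", c);
	/* Assumes two's complement: a wrapped counter prints as a small
	 * negative number instead of a value near ULONG_MAX. */
	snprintf(as_signed, slen, "%ld", (long)c);
}
```

A counter that has wrapped to one less than zero prints as -1 in the
signed form, while the unsigned form shows the maximum unsigned long.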

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:45:58 -07:00
Rashika Kheria
b3b8a4d42b rcutorture: Mark function as static in kernel/rcu/torture.c
Mark functions as static in kernel/rcu/torture.c because they are not
used outside this file.

This eliminates the following warnings in kernel/rcu/torture.c:
kernel/rcu/torture.c:902:6: warning: no previous prototype for ‘rcutorture_trace_dump’ [-Wmissing-prototypes]
kernel/rcu/torture.c:1572:6: warning: no previous prototype for ‘rcu_torture_barrier_cbf’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-14 09:45:53 -07:00
Steven Rostedt (Red Hat)
f1b2f2bd58 ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag
The decision about what needs to be done (converting a call from
ftrace_caller to ftrace_caller_regs, or from ftrace_caller_regs to
ftrace_caller) can easily be determined from the FTRACE_FL_REGS and
FTRACE_FL_REGS_EN bits in rec->flags, so there's no need for
ftrace_check_record() to return either UPDATE_MODIFY_CALL_REGS or
UPDATE_MODIFY_CALL. Just the latter is enough. The extra flag causes
more complexity than is required. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:30 -04:00
Steven Rostedt (Red Hat)
7c0868e03b ftrace: Use the ftrace_addr helper functions to find the ftrace_addr
Now that the functions that determine what the mcount call site
should be replaced with have moved into the generic code, there are a
few places in the generic code that can use them instead of hard
coding the address.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat)
7413af1fb7 ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.

This moves these two helper functions out of the arch specific code
and into the generic code, and renames them to have a better generic
name. This will allow other archs to use them and makes it
a bit easier to work on getting separate trampolines for different
functions.

ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.

ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.
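
A simplified sketch of how two such helpers could pick a trampoline,
assuming the choice is keyed off two per-record flags (one for what is
wanted, one for what is currently patched in). The flag names, the
struct and the addresses below are made up for illustration and are not
the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative flag bits; the values are arbitrary. */
#define FL_REGS    (1UL << 0)	/* record wants the regs-saving trampoline */
#define FL_REGS_EN (1UL << 1)	/* call site currently uses that trampoline */

/* Illustrative trampoline addresses; the values are arbitrary. */
#define ADDR_CALLER      ((uintptr_t)0x1000)
#define ADDR_REGS_CALLER ((uintptr_t)0x2000)

struct rec {
	unsigned long flags;
};

/* Address the call site should be converted to. */
static uintptr_t get_addr_new(const struct rec *r)
{
	return (r->flags & FL_REGS) ? ADDR_REGS_CALLER : ADDR_CALLER;
}

/* Address the call site currently jumps to. */
static uintptr_t get_addr_curr(const struct rec *r)
{
	return (r->flags & FL_REGS_EN) ? ADDR_REGS_CALLER : ADDR_CALLER;
}

/* A call site needs patching only when the two addresses differ. */
static int needs_modify(const struct rec *r)
{
	return get_addr_new(r) != get_addr_curr(r);
}
```

With the old and new targets both derivable from the record, generic
code can work out the patch without hard-coded addresses.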

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat)
68f40969f0 ftrace: Always inline ftrace_hash_empty() helper function
The ftrace_hash_empty() function is a simple test:

	return !hash || !hash->count;

But gcc seems to want to make it a call. As this is in an extreme
hot path of the function tracer, there's no reason it needs to be
a call. I only wrote it to be a helper function anyway, otherwise
it would have been inlined manually.

Force gcc to inline it, as it could have also been a macro.
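
For readers outside the kernel tree, a userspace sketch of the same
pattern using the GCC/Clang always_inline attribute directly. The
struct layout and the name hash_empty() are simplified stand-ins; in
the kernel the attribute is spelled through the __always_inline macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the real hash structure. */
struct hash {
	unsigned long count;
};

/*
 * Forced inline: the compiler must expand this at every call site
 * rather than emitting an out-of-line call.
 */
static inline __attribute__((always_inline))
bool hash_empty(const struct hash *hash)
{
	/* A missing hash and a hash with no entries both count as empty. */
	return !hash || !hash->count;
}
```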

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:28 -04:00
Steven Rostedt (Red Hat)
19eab4a472 ftrace: Write in missing comment from a very old commit
Back in 2011 Commit ed926f9b35cda "ftrace: Use counters to enable
functions to trace" changed the way ftrace accounts for enabled
and disabled traced functions. There was a comment started as:

	/*
	 *
	 */

But never finished. Well, that's rather useless. I probably forgot
to save the file before committing it. And it passed review from all
this time.

Anyway, better late than never. I updated the comment to express what
is happening in that somewhat complex code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:27 -04:00
Steven Rostedt (Red Hat)
66209a5bd4 ftrace: Remove boolean of hash_enable and hash_disable
Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler, but it left a variable "hash_enable" that was used
to know if the hash functions should be updated or not. It was
updated if the global_ops did not override them. As the global_ops
are now no different than any other ftrace_ops, the hash always
gets updated and there's no reason to use the hash_enable boolean.

The same goes for hash_disable used in ftrace_shutdown().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:25 -04:00
Tejun Heo
9d755d33f0 cgroup: use cgroup->self.refcnt for cgroup refcnting
Currently cgroup implements refcnting separately using atomic_t
cgroup->refcnt.  The destruction paths of cgroup and css are rather
complex and bear a lot of similarities including the use of RCU and
bouncing to a work item.

This patch makes cgroup use the refcnt of self css for refcnting
instead of using its own.  This makes cgroup refcnting use css's
percpu refcnt and share the destruction mechanism.

* css_release_work_fn() and css_free_work_fn() are updated to handle
  both csses and cgroups.  This is a bit messy but should do until we
  can make cgroup->self a full css, which currently can't be done
  thanks to multiple hierarchies.

* cgroup_destroy_locked() now performs
  percpu_ref_kill(&cgrp->self.refcnt) instead of cgroup_put(cgrp).

* Negative refcnt sanity check in cgroup_get() is no longer necessary
  as percpu_ref already handles it.

* Similarly, as a cgroup which hasn't been killed will never be
  released regardless of its refcnt value and percpu_ref has sanity
  check on kill, cgroup_is_dead() sanity check in cgroup_put() is no
  longer necessary.

* As whether a refcnt reached zero or not can only be decided after
  the reference count is killed, cgroup_root->cgrp's refcnting can no
  longer be used to decide whether to kill the root or not.  Let's
  make cgroup_kill_sb() explicitly initiate destruction if the root
  doesn't have any children.  This makes sense anyway as unmounted
  cgroup hierarchy without any children should be destroyed.
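
The "zero is only meaningful after kill" rule from the last point can
be modeled in a few lines. This is a single-threaded toy, not
percpu_ref itself: struct ref and the ref_*() helpers are invented for
illustration, and the model assumes kill drops the base reference.

```c
#include <stdbool.h>

/* Toy model of a reference count with an explicit kill step. */
struct ref {
	long count;	/* includes the base reference taken at init */
	bool killed;	/* set once ref_kill() has been called */
	bool released;	/* set when the release condition is met */
};

static void ref_init(struct ref *r)
{
	r->count = 1;	/* base reference */
	r->killed = false;
	r->released = false;
}

static void ref_get(struct ref *r)
{
	r->count++;
}

static void ref_put(struct ref *r)
{
	r->count--;
	/* Reaching zero only counts once the ref has been killed. */
	if (r->killed && r->count == 0)
		r->released = true;
}

static void ref_kill(struct ref *r)
{
	r->killed = true;
	ref_put(r);	/* drop the base reference */
}
```

In this model an unkilled object always holds its base reference, so
nothing is released until something explicitly initiates destruction,
which is the role cgroup_kill_sb() takes on for a childless root.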

While this is a bit messy, this will allow pushing more bookkeeping
towards cgroup->self and thus handling cgroups and csses in a more
uniform way.  In the very long term, it should be possible to
introduce a base subsystem and convert the self css to a proper one,
making things a whole lot simpler and unified.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:02 -04:00
Tejun Heo
9395a45004 cgroup: enable refcnting for root csses
Currently, css_get(), css_tryget() and css_tryget_online() are noops
for root csses as an optimization; however, we're planning to use css
refcnts to track cgroup lifetime too and root cgroups also need to
be reference counted.  Since css has been converted to percpu_refcnt,
the overhead of refcnting is minuscule and this optimization isn't too
meaningful anymore.  Furthermore, controllers which optimize the root
cgroup often never even invoke these functions in their hot paths.

This patch enables refcnting for root csses too.  This leaves the
CSS_ROOT flag unused, so it is removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:02 -04:00
Tejun Heo
25e15d8350 cgroup: bounce css release through css->destroy_work
css release is planned to do more and would require process context.
Bounce it through css->destroy_work.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:02 -04:00
Tejun Heo
249f3468a2 cgroup: remove cgroup_destory_css_killed()
cgroup_destroy_css_killed() is the cgroup destruction stage which happens
after all csses are offlined.  After the recent updates, it no longer
does anything other than putting the base reference.  This patch
removes the function and makes cgroup_destroy_locked() put the base
ref at the end instead.

This also makes cgroup->nr_css unnecessary.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:01 -04:00
Tejun Heo
4e4e284723 cgroup: move cgroup->sibling unlinking to cgroup_put()
Move cgroup->sibling unlinking from cgroup_destroy_css_killed() to
cgroup_put().  This is later but still before the RCU grace period, so
it doesn't break css_next_child() although there now is a larger
window in which a dead cgroup is visible during css iteration.  As css
iteration always could have included offline csses, this doesn't
affect correctness; however, it does make css_next_child() fall back
to reiterating mode more often.  This also makes cgroup_put() directly
take cgroup_mutex, which limits where it can be called from.  These
are not immediately problematic and will be dealt with later.

This change enables simplification of cgroup destruction path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:01 -04:00
Tejun Heo
9e4173e1f2 cgroup: move check_for_release(parent) call to the end of cgroup_destroy_locked()
Currently, check_for_release() on the parent of a destroyed cgroup is
invoked from cgroup_destroy_css_killed().  This is because this is
where the destroyed cgroup can be removed from the parent's children
list.  check_for_release() tests the emptiness of the list directly,
so invoking it before removing the cgroup from the list makes it think
that the parent still has children even when it no longer does.

This patch updates check_for_release() to use
cgroup_has_live_children() instead of directly testing ->children
emptiness and moves check_for_release(parent) earlier to the end of
cgroup_destroy_locked().  As cgroup_has_live_children() ignores
cgroups marked DEAD, check_for_release() functions correctly as long
as it's called after asserting DEAD.
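
A small sketch of why the ordering no longer matters: a liveness check
that skips dead entries gives the right answer even while dead children
are still linked. The struct node type and has_live_children() below
are simplified stand-ins, not the cgroup data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified child entry on a singly linked sibling list. */
struct node {
	bool dead;			/* marked for destruction */
	struct node *next_sibling;
};

/*
 * True if any child on the list is not marked dead. Dead children that
 * have not been unlinked yet are ignored.
 */
static bool has_live_children(const struct node *first_child)
{
	for (const struct node *c = first_child; c; c = c->next_sibling)
		if (!c->dead)
			return true;
	return false;
}
```

Testing plain list emptiness would report children in the dead-but-
still-linked case, which is why the old check had to run after
unlinking.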

This makes release notification slightly more timely and more
importantly enables further simplification of cgroup destruction path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:01 -04:00
Tejun Heo
cbc125efad cgroup: separate out cgroup_has_live_children() from cgroup_destroy_locked()
We're expecting another user.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:01 -04:00
Tejun Heo
9d800df12d cgroup: rename cgroup->dummy_css to ->self and move it to the top
cgroup->dummy_css is used as the placeholder css when performing css
oriented operations on the cgroup.  We're gonna shift more cgroup
management to this css.  Let's rename it to ->self and move it to the
top.

This is a pure rename and field relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:00 -04:00
Tejun Heo
a015edd26e cgroup: use restart_syscall() for mount retries
cgroup_mount() uses dumb delay-and-retry logic to wait for cgroup_root
which is being destroyed.  The retry currently loops inside
cgroup_mount() proper.  This patch makes it return with
restart_syscall() instead so that retry travels out to userland
boundary.

This slightly simplifies the logic and more importantly makes the
retry logic behave better when the wait for some reason becomes
lengthy or infinite by allowing the operation to be suspended or
terminated from userland.
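
The control-flow change can be sketched in userspace as follows.  All
names here (struct model_root, MODEL_RESTART, MODEL_EINTR and the
model_* functions) are assumptions made for illustration.  The model
does not reproduce restart_syscall() itself; MODEL_RESTART merely
stands in for the restart code that travels out to the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RESTART	(-1000)	/* hypothetical "please retry" code */
#define MODEL_EINTR	(-4)	/* hypothetical "interrupted" code */

/* Hypothetical root that stays busy for a number of attempts. */
struct model_root {
	long teardown_ticks;	/* > 0 while the old root is being destroyed */
};

/* One mount attempt: never loops, only reports that a retry is needed. */
int model_mount_once(struct model_root *root)
{
	assert(root != NULL);
	if (root->teardown_ticks > 0) {
		root->teardown_ticks--;	/* stands in for the short delay */
		return MODEL_RESTART;
	}
	return 0;
}

/*
 * The retry lives at the outer boundary, where a pending fatal signal
 * can end the wait instead of spinning inside the mount path.
 */
int model_syscall_boundary(struct model_root *root, const bool *fatal_signal)
{
	int ret;

	while ((ret = model_mount_once(root)) == MODEL_RESTART) {
		if (*fatal_signal)
			return MODEL_EINTR;
	}
	return ret;
}
```

Because every retry passes through the boundary, even a wait that
would otherwise never finish is abandoned after a single attempt once
the signal flag is set.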

v2: The original patch forgot to free memory allocated for @opts.
    Fixed.  Caught by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-14 09:15:00 -04:00
Oleg Nesterov
b02ef20a9f uprobes/x86: Fix the wrong ->si_addr when xol triggers a trap
If the probed insn triggers a trap, ->si_addr = regs->ip is technically
correct, but this is not what the signal handler wants; we need to pass
the address of the probed insn, not the address of xol slot.

Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
fill_trap_info() and math_error() to use it. !CONFIG_UPROBES case in
uprobes.h uses a macro to avoid include hell and ensure that it can be
compiled even if an architecture doesn't define instruction_pointer().
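
The address selection such a helper performs can be sketched in
userspace.  The struct layouts and the model_get_trap_addr() name
below are hypothetical and only illustrate the idea; they are not the
kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names are illustrative only. */
struct model_regs {
	unsigned long ip;		/* where the trap was actually raised */
};

struct model_utask {
	bool stepping_xol;		/* true while executing out of line */
	unsigned long probed_vaddr;	/* address of the original probed insn */
};

/*
 * Report the probed instruction's address while the task executes the
 * out-of-line copy; otherwise fall back to the instruction pointer.
 */
unsigned long model_get_trap_addr(const struct model_regs *regs,
				  const struct model_utask *utask)
{
	assert(regs != NULL);
	if (utask && utask->stepping_xol)
		return utask->probed_vaddr;
	return regs->ip;
}
```

A task with no probe state, or one that is not currently stepping out
of line, gets the plain instruction pointer, so the unprobed case is
unchanged in this model.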

Test-case:

	#include <signal.h>
	#include <stdio.h>
	#include <unistd.h>

	extern void probe_div(void);

	void sigh(int sig, siginfo_t *info, void *c)
	{
		int passed = (info->si_addr == probe_div);
		printf(passed ? "PASS\n" : "FAIL\n");
		_exit(!passed);
	}

	int main(void)
	{
		struct sigaction sa = {
			.sa_sigaction	= sigh,
			.sa_flags	= SA_SIGINFO,
		};

		sigaction(SIGFPE, &sa, NULL);

		asm (
			"xor %ecx,%ecx\n"
			".globl probe_div; probe_div:\n"
			"idiv %ecx\n"
		);

		return 0;
	}

it fails if probe_div() is probed.

Note: show_unhandled_signals users should probably use this helper too,
but we need to clean them up first.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-05-14 13:57:28 +02:00
Oleg Nesterov
29dedee0e6 uprobes: Add mem_cgroup_charge_anon() into uprobe_write_opcode()
Hugh says:

    The one I noticed was that it forgets all about memcg (because
    it was copied from KSM, and there the replacement page has already
    been charged to a memcg). See how mm/memory.c do_anonymous_page()
    does a mem_cgroup_charge_anon().

Hopefully not a big problem, uprobes is a system-wide thing and only
root can insert the probes. But I agree, should be fixed anyway.

Add mem_cgroup_{un,}charge_anon() into uprobe_write_opcode(). To simplify
the error handling (and avoid the new "uncharge" label) the patch also
moves anon_vma_prepare() up before we alloc/charge the new page.

While at it fix the comment about ->mmap_sem, it is held for write.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:24 +02:00
Rusty Russell
4982223e51 module: set nx before marking module MODULE_STATE_COMING.
We currently set RO & NX on modules very late: after we move them from
MODULE_STATE_UNFORMED to MODULE_STATE_COMING, and after we call
parse_args() (which can exec code in the module).

Much better is to do it in complete_formation() and then call
the notifier.

This means that the notifiers will be called on a module which
is already RO & NX, so that may cause problems (ftrace already
changed so they're unaffected).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-05-14 10:55:47 +09:30
Paul E. McKenney
da601c63fd torture: Intensify locking test
The current lock_torture_writer() spends too much time sleeping and not
enough time hammering locks: an eight-CPU test will often utilize only
a CPU or two.  This commit therefore makes lock_torture_writer()
sleep less and hammer more.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-13 17:03:01 -07:00
Paul E. McKenney
ad0dc7f94d rcutorture: Add forward-progress checking for writer
The rcutorture output currently does not distinguish between stalls in
the RCU implementation and stalls in the rcu_torture_writer() kthreads.
This commit therefore adds some diagnostics to help distinguish between
these two conditions, at least for the non-SRCU implementations.  (SRCU
does not provide evidence of update-side forward progress by design.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-13 11:18:18 -07:00
Tejun Heo
8353da1f91 cgroup: remove cgroup_tree_mutex
cgroup_tree_mutex was introduced to work around the circular
dependency between cgroup_mutex and kernfs active protection - some
kernfs file and directory operations needed cgroup_mutex, putting
cgroup_mutex under active protection, but cgroup also needs to be able
to access cgroup hierarchies and cftypes to determine which
kernfs_nodes need to be removed.  cgroup_tree_mutex nested above both
cgroup_mutex and kernfs active protection and was used to protect the
hierarchy and cftypes.  While this worked, it added a lot of double
locking and was generally cumbersome.

kernfs provides a mechanism to opt out of active protection and cgroup
was already using it for removal and subtree_control.  There's no
reason to mix both methods of avoiding circular locking dependency, and
the preceding cgroup_kn_lock_live() changes applied it to all relevant
cgroup kernfs operations, making it unnecessary to nest cgroup_mutex
under kernfs active protection.  The previous patch reversed the
original lock ordering and put cgroup_mutex above kernfs active
protection.

After these changes, all cgroup_tree_mutex usages are now accompanied
by cgroup_mutex making the former completely redundant.  This patch
removes cgroup_tree_mutex and all its usages.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-13 12:19:23 -04:00