19572 Commits

Author SHA1 Message Date
Steven Rostedt (Red Hat)
04b74b27c2 printk/percpu: Define printk_func when printk is not defined
To avoid include hell, the per_cpu variable printk_func was declared
in percpu.h. But it is only defined if printk is defined.

As users of printk may also use the printk_func variable, it needs to
be defined even if CONFIG_PRINTK is not.

Also add a printk.h include in percpu.h just to be safe.

Link: http://lkml.kernel.org/r/20141121183215.01ba539c@canb.auug.org.au

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-21 11:19:15 -05:00
Steven Rostedt (Red Hat)
0af26492d5 tracing/trivial: Fix typos and make an int into a bool
Fix up a few typos in comments and convert an int into a bool in
update_traceon_count().

Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com

Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-20 10:05:36 -05:00
Jiri Kosina
a02001086b Merge Linus' tree to be be to apply submitted patches to newer code than
current trivial.git base
2014-11-20 14:42:02 +01:00
Frans Klaver
eff264efee kernel: trace: fix printk message
s,produciton,production

Signed-off-by: Frans Klaver <frans.klaver@xsens.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-11-20 14:29:19 +01:00
Ingo Molnar
d360b78f99 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Streamline RCU's use of per-CPU variables, shifting from "cpu"
   arguments to functions to "this_"-style per-CPU variable accessors.

 - Signal-handling RCU updates.

 - Real-time updates.

 - Torture-test updates.

 - Miscellaneous fixes.

 - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-20 08:57:58 +01:00
Steven Rostedt (Red Hat)
afdc34a3d3 printk: Add per_cpu printk func to allow printk to be diverted
Being able to divert printk to call another function besides the normal
logging is useful for such things like NMI handling. If some functions
are to be called from NMI that does printk() it is possible to lock up
the box if the nmi handler triggers when another printk is happening.

One example of this use is to perform a stack trace on all CPUs via NMI.
But if the NMI is to do the printk() it can cause the system to lock up.
By allowing the printk to be diverted to another function that can safely
record the printk output and then print it when it in a safe context
then NMIs will be safe to call these functions like show_regs().

Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:21 -05:00
Steven Rostedt (Red Hat)
8d58e99af5 seq_buf: Move the seq_buf code to lib/
The seq_buf functions are rather useful outside of tracing. Instead
of having it be dependent on CONFIG_TRACING, move the code into lib/
and allow other users to have access to it even when tracing is not
configured.

The seq_buf utility is similar to the seq_file utility, but instead of
writing sending data back up to userland, it writes it into a buffer
defined at seq_buf_init(). This allows us to send a descriptor around
that writes printf() formatted strings into it that can be retrieved
later.

It is currently used by the tracing facility for such things like trace
events to convert its binary saved data in the ring buffer into an
ASCII human readable context to be displayed in /sys/kernel/debug/trace.

It can also be used for doing NMI prints safely from NMI context into
the seq_buf and retrieved later and dumped to printk() safely. Doing
printk() from an NMI context is dangerous because an NMI can preempt
a current printk() and deadlock on it.

Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:20 -05:00
Steven Rostedt (Red Hat)
2448913ed2 seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
The function bstr_printf() from lib/vsprnintf.c is only available if
CONFIG_BINARY_PRINTF is defined. This is due to the only user currently
being the tracing infrastructure, which needs to select this config
when tracing is configured. Until there is another user of the binary
printf formats, this will continue to be the case.

Since seq_buf.c is now lives in lib/ and is compiled even without
tracing, it must encompass its use of bstr_printf() which is used
by seq_buf_printf(). This too is only used by the tracing infrastructure
and is still encapsulated by the CONFIG_BINARY_PRINTF.

Link: http://lkml.kernel.org/r/20141104160222.969013383@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:19 -05:00
Steven Rostedt (Red Hat)
01cb06a4c2 tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
Add two helper functions; seq_buf_get_buf() and seq_buf_commit() that
are used by seq_buf_path(). This makes the code similar to the
seq_file: seq_path() function, and will help to be able to consolidate
the functions between seq_file and trace_seq.

Link: http://lkml.kernel.org/r/20141104160222.644881406@goodmis.org
Link: http://lkml.kernel.org/r/20141114011412.977571447@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:18 -05:00
Steven Rostedt (Red Hat)
8cd709ae76 tracing: Have seq_buf use full buffer
Currently seq_buf is full when all but one byte of the buffer is
filled. Change it so that the seq_buf is full when all of the
buffer is filled.

Some of the functions would fill the buffer completely and report
everything was fine. This was inconsistent with the max of size - 1.
Changing this to be max of size makes all functions consistent.

Link: http://lkml.kernel.org/r/20141104160222.502133196@goodmis.org
Link: http://lkml.kernel.org/r/20141114011412.811957882@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)
9b77215382 seq_buf: Add seq_buf_can_fit() helper function
Add a seq_buf_can_fit() helper function that removes the possible mistakes
of comparing the seq_buf length plus added data compared to the size of
the buffer.

Link: http://lkml.kernel.org/r/20141118164025.GL23958@pathway.suse.cz

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)
820b75f63d tracing: Add paranoid size check in trace_printk_seq()
To be really paranoid about writing out of bound data in
trace_printk_seq(), add another check of len compared to size.

Link: http://lkml.kernel.org/r/20141119144004.GB2332@dhcp128.suse.cz

Suggested-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:16 -05:00
Steven Rostedt (Red Hat)
5ac4837841 tracing: Use trace_seq_used() and seq_buf_used() instead of len
As the seq_buf->len will soon be +1 size when there's an overflow, we
must use trace_seq_used() or seq_buf_used() methods to get the real
length. This will prevent buffer overflow issues if just the len
of the seq_buf descriptor is used to copy memory.

Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home

Reported-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:15 -05:00
Steven Rostedt (Red Hat)
74f06bb723 tracing: Clean up tracing_fill_pipe_page()
The function tracing_fill_pipe_page() logic is a little confusing with the
use of count saving the seq.len and reusing it.

Instead of subtracting a number that is calculated from the saved
value of the seq.len from seq.len, just save the seq.len at the start
and if we need to reset it, just assign it again.

When the seq_buf overflow is len == size + 1, the current logic will
break. Changing it to use a saved length for resetting back to the
original value is more robust and will work when we change the way
seq_buf sets the overflow.

Link: http://lkml.kernel.org/r/20141118161546.GJ23958@pathway.suse.cz

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:14 -05:00
Steven Rostedt (Red Hat)
eeab98154d seq_buf: Create seq_buf_used() to find out how much was written
Add a helper function seq_buf_used() that replaces the SEQ_BUF_USED()
private macro to let callers have a method to know how much of the
seq_buf was written to.

Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org
Link: http://lkml.kernel.org/r/20141114011413.321654244@goodmis.org

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:13 -05:00
Steven Rostedt (Red Hat)
dd23180aac tracing: Convert seq_buf_path() to be like seq_path()
Rewrite seq_buf_path() like it is done in seq_path() and allow
it to accept any escape character instead of just "\n".

Making seq_buf_path() like seq_path() will help prevent problems
when converting seq_file to use the seq_buf logic.

Link: http://lkml.kernel.org/r/20141104160222.048795666@goodmis.org
Link: http://lkml.kernel.org/r/20141114011412.338523371@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:10 -05:00
Steven Rostedt (Red Hat)
3a161d99c4 tracing: Create seq_buf layer in trace_seq
Create a seq_buf layer that trace_seq sits on. The seq_buf will not
be limited to page size. This will allow other usages of seq_buf
instead of a hard set PAGE_SIZE one that trace_seq has.

Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org
Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:09 -05:00
Markus Elfring
16a8ef2751 tracing: Deletion of an unnecessary check before iput()
The iput() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/5468F875.7080907@users.sourceforge.net

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 16:28:45 -05:00
Alexei Starovoitov
daaf427c6a bpf: fix arraymap NULL deref and missing overflow and zero size checks
- fix NULL pointer dereference:
kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'.  (kzalloc returns null)
kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40)

- integer overflow check was missing in arraymap
(hashmap checks for overflow via kmalloc_array())

- arraymap can round_up(value_size, 8) to zero. check was missing.

- hashmap was missing zero size check as well, since roundup_pow_of_two() can
truncate into zero

- found a typo in the arraymap comment and unnecessary empty line

Fix all of these issues and make both overflow checks explicit U32 in size.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 15:40:00 -05:00
Steven Rostedt (Red Hat)
8e2e095cbe tracing: Fix return value of ftrace_raw_output_prep()
If the trace_seq of ftrace_raw_output_prep() is full this function
returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero.

The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero!

The thing is, the caller of ftrace_raw_output_prep() expects a
success to be zero. Change that to expect it to be
TRACE_TYPE_HANDLED.

Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.cz

Reminded-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:48 -05:00
Steven Rostedt (Red Hat)
dba39448ab tracing: Remove return values of most trace_seq_*() functions
The trace_seq_printf() and friends are used to store strings into a buffer
that can be passed around from function to function. If the trace_seq buffer
fills up, it will not print any more. The return values were somewhat
inconsistant and using trace_seq_has_overflowed() was a better way to know
if the write to the trace_seq buffer succeeded or not.

Now that all users have removed reading the return value of the printf()
type functions, they can safely return void and keep future users of them
from reading the inconsistent values as well.

Link: http://lkml.kernel.org/r/20141114011411.992510720@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:47 -05:00
Steven Rostedt (Red Hat)
183742f08c tracing: Do not use return values of trace_seq_printf() in syscall tracing
The functions trace_seq_printf() and friends will not be returning values
soon and will be void functions. To know if they succeeded or not, the
functions trace_seq_has_overflowed() and trace_handle_return() should be
used instead.

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:46 -05:00
Steven Rostedt (Red Hat)
8579a107a6 tracing/uprobes: Do not use return values of trace_seq_printf()
The functions trace_seq_printf() and friends will soon no longer have
return values. Using trace_seq_has_overflowed() and trace_handle_return()
should be used instead.

Link: http://lkml.kernel.org/r/20141114011411.693008134@goodmis.org
Link: http://lkml.kernel.org/r/20141115050602.333705855@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:45 -05:00
Steven Rostedt (Red Hat)
d2b0191a38 tracing/probes: Do not use return value of trace_seq_printf()
The functions trace_seq_printf() and friends will soon not have a return
value and will only be a void function. Use trace_seq_has_overflowed()
instead to know if the trace_seq operations succeeded or not.

Link: http://lkml.kernel.org/r/20141114011411.530216306@goodmis.org

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)
a72e10afab tracing: Do not check return values of trace_seq_p*() for mmio tracer
The return values for trace_seq_printf() and friends are going to be
removed and they will become void functions. The mmio tracer checked
their return and even did so incorrectly.

Some of the funtions which returned the values were never checked
themselves. Removing all the checks simplifies the code.

Use trace_seq_has_overflowed() and trace_handle_return() where
necessary instead.

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)
85224da0b8 kprobes/tracing: Use trace_seq_has_overflowed() for overflow checks
Instead of checking the return value of trace_seq_printf() and friends
for overflowing of the buffer, use the trace_seq_has_overflowed() helper
function.

This cleans up the code quite a bit and also takes us a step closer to
changing the return values of trace_seq_printf() and friends to void.

Link: http://lkml.kernel.org/r/20141114011411.181812785@goodmis.org

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:43 -05:00
Steven Rostedt (Red Hat)
9d9add34ec tracing: Have function_graph use trace_seq_has_overflowed()
Instead of doing individual checks all over the place that makes the code
very messy. Just check trace_seq_has_overflowed() at the end or in
strategic places.

This makes the code much cleaner and also helps with getting closer
to removing the return values of trace_seq_printf() and friends.

Link: http://lkml.kernel.org/r/20141114011410.987913836@goodmis.org

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:42 -05:00
Steven Rostedt (Red Hat)
7d40f67165 tracing: Have branch tracer use trace_handle_return() helper function
The branch tracer should not be checking the trace_seq_printf() return value
as that will soon be void. There's a new trace_handle_return() helper function
that will return TRACE_TYPE_PARTIAL_LINE if the trace_seq overflowed
and TRACE_TYPE_HANDLED otherwise.

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:41 -05:00
Steven Rostedt (Red Hat)
c0cd93aa16 ring-buffer: Remove check of trace_seq_{puts,printf}() return values
Remove checking the return value of all trace_seq_puts(). It was wrong
anyway as only the last return value mattered. But as the trace_seq_puts()
is going to be a void function in the future, we should not be checking
the return value of it anyway.

Just return !trace_seq_has_overflowed() instead.

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:40 -05:00
Steven Rostedt (Red Hat)
f4a1d08ce6 blktrace/tracing: Use trace_seq_has_overflowed() helper function
Checking the return code of every trace_seq_printf() operation and having
to return early if it overflowed makes the code messy.

Using the new trace_seq_has_overflowed() and trace_handle_return() functions
allows us to clean up the code.

In the future, trace_seq_printf() and friends will be turning into void
functions and not returning a value. The trace_seq_has_overflowed() is to
be used instead. This cleanup allows that change to take place.

Cc: Jens Axboe <axboe@fb.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)
19a7fe2062 tracing: Add trace_seq_has_overflowed() and trace_handle_return()
Adding a trace_seq_has_overflowed() which returns true if the trace_seq
had too much written into it allows us to simplify the code.

Instead of checking the return value of every call to trace_seq_printf()
and friends, they can all be called normally, and at the end we can
return !trace_seq_has_overflowed() instead.

Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq
overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function
was created called trace_handle_return() which takes a trace_seq and
returns these enums. Using this helper function also simplifies the
code.

This change also makes it possible to remove the return values of
trace_seq_printf() and friends. They should instead just be
void functions.

Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org

Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)
e400a40cff tracing: Fix trace_seq_bitmask() to start at current position
In trace_seq_bitmask() it calls bitmap_scnprintf() not from the current
position of the trace_seq buffer (s->buffer + s->len), but instead from
the beginning of the buffer (s->buffer).

Luckily, the only user of this "ipi_raise tracepoint" uses it as the
first parameter, and as such, the start of the temp buffer in
include/trace/ftrace.h (see __get_bitmask()).

Reported-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:38 -05:00
Steven Rostedt (Red Hat)
aec0be2d6e ftrace/x86/extable: Add is_ftrace_trampoline() function
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.

But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.

Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.

Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 15:25:26 -05:00
Al Viro
9f45f5bf30 new helper: audit_file()
... for situations when we don't have any candidate in pathnames - basically,
in descriptor-based syscalls.

[Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:26 -05:00
Al Viro
b583043e99 kill f_dentry uses
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:25 -05:00
Alexey Ishchuk
4eafad7feb s390/kernel: add system calls for PCI memory access
Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read
system calls to allow user space applications to access device PCI I/O
memory pages on s390x platform.

[ Martin Schwidefsky: some code beautification ]

Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-19 09:46:43 +01:00
Steven Rostedt (Red Hat)
a9ce7c36aa tracing: Fix race of function probes counting
The function probe counting for traceon and traceoff suffered a race
condition where if the probe was executing on two or more CPUs at the
same time, it could decrement the counter by more than one when
disabling (or enabling) the tracer only once.

The way the traceon and traceoff probes are suppose to work is that
they disable (or enable) tracing once per count. If a user were to
echo 'schedule:traceoff:3' into set_ftrace_filter, then when the
schedule function was called, it would disable tracing. But the count
should only be decremented once (to 2). Then if the user enabled tracing
again (via tracing_on file), the next call to schedule would disable
tracing again and the count would be decremented to 1.

But if multiple CPUS called schedule at the same time, it is possible
that the count would be decremented more than once because of the
simple "count--" used.

By reading the count into a local variable and using memory barriers
we can guarantee that the count would only be decremented once per
disable (or enable).

The stack trace probe had a similar race, but here the stack trace will
decrement for each time it is called. But this had the read-modify-
write race, where it could stack trace more than the number of times
that was specified. This case we use a cmpxchg to stack trace only the
number of times specified.

The dump probes can still use the old "update_count()" function as
they only run once, and that is controlled by the dump logic
itself.

Link: http://lkml.kernel.org/r/20141118134643.4b550ee4@gandalf.local.home

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-18 23:06:35 -05:00
Rafael J. Wysocki
b2b49ccbdd PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected
The number of and dependencies between high-level power management
Kconfig options make life much harder than necessary.  Several
conbinations of them have to be tested and supported, even though
some of those combinations are very rarely used in practice (if
they are used in practice at all).  Moreover, the fact that we
have separate independent Kconfig options for runtime PM and
system suspend is a serious obstacle for integration between
the two frameworks.

To overcome these difficulties, always select PM_RUNTIME if PM_SLEEP
is set.  Among other things, this will allow system suspend callbacks
provided by bus types and device drivers to rely on the runtime PM
framework regardless of the kernel configuration.

Enthusiastically-acked-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-18 23:41:40 +01:00
Alexei Starovoitov
7943c0f329 bpf: remove test map scaffolding and user proper types
proper types and function helpers are ready. Use them in verifier testsuite.
Remove temporary stubs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:44:00 -05:00
Alexei Starovoitov
d0003ec01c bpf: allow eBPF programs to use maps
expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem()
map accessors to eBPF programs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:44:00 -05:00
Alexei Starovoitov
a1854d6ac0 bpf: fix BPF_MAP_LOOKUP_ELEM command return code
fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage
described it in commit b4fc1a460f30("Merge branch 'bpf-next'"):
-----
BPF_MAP_LOOKUP_ELEM
    int bpf_lookup_elem(int fd, void *key, void *value)
    {
        union bpf_attr attr = {
            .map_fd = fd,
            .key = ptr_to_u64(key),
            .value = ptr_to_u64(value),
        };

        return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }
    bpf() syscall looks up an element with given key in  a  map  fd.
    If  element  is found it returns zero and stores element's value
    into value.  If element is not found  it  returns  -1  and  sets
    errno to ENOENT.

and further down in manpage:

   ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM,  indicates  that
          element with given key was not found.
-----

In general all BPF commands return ENOENT when map element is not found
(including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with
 flags == BPF_MAP_UPDATE_ONLY)

Subsequent patch adds a testsuite to check return values for all of
these combinations.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:43:59 -05:00
Alexei Starovoitov
28fbcfa08d bpf: add array type of eBPF maps
add new map type BPF_MAP_TYPE_ARRAY and its implementation

- optimized for fastest possible lookup()
  . in the future verifier/JIT may recognize lookup() with constant key
    and optimize it into constant pointer. Can optimize non-constant
    key into direct pointer arithmetic as well, since pointers and
    value_size are constant for the life of the eBPF program.
    In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT
    while preserving concurrent access to this map from user space

- two main use cases for array type:
  . 'global' eBPF variables: array of 1 element with key=0 and value is a
    collection of 'global' variables which programs can use to keep the state
    between events
  . aggregation of tracing events into fixed set of buckets

- all array elements pre-allocated and zero initialized at init time

- key as an index in array and can only be 4 byte

- map_delete_elem() returns EINVAL, since elements cannot be deleted

- map_update_elem() replaces elements in an non-atomic way
  (for atomic updates hashtable type should be used instead)

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:43:59 -05:00
Alexei Starovoitov
0f8e4bd8a1 bpf: add hashtable type of eBPF maps
add new map type BPF_MAP_TYPE_HASH and its implementation

- maps are created/destroyed by userspace. Both userspace and eBPF programs
  can lookup/update/delete elements from the map

- eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism
  for concurrent updates

- key/value are opaque range of bytes (aligned to 8 bytes)

- user space provides 3 configuration attributes via BPF syscall:
  key_size, value_size, max_entries

- map takes care of allocating/freeing key/value pairs

- map_update_elem() must fail to insert new element when max_entries
  limit is reached to make sure that eBPF programs cannot exhaust memory

- map_update_elem() replaces elements in an atomic way

- optimized for speed of lookup() which can be called multiple times from
  eBPF program which itself is triggered by high volume of events
  . in the future JIT compiler may recognize lookup() call and optimize it
    further, since key_size is constant for life of eBPF program

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:43:25 -05:00
Alexei Starovoitov
3274f52073 bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
either update existing map element or create a new one.
Initially the plan was to add a new command to handle the case of
'create new element if it didn't exist', but 'flags' style looks
cleaner and overall diff is much smaller (more code reused), so add 'flags'
attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
 #define BPF_ANY	0 /* create new element or update existing */
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
 #define BPF_EXIST	2 /* update existing element */

bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST
if element already exists.

bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT
if element doesn't exist.

Userspace will call it as:
int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key = ptr_to_u64(key),
        .value = ptr_to_u64(value),
        .flags = flags;
    };

    return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

First two bits of 'flags' are used to encode style of bpf_update_elem() command.
Bits 2-63 are reserved for future use.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:43:25 -05:00
Tejun Heo
eeecbd1971 cgroup: implement cgroup_get_e_css()
Implement cgroup_get_e_css() which finds and gets the effective css
for the specified cgroup and subsystem combination.  This function
always returns a valid pinned css.  This will be used by cgroup
writeback support.

While at it, add comment to cgroup_e_css() to explain why that
function is different from cgroup_get_e_css() and has to test
cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss).

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:52 -05:00
Tejun Heo
56c807ba4e cgroup: add cgroup_subsys->css_e_css_changed()
Add a new cgroup_subsys operatoin ->css_e_css_changed().  This is
invoked if any of the effective csses seen from the css's cgroup may
have changed.  This will be used to implement cgroup writeback
support.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:51 -05:00
Tejun Heo
7d172cc89b cgroup: add cgroup_subsys->css_released()
Add a new cgroup subsys callback css_released().  This is called when
the reference count of the css (cgroup_subsys_state) reaches zero
before RCU scheduling free.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:51 -05:00
Tejun Heo
db6e305345 cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared
asynchronously.  If cgroup_subtree_control_write() is requested to
enable the subsystem again before the entry is cleared, it has to wait
for the previous offlining to finish and clear the @cgrp->subsys[]
entry before trying to enable the subsystem again.

This is currently done while verifying the input enable / disable
parameters.  This used to be correct but f63070d350e3 ("cgroup: make
interface files visible iff enabled on cgroup->subtree_control")
breaks it.  The commit is one of the commits implementing subsystem
dependency.

Through subsystem dependency, some subsystems may be enabled and
disabled implicitly in addition to the explicitly requested ones.  The
actual subsystems to be enabled and disabled are determined during
@css_enable/disable calculation.  The current offline wait logic skips
the ones which are already implicitly enabled and then waits for
subsystems in @enable; however, this misses the subsystems which may
be implicitly enabled through dependency from @enable.  If such
implicitly subsystem hasn't yet finished offlining yet, the function
ends up trying to create a css when its @cgrp->subsys[] slot is
already occupied triggering BUG_ON() in init_and_link_css().

Fix it by moving the wait logic after @css_enable is calculated and
waiting for all the subsystems in @css_enable.  This fixes the above
bug as the mask contains all subsystems which are to be enabled
including the ones enabled through dependencies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control")
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:51 -05:00
Tejun Heo
755bf5ee86 cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
Make cgroup_subtree_control_write() first calculate new
subtree_control (new_sc), child_subsys_mask (new_ss) and
css_enable/disable masks before applying them to the cgroup.  Also,
store the original subtree_control (old_sc) and child_subsys_mask
(old_ss) and use them to restore the orignal state after failure.

This patch shouldn't cause any behavior changes.  This prepares for a
fix for a bug in the async css offline wait logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:50 -05:00
Tejun Heo
0f060deb5c cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
cgroup_refresh_child_subsys_mask() calculates and updates the
effective @cgrp->child_subsys_maks according to the current
@cgrp->subtree_control.  Separate out the calculation part into
cgroup_calc_child_subsys_mask().  This will be used to fix a bug in
the async css offline wait logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2014-11-18 02:49:50 -05:00