linux

iv/linux

Author	SHA1	Message	Date
Steven Rostedt (Red Hat)	04b74b27c2	printk/percpu: Define printk_func when printk is not defined To avoid include hell, the per_cpu variable printk_func was declared in percpu.h. But it is only defined if printk is defined. As users of printk may also use the printk_func variable, it needs to be defined even if CONFIG_PRINTK is not. Also add a printk.h include in percpu.h just to be safe. Link: http://lkml.kernel.org/r/20141121183215.01ba539c@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-21 11:19:15 -05:00
Steven Rostedt (Red Hat)	0af26492d5	tracing/trivial: Fix typos and make an int into a bool Fix up a few typos in comments and convert an int into a bool in update_traceon_count(). Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-20 10:05:36 -05:00
Jiri Kosina	a02001086b	Merge Linus' tree to be be to apply submitted patches to newer code than current trivial.git base	2014-11-20 14:42:02 +01:00
Frans Klaver	eff264efee	kernel: trace: fix printk message s,produciton,production Signed-off-by: Frans Klaver <frans.klaver@xsens.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-11-20 14:29:19 +01:00
Ingo Molnar	d360b78f99	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Streamline RCU's use of per-CPU variables, shifting from "cpu" arguments to functions to "this_"-style per-CPU variable accessors. - Signal-handling RCU updates. - Real-time updates. - Torture-test updates. - Miscellaneous fixes. - Documentation updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-20 08:57:58 +01:00
Steven Rostedt (Red Hat)	afdc34a3d3	printk: Add per_cpu printk func to allow printk to be diverted Being able to divert printk to call another function besides the normal logging is useful for such things like NMI handling. If some functions are to be called from NMI that does printk() it is possible to lock up the box if the nmi handler triggers when another printk is happening. One example of this use is to perform a stack trace on all CPUs via NMI. But if the NMI is to do the printk() it can cause the system to lock up. By allowing the printk to be diverted to another function that can safely record the printk output and then print it when it in a safe context then NMIs will be safe to call these functions like show_regs(). Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:21 -05:00
Steven Rostedt (Red Hat)	8d58e99af5	seq_buf: Move the seq_buf code to lib/ The seq_buf functions are rather useful outside of tracing. Instead of having it be dependent on CONFIG_TRACING, move the code into lib/ and allow other users to have access to it even when tracing is not configured. The seq_buf utility is similar to the seq_file utility, but instead of writing sending data back up to userland, it writes it into a buffer defined at seq_buf_init(). This allows us to send a descriptor around that writes printf() formatted strings into it that can be retrieved later. It is currently used by the tracing facility for such things like trace events to convert its binary saved data in the ring buffer into an ASCII human readable context to be displayed in /sys/kernel/debug/trace. It can also be used for doing NMI prints safely from NMI context into the seq_buf and retrieved later and dumped to printk() safely. Doing printk() from an NMI context is dangerous because an NMI can preempt a current printk() and deadlock on it. Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:20 -05:00
Steven Rostedt (Red Hat)	2448913ed2	seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF The function bstr_printf() from lib/vsprnintf.c is only available if CONFIG_BINARY_PRINTF is defined. This is due to the only user currently being the tracing infrastructure, which needs to select this config when tracing is configured. Until there is another user of the binary printf formats, this will continue to be the case. Since seq_buf.c is now lives in lib/ and is compiled even without tracing, it must encompass its use of bstr_printf() which is used by seq_buf_printf(). This too is only used by the tracing infrastructure and is still encapsulated by the CONFIG_BINARY_PRINTF. Link: http://lkml.kernel.org/r/20141104160222.969013383@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:19 -05:00
Steven Rostedt (Red Hat)	01cb06a4c2	tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions Add two helper functions; seq_buf_get_buf() and seq_buf_commit() that are used by seq_buf_path(). This makes the code similar to the seq_file: seq_path() function, and will help to be able to consolidate the functions between seq_file and trace_seq. Link: http://lkml.kernel.org/r/20141104160222.644881406@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.977571447@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:18 -05:00
Steven Rostedt (Red Hat)	8cd709ae76	tracing: Have seq_buf use full buffer Currently seq_buf is full when all but one byte of the buffer is filled. Change it so that the seq_buf is full when all of the buffer is filled. Some of the functions would fill the buffer completely and report everything was fine. This was inconsistent with the max of size - 1. Changing this to be max of size makes all functions consistent. Link: http://lkml.kernel.org/r/20141104160222.502133196@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.811957882@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)	9b77215382	seq_buf: Add seq_buf_can_fit() helper function Add a seq_buf_can_fit() helper function that removes the possible mistakes of comparing the seq_buf length plus added data compared to the size of the buffer. Link: http://lkml.kernel.org/r/20141118164025.GL23958@pathway.suse.cz Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:17 -05:00
Steven Rostedt (Red Hat)	820b75f63d	tracing: Add paranoid size check in trace_printk_seq() To be really paranoid about writing out of bound data in trace_printk_seq(), add another check of len compared to size. Link: http://lkml.kernel.org/r/20141119144004.GB2332@dhcp128.suse.cz Suggested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:16 -05:00
Steven Rostedt (Red Hat)	5ac4837841	tracing: Use trace_seq_used() and seq_buf_used() instead of len As the seq_buf->len will soon be +1 size when there's an overflow, we must use trace_seq_used() or seq_buf_used() methods to get the real length. This will prevent buffer overflow issues if just the len of the seq_buf descriptor is used to copy memory. Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home Reported-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:15 -05:00
Steven Rostedt (Red Hat)	74f06bb723	tracing: Clean up tracing_fill_pipe_page() The function tracing_fill_pipe_page() logic is a little confusing with the use of count saving the seq.len and reusing it. Instead of subtracting a number that is calculated from the saved value of the seq.len from seq.len, just save the seq.len at the start and if we need to reset it, just assign it again. When the seq_buf overflow is len == size + 1, the current logic will break. Changing it to use a saved length for resetting back to the original value is more robust and will work when we change the way seq_buf sets the overflow. Link: http://lkml.kernel.org/r/20141118161546.GJ23958@pathway.suse.cz Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:14 -05:00
Steven Rostedt (Red Hat)	eeab98154d	seq_buf: Create seq_buf_used() to find out how much was written Add a helper function seq_buf_used() that replaces the SEQ_BUF_USED() private macro to let callers have a method to know how much of the seq_buf was written to. Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org Link: http://lkml.kernel.org/r/20141114011413.321654244@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:13 -05:00
Steven Rostedt (Red Hat)	dd23180aac	tracing: Convert seq_buf_path() to be like seq_path() Rewrite seq_buf_path() like it is done in seq_path() and allow it to accept any escape character instead of just "\n". Making seq_buf_path() like seq_path() will help prevent problems when converting seq_file to use the seq_buf logic. Link: http://lkml.kernel.org/r/20141104160222.048795666@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.338523371@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:10 -05:00
Steven Rostedt (Red Hat)	3a161d99c4	tracing: Create seq_buf layer in trace_seq Create a seq_buf layer that trace_seq sits on. The seq_buf will not be limited to page size. This will allow other usages of seq_buf instead of a hard set PAGE_SIZE one that trace_seq has. Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:09 -05:00
Markus Elfring	16a8ef2751	tracing: Deletion of an unnecessary check before iput() The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/5468F875.7080907@users.sourceforge.net Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 16:28:45 -05:00
Alexei Starovoitov	daaf427c6a	bpf: fix arraymap NULL deref and missing overflow and zero size checks - fix NULL pointer dereference: kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'. (kzalloc returns null) kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40) - integer overflow check was missing in arraymap (hashmap checks for overflow via kmalloc_array()) - arraymap can round_up(value_size, 8) to zero. check was missing. - hashmap was missing zero size check as well, since roundup_pow_of_two() can truncate into zero - found a typo in the arraymap comment and unnecessary empty line Fix all of these issues and make both overflow checks explicit U32 in size. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-19 15:40:00 -05:00
Steven Rostedt (Red Hat)	8e2e095cbe	tracing: Fix return value of ftrace_raw_output_prep() If the trace_seq of ftrace_raw_output_prep() is full this function returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero. The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero! The thing is, the caller of ftrace_raw_output_prep() expects a success to be zero. Change that to expect it to be TRACE_TYPE_HANDLED. Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.cz Reminded-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:48 -05:00
Steven Rostedt (Red Hat)	dba39448ab	tracing: Remove return values of most trace_seq_*() functions The trace_seq_printf() and friends are used to store strings into a buffer that can be passed around from function to function. If the trace_seq buffer fills up, it will not print any more. The return values were somewhat inconsistant and using trace_seq_has_overflowed() was a better way to know if the write to the trace_seq buffer succeeded or not. Now that all users have removed reading the return value of the printf() type functions, they can safely return void and keep future users of them from reading the inconsistent values as well. Link: http://lkml.kernel.org/r/20141114011411.992510720@goodmis.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:47 -05:00
Steven Rostedt (Red Hat)	183742f08c	tracing: Do not use return values of trace_seq_printf() in syscall tracing The functions trace_seq_printf() and friends will not be returning values soon and will be void functions. To know if they succeeded or not, the functions trace_seq_has_overflowed() and trace_handle_return() should be used instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:46 -05:00
Steven Rostedt (Red Hat)	8579a107a6	tracing/uprobes: Do not use return values of trace_seq_printf() The functions trace_seq_printf() and friends will soon no longer have return values. Using trace_seq_has_overflowed() and trace_handle_return() should be used instead. Link: http://lkml.kernel.org/r/20141114011411.693008134@goodmis.org Link: http://lkml.kernel.org/r/20141115050602.333705855@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:45 -05:00
Steven Rostedt (Red Hat)	d2b0191a38	tracing/probes: Do not use return value of trace_seq_printf() The functions trace_seq_printf() and friends will soon not have a return value and will only be a void function. Use trace_seq_has_overflowed() instead to know if the trace_seq operations succeeded or not. Link: http://lkml.kernel.org/r/20141114011411.530216306@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)	a72e10afab	tracing: Do not check return values of trace_seq_p*() for mmio tracer The return values for trace_seq_printf() and friends are going to be removed and they will become void functions. The mmio tracer checked their return and even did so incorrectly. Some of the funtions which returned the values were never checked themselves. Removing all the checks simplifies the code. Use trace_seq_has_overflowed() and trace_handle_return() where necessary instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:44 -05:00
Steven Rostedt (Red Hat)	85224da0b8	kprobes/tracing: Use trace_seq_has_overflowed() for overflow checks Instead of checking the return value of trace_seq_printf() and friends for overflowing of the buffer, use the trace_seq_has_overflowed() helper function. This cleans up the code quite a bit and also takes us a step closer to changing the return values of trace_seq_printf() and friends to void. Link: http://lkml.kernel.org/r/20141114011411.181812785@goodmis.org Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:43 -05:00
Steven Rostedt (Red Hat)	9d9add34ec	tracing: Have function_graph use trace_seq_has_overflowed() Instead of doing individual checks all over the place that makes the code very messy. Just check trace_seq_has_overflowed() at the end or in strategic places. This makes the code much cleaner and also helps with getting closer to removing the return values of trace_seq_printf() and friends. Link: http://lkml.kernel.org/r/20141114011410.987913836@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:42 -05:00
Steven Rostedt (Red Hat)	7d40f67165	tracing: Have branch tracer use trace_handle_return() helper function The branch tracer should not be checking the trace_seq_printf() return value as that will soon be void. There's a new trace_handle_return() helper function that will return TRACE_TYPE_PARTIAL_LINE if the trace_seq overflowed and TRACE_TYPE_HANDLED otherwise. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:41 -05:00
Steven Rostedt (Red Hat)	c0cd93aa16	ring-buffer: Remove check of trace_seq_{puts,printf}() return values Remove checking the return value of all trace_seq_puts(). It was wrong anyway as only the last return value mattered. But as the trace_seq_puts() is going to be a void function in the future, we should not be checking the return value of it anyway. Just return !trace_seq_has_overflowed() instead. Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:40 -05:00
Steven Rostedt (Red Hat)	f4a1d08ce6	blktrace/tracing: Use trace_seq_has_overflowed() helper function Checking the return code of every trace_seq_printf() operation and having to return early if it overflowed makes the code messy. Using the new trace_seq_has_overflowed() and trace_handle_return() functions allows us to clean up the code. In the future, trace_seq_printf() and friends will be turning into void functions and not returning a value. The trace_seq_has_overflowed() is to be used instead. This cleanup allows that change to take place. Cc: Jens Axboe <axboe@fb.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)	19a7fe2062	tracing: Add trace_seq_has_overflowed() and trace_handle_return() Adding a trace_seq_has_overflowed() which returns true if the trace_seq had too much written into it allows us to simplify the code. Instead of checking the return value of every call to trace_seq_printf() and friends, they can all be called normally, and at the end we can return !trace_seq_has_overflowed() instead. Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function was created called trace_handle_return() which takes a trace_seq and returns these enums. Using this helper function also simplifies the code. This change also makes it possible to remove the return values of trace_seq_printf() and friends. They should instead just be void functions. Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:39 -05:00
Steven Rostedt (Red Hat)	e400a40cff	tracing: Fix trace_seq_bitmask() to start at current position In trace_seq_bitmask() it calls bitmap_scnprintf() not from the current position of the trace_seq buffer (s->buffer + s->len), but instead from the beginning of the buffer (s->buffer). Luckily, the only user of this "ipi_raise tracepoint" uses it as the first parameter, and as such, the start of the temp buffer in include/trace/ftrace.h (see __get_bitmask()). Reported-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:38 -05:00
Steven Rostedt (Red Hat)	aec0be2d6e	ftrace/x86/extable: Add is_ftrace_trampoline() function Stack traces that happen from function tracing check if the address on the stack is a __kernel_text_address(). That is, is the address kernel code. This calls core_kernel_text() which returns true if the address is part of the builtin kernel code. It also calls is_module_text_address() which returns true if the address belongs to module code. But what is missing is ftrace dynamically allocated trampolines. These trampolines are allocated for individual ftrace_ops that call the ftrace_ops callback functions directly. But if they do a stack trace, the code checking the stack wont detect them as they are neither core kernel code nor module address space. Adding another field to ftrace_ops that also stores the size of the trampoline assigned to it we can create a new function called is_ftrace_trampoline() that returns true if the address is a dynamically allocate ftrace trampoline. Note, it ignores trampolines that are not dynamically allocated as they will return true with the core_kernel_text() function. Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:26 -05:00
Al Viro	9f45f5bf30	new helper: audit_file() ... for situations when we don't have any candidate in pathnames - basically, in descriptor-based syscalls. [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:26 -05:00
Al Viro	b583043e99	kill f_dentry uses Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:25 -05:00
Alexey Ishchuk	4eafad7feb	s390/kernel: add system calls for PCI memory access Add the new __NR_s390_pci_mmio_write and __NR_s390_pci_mmio_read system calls to allow user space applications to access device PCI I/O memory pages on s390x platform. [ Martin Schwidefsky: some code beautification ] Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-11-19 09:46:43 +01:00
Steven Rostedt (Red Hat)	a9ce7c36aa	tracing: Fix race of function probes counting The function probe counting for traceon and traceoff suffered a race condition where if the probe was executing on two or more CPUs at the same time, it could decrement the counter by more than one when disabling (or enabling) the tracer only once. The way the traceon and traceoff probes are suppose to work is that they disable (or enable) tracing once per count. If a user were to echo 'schedule:traceoff:3' into set_ftrace_filter, then when the schedule function was called, it would disable tracing. But the count should only be decremented once (to 2). Then if the user enabled tracing again (via tracing_on file), the next call to schedule would disable tracing again and the count would be decremented to 1. But if multiple CPUS called schedule at the same time, it is possible that the count would be decremented more than once because of the simple "count--" used. By reading the count into a local variable and using memory barriers we can guarantee that the count would only be decremented once per disable (or enable). The stack trace probe had a similar race, but here the stack trace will decrement for each time it is called. But this had the read-modify- write race, where it could stack trace more than the number of times that was specified. This case we use a cmpxchg to stack trace only the number of times specified. The dump probes can still use the old "update_count()" function as they only run once, and that is controlled by the dump logic itself. Link: http://lkml.kernel.org/r/20141118134643.4b550ee4@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-18 23:06:35 -05:00
Rafael J. Wysocki	b2b49ccbdd	PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected The number of and dependencies between high-level power management Kconfig options make life much harder than necessary. Several conbinations of them have to be tested and supported, even though some of those combinations are very rarely used in practice (if they are used in practice at all). Moreover, the fact that we have separate independent Kconfig options for runtime PM and system suspend is a serious obstacle for integration between the two frameworks. To overcome these difficulties, always select PM_RUNTIME if PM_SLEEP is set. Among other things, this will allow system suspend callbacks provided by bus types and device drivers to rely on the runtime PM framework regardless of the kernel configuration. Enthusiastically-acked-by: Kevin Hilman <khilman@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-11-18 23:41:40 +01:00
Alexei Starovoitov	7943c0f329	bpf: remove test map scaffolding and user proper types proper types and function helpers are ready. Use them in verifier testsuite. Remove temporary stubs Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:44:00 -05:00
Alexei Starovoitov	d0003ec01c	bpf: allow eBPF programs to use maps expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem() map accessors to eBPF programs Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:44:00 -05:00
Alexei Starovoitov	a1854d6ac0	bpf: fix BPF_MAP_LOOKUP_ELEM command return code fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage described it in commit b4fc1a460f30("Merge branch 'bpf-next'"): ----- BPF_MAP_LOOKUP_ELEM int bpf_lookup_elem(int fd, void key, void value) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), }; return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)); } bpf() syscall looks up an element with given key in a map fd. If element is found it returns zero and stores element's value into value. If element is not found it returns -1 and sets errno to ENOENT. and further down in manpage: ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that element with given key was not found. ----- In general all BPF commands return ENOENT when map element is not found (including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with flags == BPF_MAP_UPDATE_ONLY) Subsequent patch adds a testsuite to check return values for all of these combinations. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:59 -05:00
Alexei Starovoitov	28fbcfa08d	bpf: add array type of eBPF maps add new map type BPF_MAP_TYPE_ARRAY and its implementation - optimized for fastest possible lookup() . in the future verifier/JIT may recognize lookup() with constant key and optimize it into constant pointer. Can optimize non-constant key into direct pointer arithmetic as well, since pointers and value_size are constant for the life of the eBPF program. In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT while preserving concurrent access to this map from user space - two main use cases for array type: . 'global' eBPF variables: array of 1 element with key=0 and value is a collection of 'global' variables which programs can use to keep the state between events . aggregation of tracing events into fixed set of buckets - all array elements pre-allocated and zero initialized at init time - key as an index in array and can only be 4 byte - map_delete_elem() returns EINVAL, since elements cannot be deleted - map_update_elem() replaces elements in an non-atomic way (for atomic updates hashtable type should be used instead) Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:59 -05:00
Alexei Starovoitov	0f8e4bd8a1	bpf: add hashtable type of eBPF maps add new map type BPF_MAP_TYPE_HASH and its implementation - maps are created/destroyed by userspace. Both userspace and eBPF programs can lookup/update/delete elements from the map - eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism for concurrent updates - key/value are opaque range of bytes (aligned to 8 bytes) - user space provides 3 configuration attributes via BPF syscall: key_size, value_size, max_entries - map takes care of allocating/freeing key/value pairs - map_update_elem() must fail to insert new element when max_entries limit is reached to make sure that eBPF programs cannot exhaust memory - map_update_elem() replaces elements in an atomic way - optimized for speed of lookup() which can be called multiple times from eBPF program which itself is triggered by high volume of events . in the future JIT compiler may recognize lookup() call and optimize it further, since key_size is constant for life of eBPF program Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:25 -05:00
Alexei Starovoitov	3274f52073	bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command the current meaning of BPF_MAP_UPDATE_ELEM syscall command is: either update existing map element or create a new one. Initially the plan was to add a new command to handle the case of 'create new element if it didn't exist', but 'flags' style looks cleaner and overall diff is much smaller (more code reused), so add 'flags' attribute to BPF_MAP_UPDATE_ELEM command with the following meaning: #define BPF_ANY 0 /* create new element or update existing / #define BPF_NOEXIST 1 / create new element if it didn't exist / #define BPF_EXIST 2 / update existing element / bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST if element already exists. bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT if element doesn't exist. Userspace will call it as: int bpf_update_elem(int fd, void key, void *value, __u64 flags) { union bpf_attr attr = { .map_fd = fd, .key = ptr_to_u64(key), .value = ptr_to_u64(value), .flags = flags; }; return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } First two bits of 'flags' are used to encode style of bpf_update_elem() command. Bits 2-63 are reserved for future use. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:25 -05:00
Tejun Heo	eeecbd1971	cgroup: implement cgroup_get_e_css() Implement cgroup_get_e_css() which finds and gets the effective css for the specified cgroup and subsystem combination. This function always returns a valid pinned css. This will be used by cgroup writeback support. While at it, add comment to cgroup_e_css() to explain why that function is different from cgroup_get_e_css() and has to test cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:52 -05:00
Tejun Heo	56c807ba4e	cgroup: add cgroup_subsys->css_e_css_changed() Add a new cgroup_subsys operatoin ->css_e_css_changed(). This is invoked if any of the effective csses seen from the css's cgroup may have changed. This will be used to implement cgroup writeback support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	7d172cc89b	cgroup: add cgroup_subsys->css_released() Add a new cgroup subsys callback css_released(). This is called when the reference count of the css (cgroup_subsys_state) reaches zero before RCU scheduling free. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	db6e305345	cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared asynchronously. If cgroup_subtree_control_write() is requested to enable the subsystem again before the entry is cleared, it has to wait for the previous offlining to finish and clear the @cgrp->subsys[] entry before trying to enable the subsystem again. This is currently done while verifying the input enable / disable parameters. This used to be correct but f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") breaks it. The commit is one of the commits implementing subsystem dependency. Through subsystem dependency, some subsystems may be enabled and disabled implicitly in addition to the explicitly requested ones. The actual subsystems to be enabled and disabled are determined during @css_enable/disable calculation. The current offline wait logic skips the ones which are already implicitly enabled and then waits for subsystems in @enable; however, this misses the subsystems which may be implicitly enabled through dependency from @enable. If such implicitly subsystem hasn't yet finished offlining yet, the function ends up trying to create a css when its @cgrp->subsys[] slot is already occupied triggering BUG_ON() in init_and_link_css(). Fix it by moving the wait logic after @css_enable is calculated and waiting for all the subsystems in @css_enable. This fixes the above bug as the mask contains all subsystems which are to be enabled including the ones enabled through dependencies. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: f63070d350e3 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	755bf5ee86	cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() Make cgroup_subtree_control_write() first calculate new subtree_control (new_sc), child_subsys_mask (new_ss) and css_enable/disable masks before applying them to the cgroup. Also, store the original subtree_control (old_sc) and child_subsys_mask (old_ss) and use them to restore the orignal state after failure. This patch shouldn't cause any behavior changes. This prepares for a fix for a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00
Tejun Heo	0f060deb5c	cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cgroup_refresh_child_subsys_mask() calculates and updates the effective @cgrp->child_subsys_maks according to the current @cgrp->subtree_control. Separate out the calculation part into cgroup_calc_child_subsys_mask(). This will be used to fix a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00

... 3 4 5 6 7 ...

19572 Commits