linux

iv/linux

Author	SHA1	Message	Date
Daniel Borkmann	c78f8bdfa1	bpf: mark all registered map/prog types as __ro_after_init All map types and prog types are registered to the BPF core through bpf_register_map_type() and bpf_register_prog_type() during init and remain unchanged thereafter. As by design we don't (and never will) have any pluggable code that can register to that at any later point in time, lets mark all the existing bpf_{map,prog}_type_list objects in the tree as __ro_after_init, so they can be moved to read-only section from then onwards. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-17 13:40:04 -05:00
Joel Fernandes	67d04bb2bc	tracing: Remove outdated ring buffer comment The comment about ring buffer's organization is outdated and the code sits elsewhere, remove the comment. Link: http://lkml.kernel.org/r/20170217041058.23904-1-joelaf@google.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-17 09:56:51 -05:00
Masami Hiramatsu	bef5da6059	tracing/probes: Fix a warning message to show correct maximum length Since tracing/*probe_events will accept a probe definition up to 4096 - 2 ('\n' and '\0') bytes, it must show 4094 instead of 4096 in warning message. Note that there is one possible case of exceed 4094. If user prepare 4096 bytes null-terminated string and syscall write it with the count == 4095, then it can be accepted. However, if user puts a '\n' after that, it must rejected. So IMHO, the warning message should indicate shorter one, since it is safer. Link: http://lkml.kernel.org/r/148673290462.2579.7966778294009665632.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 10:32:48 -05:00
Wei Yongjun	8f0994bb8c	tracing: Fix return value check in trace_benchmark_reg() In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/20170112135502.28556-1-weiyj.lk@gmail.com Cc: stable@vger.kernel.org Fixes: 81dc9f0e ("tracing: Add tracepoint benchmark tracepoint") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 09:02:27 -05:00
Arnd Bergmann	eb583cd484	tracing: Use modern function declaration We get a lot of harmless warnings about this header file at W=1 level because of an unusual function declaration: kernel/trace/trace.h:766:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] This moves the inline statement where it normally belongs, avoiding the warning. Link: http://lkml.kernel.org/r/20170123122521.3389010-1-arnd@arndb.de Fixes: 4046bf023b06 ("ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 09:02:27 -05:00
Masami Hiramatsu	7257634135	tracing/probe: Show subsystem name in messages Show "trace_probe:", "trace_kprobe:" and "trace_uprobe:" headers for each warning/error/info message. This will help people to notice that kprobe/uprobe events caused those messages. Link: http://lkml.kernel.org/r/148646647813.24658.16705315294927615333.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 09:02:25 -05:00
Luiz Capitulino	8e0f11429a	tracing/hwlat: Update old comment about migration The ftrace hwlat does support a cpumask. Link: http://lkml.kernel.org/r/20170213122517.6e211955@redhat.com Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 09:02:25 -05:00
Steven Rostedt (VMware)	1f9b3546cf	tracing: Have traceprobe_probes_write() not access userspace unnecessarily The code in traceprobe_probes_write() reads up to 4096 bytes from userpace for each line. If userspace passes in several lines to execute, the code will do a large read for each line, even though, it is highly likely that the first read from userspace received all of the lines at once. I changed the logic to do a single read from userspace, and to only read from userspace again if not all of the read from userspace made it in. I tested this by adding printk()s and writing files that would test -1, ==, and +1 the buffer size, to make sure that there's no overflows and that if a single line is written with +1 the buffer size, that it fails properly. Link: http://lkml.kernel.org/r/20170209180458.5c829ab2@gandalf.local.home Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-15 09:00:55 -05:00
Steven Rostedt (VMware)	4c7384131c	tracing: Have COMM event filter key be treated as a string The GLOB operation "~" should be able to work with the COMM filter key in order to trace programs with a glob. For example echo 'COMM ~ "systemd*"' > events/syscalls/filter Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-10 14:19:45 -05:00
David S. Miller	3efa70d78f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The conflict was an interaction between a bug fix in the netvsc driver in 'net' and an optimization of the RX path in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-07 16:29:30 -05:00
Daniel Borkmann	3898fac1f4	trace: rename trace_print_hex_seq arg and add kdoc Steven suggested to improve trace_print_hex_seq() a bit after commit 2acae0d5b0f7 ("trace: add variant without spacing in trace_print_hex_seq") in two ways: i) by adding a kdoc comment for the helper function itself and ii) by renaming 'spacing' argument into 'concatenate' to better denote that we don't add spaces between each hex bytes. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-03 15:50:18 -05:00
Steven Rostedt (VMware)	e704eff3ff	ftrace: Have set_graph_function handle multiple functions in one write Currently, only one function can be written to set_graph_function and set_graph_notrace. The last function in the list will have saved, even though other functions will be added then removed. Change the behavior to be the same as set_ftrace_function as to allow multiple functions to be written. If any one fails, none of them will be added. The addition of the functions are done at the end when the file is closed. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:52 -05:00
Steven Rostedt (VMware)	649b988b12	ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified within the graph_lock being held. Holding a pointer to them and passing them along can lead to a use of a stale pointer (fgd->hash). Move assigning the pointer and its use to within the holding of the lock. Note, it's an rcu_sched protected data, and other instances of referencing them are done with preemption disabled. But the file manipuation code must be protected by the lock. The fgd->hash pointer is set to NULL when the lock is being released. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:42 -05:00
Steven Rostedt (VMware)	0e684b6578	tracing: Reset parser->buffer to allow multiple "puts" trace_parser_put() simply frees the allocated parser buffer. But it does not reset the pointer that was freed. This means that if trace_parser_put() is called on the same parser more than once, it will corrupt the allocation system. Setting parser->buffer to NULL after free allows it to be called more than once without any ill effect. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:31 -05:00
Steven Rostedt (VMware)	ae98d27afc	ftrace: Have set_graph_functions handle write with RDWR Since reading the set_graph_functions uses seq functions, which sets the file->private_data pointer to a seq_file descriptor. On writes the ftrace_graph_data descriptor is set to file->private_data. But if the file is opened for RDWR, the ftrace_graph_write() will incorrectly use the file->private_data descriptor instead of ((struct seq_file *)file->private_data)->private pointer, and this can crash the kernel. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:23 -05:00
Steven Rostedt (VMware)	d4ad9a1cca	ftrace: Reset fgd->hash in ftrace_graph_write() fgd->hash is saved and then freed, but is never reset to either ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are performed, then the freed hash could be accessed again. # cd /sys/kernel/debug/tracing # head -1000 available_filter_functions > /tmp/funcs # cat /tmp/funcs > set_graph_function Causes: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: [...] CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880113a12200 task.stack: ffffc90001940000 RIP: 0010:free_ftrace_hash+0x7c/0x160 RSP: 0018:ffffc90001943db0 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40 RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001 FS: 00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0 Call Trace: ? ftrace_graph_write+0x150/0x190 ? __vfs_write+0x1f6/0x210 ? __audit_syscall_entry+0x17f/0x200 ? rw_verify_area+0xdb/0x210 ? _cond_resched+0x2b/0x50 ? __sb_start_write+0xb4/0x130 ? vfs_write+0x1c8/0x330 ? SyS_write+0x62/0xf0 ? do_syscall_64+0xa3/0x1b0 ? entry_SYSCALL64_slow_path+0x25/0x25 Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48 RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0 ---[ end trace 999b48216bf4b393 ]--- Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:06 -05:00
Steven Rostedt (VMware)	555fc7813e	ftrace: Replace (void )1 with a meaningful macro name FTRACE_GRAPH_EMPTY When the set_graph_function or set_graph_notrace contains no records, a banner is displayed of either "#### all functions enabled ####" or "#### all functions disabled ####" respectively. To tell the seq operations to do this, (void )1 is passed as a return value. Instead of using a hardcoded meaningless variable, define it as a macro. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:48 -05:00
Steven Rostedt (VMware)	2b2c279c81	ftrace: Create a slight optimization on searching the ftrace_hash This is a micro-optimization, but as it has to deal with a fast path of the function tracer, these optimizations can be noticed. The ftrace_lookup_ip() returns true if the given ip is found in the hash. If it's not found or the hash is NULL, it returns false. But there's some cases that a NULL hash is a true, and the ftrace_hash_empty() is tested before calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests that first, that adds a few extra unneeded instructions in those cases. A new static "always_inlined" function is created that does not perform the hash empty test. This most only be used by callers that do the check first anyway, as an empty or NULL hash could cause a crash if a lookup is performed on it. Also add kernel doc for the ftrace_lookup_ip() main function. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:22 -05:00
Steven Rostedt (VMware)	2b0cce0e19	tracing: Add ftrace_hash_key() helper function Replace the couple of use cases that has small logic to produce the ftrace function key id with a helper function. No need for duplicate code. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:05 -05:00
David S. Miller	e2160156bf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net All merge conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-02 16:54:00 -05:00
Omar Sandoval	6ac93117ab	blktrace: use existing disk debugfs directory We may already have a directory to put the blktrace stuff in if 1. The disk uses blk-mq 2. CONFIG_BLK_DEBUG_FS is enabled 3. We are tracing the whole disk and not a partition Instead of hardcoding this very specific case, let's use the new debugfs_lookup(). If the directory exists, we use it, otherwise we create one and clean it up later. Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-02 10:20:16 -07:00
Omar Sandoval	18fbda91c6	block: use same block debugfs directory for blk-mq and blktrace When I added the blk-mq debugging information to debugfs, I didn't notice that blktrace also creates a "block" directory in debugfs. Make them use the same dentry, now created in the core block code. Based on a patch from Jens. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-02 10:20:16 -07:00
Omar Sandoval	a428d314eb	blktrace: make do_blk_trace_setup() static This isn't used outside of blktrace.c anymore. Fixes: 62c2a7d969f3 ("block: push BKL into blktrace ioctls") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-02 10:20:16 -07:00
Arnd Bergmann	26a346f23c	tracing/kprobes: Fix __init annotation clang complains about "__init" being attached to a struct name: kernel/trace/trace_kprobe.c:1375:15: error: '__section__' attribute only applies to functions and global variables The intention must have been to mark the function as __init instead of the type, so move the attribute there. Link: http://lkml.kernel.org/r/20170201165826.2625888-1-arnd@arndb.de Fixes: f18f97ac43d7 ("tracing/kprobes: Add a helper method to return number of probe hits") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-02 10:48:35 -05:00
Eric W. Biederman	93faccbbfa	fs: Better permission checking for submounts To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Cc: stable@vger.kernel.org Fixes: 069d5ac9ae0d ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-02-02 04:36:12 +13:00
Steven Rostedt (VMware)	f447c196fe	tracing: Clean up the hwlat binding code Instead of initializing the affinity of the hwlat kthread in the thread itself, simply set up the initial affinity at thread creation. This simplifies the code. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-31 16:48:23 -05:00
Christoph Hellwig	57292b58dd	block: introduce blk_rq_is_passthrough This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-31 14:00:34 -07:00
Steven Rostedt (VMware)	79c6f448c8	tracing: Fix hwlat kthread migration The hwlat tracer creates a kernel thread at start of the tracer. It is pinned to a single CPU and will move to the next CPU after each period of running. If the user modifies the migration thread's affinity, it will not change after that happens. The original code created the thread at the first instance it was called, but later was changed to destroy the thread after the tracer was finished, and would not be created until the next instance of the tracer was established. The code that initialized the affinity was only called on the initial instantiation of the tracer. After that, it was not initialized, and the previous affinity did not match the current newly created one, making it appear that the user modified the thread's affinity when it did not, and the thread failed to migrate again. Cc: stable@vger.kernel.org Fixes: 0330f7aa8ee6 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-31 09:13:49 -05:00
Christoph Hellwig	48b77ad608	block: cleanup tracing A couple tweaks to the tracing code: - trace the request size for all requests - trace request sector and nr_sectors only for fs requests, enforced by helpers - drop SCSI CDB tracing - we have SCSI tracing for this and are going to me the CDB out of the generic struct request soon. With this the tracing code stops to know about BLOCK_PC requests entirely, it's just FS vs passthrough requests now, where the latter includes any driver-private requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-27 15:08:35 -07:00
Daniel Borkmann	2acae0d5b0	trace: add variant without spacing in trace_print_hex_seq For upcoming tracepoint support for BPF, we want to dump the program's tag. Format should be similar to __print_hex(), but without spacing. Add a __print_hex_str() variant for exactly that purpose that reuses trace_print_hex_seq(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-25 13:17:47 -05:00
Namhyung Kim	b9b0c831be	ftrace: Convert graph filter to use hash tables Use ftrace_hash instead of a static array of a fixed size. This is useful when a graph filter pattern matches to a large number of functions. Now hash lookup is done with preemption disabled to protect from the hash being changed/freed. Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 14:50:58 -05:00
Namhyung Kim	4046bf023b	ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip It will be used when checking graph filter hashes later. Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> [ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 14:50:21 -05:00
Gianluca Borello	a5e8c07059	bpf: add bpf_probe_read_str helper Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void dst, int size, const void unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); / consume buf / } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via * bpf_perf_event_output(), but this time we can use * res (the string length) as event size, after checking * its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:08:43 -05:00
Namhyung Kim	3e278c0dc1	ftrace: Factor out __ftrace_hash_move() The __ftrace_hash_move() is to allocates properly-sized hash and move entries in the src ftrace_hash. It will be used to set function graph filters which has nothing to do with the dyn_ftrace records. Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 11:40:07 -05:00
Steven Rostedt (VMware)	068f530b3f	tracing: Add the constant count for branch tracer The unlikely/likely branch profiler now gets called even if the if statement is a constant (always goes in one direction without a compare). Add a value to denote this in the likely/unlikely tracer as well. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-19 08:57:41 -05:00
Steven Rostedt (VMware)	134e6a034c	tracing: Show number of constants profiled in likely profiler Now that constants are traced, it is useful to see the number of constants that are traced in the likely/unlikely profiler in order to know if they should be ignored or not. The likely/unlikely will display a number after the "correct" number if a "constant" count exists. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-19 08:57:14 -05:00
Steven Rostedt (VMware)	d45ae1f704	tracing: Process constants for (un)likely() profiler When running the likely/unlikely profiler, one of the results did not look accurate. It noted that the unlikely() in link_path_walk() was 100% incorrect. When I added a trace_printk() to see what was happening there, it became 80% correct! Looking deeper into what whas happening, I found that gcc split that if statement into two paths. One where the if statement became a constant, the other path a variable. The other path had the if statement always hit (making the unlikely there, always false), but since the #define unlikely() has: #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0)) Where constants are ignored by the branch profiler, the "constant" path made by the compiler was ignored, even though it was hit 80% of the time. By just passing the constant value to the __branch_check__() function and tracing it out of line (as always correct, as likely/unlikely isn't a factor for constants), then we get back the accurate readings of branches that were optimized by gcc causing part of the execution to become constant. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-17 15:13:05 -05:00
Kenny Yu	6496bb72bf	uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET Previously, `create_trace_uprobe` found the first occurence of the ':' character when parsing `PATH:OFFSET` for a uprobe. However, if the path contains a ':' character, then the function would parse the path incorrectly. Even worse, if the path does not exist, the subsequent call to `kern_path()` would set `ret` to `ENOENT`, leading to very cryptic errno values in user space. The fix is to find the last occurence of ':'. How to repro:: The write fails with "No such file or directory", suggesting incorrectly that the `uprobe_events` file does not exist. $ mkdir testing && cd testing $ cp /bin/bash . $ cp /bin/bash ./bash:with:colon $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this doesn't -bash: echo: write error: No such file or directory With the patch: $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this still works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this works now too! $ cat /sys/kernel/debug/tracing/uprobe_events p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006 p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006 Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.com Signed-off-by: Kenny Yu <kennyyu@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-17 12:57:47 -05:00
Daniel Borkmann	2d071c643f	bpf, trace: make ctx access checks more robust Make sure that ctx cannot potentially be accessed oob by asserting explicitly that ctx access size into pt_regs for BPF_PROG_TYPE_KPROBE programs must be within limits. In case some 32bit archs have pt_regs not being a multiple of 8, then BPF_DW access could cause such access. BPF_PROG_TYPE_KPROBE progs don't have a ctx conversion function since there's no extra mapping needed. kprobe_prog_is_valid_access() didn't enforce sizeof(long) as the only allowed access size, since LLVM can generate non BPF_W/BPF_DW access to regs from time to time. For BPF_PROG_TYPE_TRACEPOINT we don't have a ctx conversion either, so add a BUILD_BUG_ON() check to make sure that BPF_DW access will not be a similar issue in future (ctx works on event buffer as opposed to pt_regs there). Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 14:41:42 -05:00
Daniel Borkmann	6b8cc1d11e	bpf: pass original insn directly to convert_ctx_access Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-12 10:00:31 -05:00
Alexei Starovoitov	39f19ebbf5	bpf: rename ARG_PTR_TO_STACK since ARG_PTR_TO_STACK is no longer just pointer to stack rename it to ARG_PTR_TO_MEM and adjust comment. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-09 16:56:27 -05:00
Al Viro	f81dc7d7d5	splice_pipe_desc: kill ->flags no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-26 23:53:38 -05:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
Linus Torvalds	179a7ba680	This release has a few updates: o STM can hook into the function tracer o Function filtering now supports more advance glob matching o Ftrace selftests updates and added tests o Softirq tag in traces now show only softirqs o ARM nop added to non traced locations at compile time o New trace_marker_raw file that allows for binary input o Optimizations to the ring buffer o Removal of kmap in trace_marker o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file o Other various fixes and clean ups Note, there are two patches marked for stable. These were discovered near the end of the 4.9 rc release cycle. By the time I had them tested it was just a matter of days before 4.9 would be released, and I figured I would just submit them in the merge window. They are old bugs and not critical. Nothing non-root could abuse. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L 2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00 1VW+DHRpRZfElsCcya6S6P4bs5Y= =MGZ/ -----END PGP SIGNATURE----- Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...	2016-12-15 13:49:34 -08:00
Linus Torvalds	36869cb93d	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...	2016-12-13 10:19:16 -08:00
Linus Torvalds	52281b38bc	Improvements and fixes to pstore subsystem: - Add additional checks for bad platform data - Remove bounce buffer in console writer - Protect read/unlink race with a mutex - Correctly give up during dump locking failures - Increase ftrace bandwidth by splitting ftrace buffers per CPU -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Kees Cook <kees@outflux.net> iQIcBAABCgAGBQJYSJxYAAoJEIly9N/cbcAmYBsQAIAmHDgk3ootLQhyatZ9H2X0 Nyl24xA7UCPaz13ddF1tUaItI4mYBWfY4gde+3fIVXDitgmFxZZqb8YV68CvFgUt Hb8tlTiM0F2z/muGBIgJ5TN5XiB4dO0WgvcKvnQdzyNGPVlAXvowHPkaM9X+iEA1 y4U2Le7iK9+9fvkH7RM4O3hMiTmpKeUITYTWo1Y8n9LaZo3w5+pqhS+TPu75uyD0 pLb53EOzZmg1nu9hcac5t4G5W1Lr4ji2EekDXemi/571HAzQnMXxJWc6ZVYLDNfP W4D0UGcHAERDzrYwWcGn8HIThYlpbnVw9atSTTodJTiIubtsRt4haycUH1hqMS5o 4R2myhbAoM0A3zYBqrhwtQHg8apNes2hOR2WycAqgvylZZl1o6zaEs9zc7aafYuy N/M0x5tlya3fOgkvkJsmERT5jtqDVMhtBZ2xa8NYfJCHgULaUmjEx25eTr1kF3nW ERIX/3IayMvqHwYptP9dOzy2owLpXC8yZlM34AeM+ub93hHj1ELLfG7aN0bklD/+ wfmIX8HpOA2XGWflOk5fiHLHro6pwRU9zOIIHFJ4Tf60PMoN+rjRfej1fjz+KOhO gxUYaCb+/4BlCqLqdFvF54qhQO2qmVuOAg/1BLu+hnZtXSyhVJxePthSs5shyoE8 owL8rVXDGapjF1xO6WCR =UmFL -----END PGP SIGNATURE----- Merge tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "Improvements and fixes to pstore subsystem: - add additional checks for bad platform data - remove bounce buffer in console writer - protect read/unlink race with a mutex - correctly give up during dump locking failures - increase ftrace bandwidth by splitting ftrace buffers per CPU" * tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: add pdata NULL check to ramoops_probe pstore: Convert console write to use ->write_buf pstore: Protect unlink with read_mutex pstore: Use global ftrace filters for function trace filtering ftrace: Provide API to use global filtering for ftrace ops pstore: Clarify context field przs as dprzs pstore: improve error report for failed setup pstore: Merge per-CPU ftrace records into one pstore: Add ftrace timestamp counter ramoops: Split ftrace buffer space into per-CPU zones pstore: Make ramoops_init_przs generic for other prz arrays pstore: Allow prz to control need for locking pstore: Warn on PSTORE_TYPE_PMSG using deprecated function pstore: Make spinlock per zone instead of global pstore: Actually give up during locking failure	2016-12-13 09:16:11 -08:00
Linus Torvalds	9465d9cc31	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...	2016-12-12 19:56:15 -08:00
Linus Torvalds	e71c3978d6	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...	2016-12-12 19:25:04 -08:00
Marcin Nowakowski	d4d7ccc834	kprobes/trace: Fix kprobe selftest for newer gcc Commit 265a5b7ee3eb ("kprobes/trace: Fix kprobe selftest for gcc 4.6") has added __used attribute to kprobe_trace_selftest_target to ensure that the method is listed in kallsyms table. However, even though the method remains in the kernel image, the actual call is optimized away as there are no side effects and the return value is never checked. Add a return value check and a 'noinline' attribute to ensure that an inlined copy of the method is not used by the caller. Also add checks that verify that the kprobe was really hit, as at the moment the tests show positive results despite the test method being optimized away. Finally, add __init annotations to find_trace_probe_file() and kprobe_trace_selftest_target() as they are only called from within an __init method. Link: http://lkml.kernel.org/r/1481293178-3128-2-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-12 21:21:43 -05:00
Marcin Nowakowski	f18f97ac43	tracing/kprobes: Add a helper method to return number of probe hits The number of probe hits is stored in a percpu variable and therefore can't be read directly. Add a helper method trace_kprobe_nhit() that performs the required calculation. It will be used in a follow-up commit that changes kprobe selftests to verify the number of probe hits. Link: http://lkml.kernel.org/r/1481293178-3128-1-git-send-email-marcin.nowakowski@imgtec.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-12 21:17:44 -05:00

... 2 3 4 5 6 ...

3461 Commits