2008-05-12 21:20:42 +02:00
# ifndef _LINUX_KERNEL_TRACE_H
# define _LINUX_KERNEL_TRACE_H
# include <linux/fs.h>
# include <asm/atomic.h>
# include <linux/sched.h>
# include <linux/clocksource.h>
2008-09-29 23:02:41 -04:00
# include <linux/ring_buffer.h>
ftrace: mmiotrace, updates
here is a patch that makes mmiotrace work almost well within the tracing
framework. The patch applies on top of my previous patch. I have my own
output formatting in place now.
Summary of changes:
- fix the NULL dereference that was due to not calling tracing_reset()
- add print_line() callback into struct tracer
- implement print_line() for mmiotrace, producing up-to-spec text
- add my output header, but that is not really called in the right place
- rewrote the main structs in mmiotrace
- added two new trace entry types: TRACE_MMIO_RW and TRACE_MMIO_MAP
- made some functions in trace.c non-static
- check current==NULL in tracing_generic_entry_update()
- fix(?) comparison in trace_seq_printf()
Things seem to work fine except a few issues. Markers (text lines injected
into mmiotrace log) are missing, I did not feel hacking them in before we
have variable length entries. My output header is printed only for 'trace'
file, but not 'trace_pipe'. For some reason, despite my quick fix,
iter->trace is NULL in print_trace_line() when called from 'trace_pipe'
file, which means I don't get proper output formatting.
I only tried by loading nouveau.ko, which just detects the card, and that
is traced fine. I didn't try further. Map, two reads and unmap. Works
perfectly.
I am missing the information about overflows, I'd prefer to have a
counter for lost events. I didn't try, but I guess currently there is no
way of knowning when it overflows?
So, not too far from being fully operational, it seems :-)
And looking at the diffstat, there also is some 700-900 lines of user space
code that just became obsolete.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:20:57 +02:00
# include <linux/mmiotrace.h>
2008-09-23 11:32:08 +01:00
# include <linux/ftrace.h>
2008-11-11 23:21:31 +01:00
# include <trace/boot.h>
2008-12-29 13:42:23 -08:00
# include <trace/kmemtrace.h>
2009-02-07 22:16:12 +01:00
# include <trace/power.h>
2008-05-12 21:20:42 +02:00
2008-05-23 21:37:28 +02:00
enum trace_type {
__TRACE_FIRST_TYPE = 0 ,
TRACE_FN ,
TRACE_CTX ,
TRACE_WAKE ,
TRACE_STACK ,
2008-08-01 12:26:41 -04:00
TRACE_PRINT ,
2009-03-12 18:24:49 +01:00
TRACE_BPRINT ,
2008-05-23 21:37:28 +02:00
TRACE_SPECIAL ,
ftrace: mmiotrace, updates
here is a patch that makes mmiotrace work almost well within the tracing
framework. The patch applies on top of my previous patch. I have my own
output formatting in place now.
Summary of changes:
- fix the NULL dereference that was due to not calling tracing_reset()
- add print_line() callback into struct tracer
- implement print_line() for mmiotrace, producing up-to-spec text
- add my output header, but that is not really called in the right place
- rewrote the main structs in mmiotrace
- added two new trace entry types: TRACE_MMIO_RW and TRACE_MMIO_MAP
- made some functions in trace.c non-static
- check current==NULL in tracing_generic_entry_update()
- fix(?) comparison in trace_seq_printf()
Things seem to work fine except a few issues. Markers (text lines injected
into mmiotrace log) are missing, I did not feel hacking them in before we
have variable length entries. My output header is printed only for 'trace'
file, but not 'trace_pipe'. For some reason, despite my quick fix,
iter->trace is NULL in print_trace_line() when called from 'trace_pipe'
file, which means I don't get proper output formatting.
I only tried by loading nouveau.ko, which just detects the card, and that
is traced fine. I didn't try further. Map, two reads and unmap. Works
perfectly.
I am missing the information about overflows, I'd prefer to have a
counter for lost events. I didn't try, but I guess currently there is no
way of knowning when it overflows?
So, not too far from being fully operational, it seems :-)
And looking at the diffstat, there also is some 700-900 lines of user space
code that just became obsolete.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:20:57 +02:00
TRACE_MMIO_RW ,
TRACE_MMIO_MAP ,
2008-11-12 15:24:24 -05:00
TRACE_BRANCH ,
2008-11-11 23:24:42 +01:00
TRACE_BOOT_CALL ,
TRACE_BOOT_RET ,
2008-11-26 00:57:25 +01:00
TRACE_GRAPH_RET ,
TRACE_GRAPH_ENT ,
2008-11-22 13:28:47 +02:00
TRACE_USER_STACK ,
2008-12-11 13:53:26 +01:00
TRACE_HW_BRANCHES ,
2009-03-07 05:52:59 +01:00
TRACE_SYSCALL_ENTER ,
TRACE_SYSCALL_EXIT ,
2008-12-29 13:42:23 -08:00
TRACE_KMEM_ALLOC ,
TRACE_KMEM_FREE ,
2008-11-23 16:49:58 -08:00
TRACE_POWER ,
blktrace: add ftrace plugin
Impact: New way of using the blktrace infrastructure
This drops the requirement of userspace utilities to use the blktrace
facility.
Configuration is done thru sysfs, adding a "trace" directory to the
partition directory where blktrace can be enabled for the associated
request_queue.
The same filters present in the IOCTL interface are present as sysfs
device attributes.
The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
filters.
The other files in this directory: pid, act_mask, start_lba and end_lba
can be used with the same meaning as with the IOCTL interface.
Using the sysfs interface will only setup the request_queue->blk_trace
fields, tracing will only take place when the "blk" tracer is selected
via the ftrace interface, as in the following example:
To see the trace, one can use the /d/tracing/trace file or the
/d/tracign/trace_pipe file, with semantics defined in the ftrace
documentation in Documentation/ftrace.txt.
[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3046.491224: 8,1 A WBS 6367 + 8 <- (8,1) 6304
kjournald-305 [000] 3046.491227: 8,1 Q R 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491236: 8,1 G RB 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491239: 8,1 P NS [kjournald]
kjournald-305 [000] 3046.491242: 8,1 I RBS 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491251: 8,1 D WB 6367 + 8 [kjournald]
kjournald-305 [000] 3046.491610: 8,1 U WS [kjournald] 1
<idle>-0 [000] 3046.511914: 8,1 C RS 6367 + 8 [6367]
[root@f10-1 ~]#
The default line context (prefix) format is the one described in the ftrace
documentation, with the blktrace specific bits using its existing format,
described in blkparse(8).
If one wants to have the classic blktrace formatting, this is possible by
using:
[root@f10-1 ~]# echo blk_classic > /t/trace_options
[root@f10-1 ~]# cat /t/trace
8,1 0 3046.491224 305 A WBS 6367 + 8 <- (8,1) 6304
8,1 0 3046.491227 305 Q R 6367 + 8 [kjournald]
8,1 0 3046.491236 305 G RB 6367 + 8 [kjournald]
8,1 0 3046.491239 305 P NS [kjournald]
8,1 0 3046.491242 305 I RBS 6367 + 8 [kjournald]
8,1 0 3046.491251 305 D WB 6367 + 8 [kjournald]
8,1 0 3046.491610 305 U WS [kjournald] 1
8,1 0 3046.511914 0 C RS 6367 + 8 [6367]
[root@f10-1 ~]#
Using the ftrace standard format allows more flexibility, such
as the ability of asking for backtraces via trace_options:
[root@f10-1 ~]# echo noblk_classic > /t/trace_options
[root@f10-1 ~]# echo stacktrace > /t/trace_options
[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3318.826779: 8,1 A WBS 6375 + 8 <- (8,1) 6312
kjournald-305 [000] 3318.826782:
<= submit_bio
<= submit_bh
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald
<= kthread
<= child_rip
kjournald-305 [000] 3318.826836: 8,1 Q R 6375 + 8 [kjournald]
kjournald-305 [000] 3318.826837:
<= generic_make_request
<= submit_bio
<= submit_bh
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald
<= kthread
Please read the ftrace documentation to use aditional, standardized
tracing filters such as /d/tracing/trace_cpumask, etc.
See also /d/tracing/trace_mark to add comments in the trace stream,
that is equivalent to the /d/block/sdaN/msg interface.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 12:06:27 -02:00
TRACE_BLK ,
2008-05-23 21:37:28 +02:00
2008-12-23 23:24:12 -05:00
__TRACE_LAST_TYPE ,
2008-05-23 21:37:28 +02:00
} ;
2008-09-29 23:02:42 -04:00
/*
* The trace entry - the most basic unit of tracing . This is what
* is printed in the end as a single line in the trace output , such as :
*
* bash - 15816 [ 01 ] 235.197585 : idle_cpu < - irq_enter
*/
struct trace_entry {
unsigned char type ;
unsigned char flags ;
unsigned char preempt_count ;
int pid ;
2008-11-22 13:28:47 +02:00
int tgid ;
2008-09-29 23:02:42 -04:00
} ;
2008-05-12 21:20:42 +02:00
/*
* Function trace entry - function address and parent function addres :
*/
struct ftrace_entry {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2008-05-12 21:20:42 +02:00
unsigned long ip ;
unsigned long parent_ip ;
} ;
2008-11-11 07:14:25 +01:00
2008-11-26 00:57:25 +01:00
/* Function call entry */
struct ftrace_graph_ent_entry {
2009-02-06 18:30:44 +01:00
struct trace_entry ent ;
2008-11-26 00:57:25 +01:00
struct ftrace_graph_ent graph_ent ;
} ;
2008-11-11 07:14:25 +01:00
/* Function return entry */
2008-11-26 00:57:25 +01:00
struct ftrace_graph_ret_entry {
2009-02-06 18:30:44 +01:00
struct trace_entry ent ;
2008-11-26 00:57:25 +01:00
struct ftrace_graph_ret ret ;
2008-11-11 07:14:25 +01:00
} ;
2008-09-23 11:32:08 +01:00
extern struct tracer boot_tracer ;
2008-05-12 21:20:42 +02:00
/*
* Context switch trace entry - which task ( and prio ) we switched from / to :
*/
struct ctx_switch_entry {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2008-05-12 21:20:42 +02:00
unsigned int prev_pid ;
unsigned char prev_prio ;
unsigned char prev_state ;
unsigned int next_pid ;
unsigned char next_prio ;
2008-05-12 21:20:53 +02:00
unsigned char next_state ;
2008-09-04 10:24:16 +02:00
unsigned int next_cpu ;
2008-05-12 21:20:42 +02:00
} ;
2008-05-12 21:20:47 +02:00
/*
* Special ( free - form ) trace entry :
*/
struct special_entry {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2008-05-12 21:20:47 +02:00
unsigned long arg1 ;
unsigned long arg2 ;
unsigned long arg3 ;
} ;
2008-05-12 21:20:51 +02:00
/*
* Stack - trace entry :
*/
2008-05-12 21:21:15 +02:00
# define FTRACE_STACK_ENTRIES 8
2008-05-12 21:20:51 +02:00
struct stack_entry {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2008-05-12 21:20:51 +02:00
unsigned long caller [ FTRACE_STACK_ENTRIES ] ;
} ;
2008-11-22 13:28:47 +02:00
struct userstack_entry {
struct trace_entry ent ;
unsigned long caller [ FTRACE_STACK_ENTRIES ] ;
} ;
2008-08-01 12:26:41 -04:00
/*
2009-03-05 10:24:48 +01:00
* trace_printk entry :
2008-08-01 12:26:41 -04:00
*/
2009-03-12 18:24:49 +01:00
struct bprint_entry {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2009-03-06 17:52:03 +01:00
unsigned long ip ;
2009-03-06 17:21:49 +01:00
const char * fmt ;
2009-03-06 17:52:03 +01:00
u32 buf [ ] ;
2009-03-06 17:21:47 +01:00
} ;
2009-03-12 18:24:49 +01:00
struct print_entry {
struct trace_entry ent ;
unsigned long ip ;
char buf [ ] ;
} ;
2008-09-29 23:02:42 -04:00
# define TRACE_OLD_SIZE 88
struct trace_field_cont {
unsigned char type ;
/* Temporary till we get rid of this completely */
char buf [ TRACE_OLD_SIZE - 1 ] ;
} ;
struct trace_mmiotrace_rw {
struct trace_entry ent ;
struct mmiotrace_rw rw ;
} ;
struct trace_mmiotrace_map {
struct trace_entry ent ;
struct mmiotrace_map map ;
} ;
2008-11-11 23:24:42 +01:00
struct trace_boot_call {
2008-09-29 23:02:42 -04:00
struct trace_entry ent ;
2008-11-11 23:24:42 +01:00
struct boot_trace_call boot_call ;
} ;
struct trace_boot_ret {
struct trace_entry ent ;
struct boot_trace_ret boot_ret ;
2008-09-29 23:02:42 -04:00
} ;
2008-11-12 00:14:40 -05:00
# define TRACE_FUNC_SIZE 30
# define TRACE_FILE_SIZE 20
2008-11-12 15:24:24 -05:00
struct trace_branch {
2008-11-12 00:14:40 -05:00
struct trace_entry ent ;
unsigned line ;
char func [ TRACE_FUNC_SIZE + 1 ] ;
char file [ TRACE_FILE_SIZE + 1 ] ;
char correct ;
} ;
2008-12-11 13:53:26 +01:00
struct hw_branch_entry {
2008-11-25 09:24:15 +01:00
struct trace_entry ent ;
2008-12-11 13:53:26 +01:00
u64 from ;
u64 to ;
2008-11-25 09:24:15 +01:00
} ;
2008-11-23 16:49:58 -08:00
struct trace_power {
struct trace_entry ent ;
struct power_trace state_data ;
} ;
2008-12-29 13:42:23 -08:00
struct kmemtrace_alloc_entry {
struct trace_entry ent ;
enum kmemtrace_type_id type_id ;
unsigned long call_site ;
const void * ptr ;
size_t bytes_req ;
size_t bytes_alloc ;
gfp_t gfp_flags ;
int node ;
} ;
struct kmemtrace_free_entry {
struct trace_entry ent ;
enum kmemtrace_type_id type_id ;
unsigned long call_site ;
const void * ptr ;
} ;
2009-03-13 15:42:11 +01:00
struct syscall_trace_enter {
struct trace_entry ent ;
int nr ;
unsigned long args [ ] ;
} ;
struct syscall_trace_exit {
struct trace_entry ent ;
int nr ;
unsigned long ret ;
} ;
2008-09-16 22:02:27 +03:00
/*
* trace_flag_type is an enumeration that holds different
* states when a trace occurs . These are :
2008-10-24 09:42:59 -04:00
* IRQS_OFF - interrupts were disabled
2009-03-06 17:52:03 +01:00
* IRQS_NOSUPPORT - arch does not support irqs_disabled_flags
2008-10-24 09:42:59 -04:00
* NEED_RESCED - reschedule is requested
* HARDIRQ - inside an interrupt handler
* SOFTIRQ - inside a softirq handler
2008-09-16 22:02:27 +03:00
*/
enum trace_flag_type {
TRACE_FLAG_IRQS_OFF = 0x01 ,
2008-10-24 09:42:59 -04:00
TRACE_FLAG_IRQS_NOSUPPORT = 0x02 ,
TRACE_FLAG_NEED_RESCHED = 0x04 ,
TRACE_FLAG_HARDIRQ = 0x08 ,
TRACE_FLAG_SOFTIRQ = 0x10 ,
2008-09-16 22:02:27 +03:00
} ;
2008-09-16 22:06:42 +03:00
# define TRACE_BUF_SIZE 1024
2008-05-12 21:20:42 +02:00
/*
* The CPU trace array - it consists of thousands of trace entries
* plus some other descriptor data : ( for example which task started
* the trace , etc . )
*/
struct trace_array_cpu {
atomic_t disabled ;
2008-12-01 22:20:19 -05:00
void * buffer_page ; /* ring buffer spare */
2008-05-12 21:20:45 +02:00
2008-05-12 21:20:45 +02:00
/* these fields get copied into max-trace: */
unsigned long trace_idx ;
2008-05-12 21:21:01 +02:00
unsigned long overrun ;
2008-05-12 21:20:42 +02:00
unsigned long saved_latency ;
unsigned long critical_start ;
unsigned long critical_end ;
unsigned long critical_sequence ;
unsigned long nice ;
unsigned long policy ;
unsigned long rt_priority ;
cycle_t preempt_timestamp ;
pid_t pid ;
uid_t uid ;
char comm [ TASK_COMM_LEN ] ;
} ;
struct trace_iterator ;
/*
* The trace array - an array of per - CPU trace arrays . This is the
* highest level data structure that individual tracers deal with .
* They have on / off state as well :
*/
struct trace_array {
2008-09-29 23:02:41 -04:00
struct ring_buffer * buffer ;
2008-05-12 21:20:42 +02:00
unsigned long entries ;
int cpu ;
cycle_t time_start ;
2008-05-12 21:20:46 +02:00
struct task_struct * waiter ;
2008-05-12 21:20:42 +02:00
struct trace_array_cpu * data [ NR_CPUS ] ;
} ;
2008-10-01 10:52:51 -04:00
# define FTRACE_CMP_TYPE(var, type) \
__builtin_types_compatible_p ( typeof ( var ) , type * )
# undef IF_ASSIGN
# define IF_ASSIGN(var, entry, etype, id) \
if ( FTRACE_CMP_TYPE ( var , etype ) ) { \
var = ( typeof ( var ) ) ( entry ) ; \
WARN_ON ( id & & ( entry ) - > type ! = id ) ; \
break ; \
}
/* Will cause compile errors if type is not found. */
extern void __ftrace_bad_type ( void ) ;
/*
* The trace_assign_type is a verifier that the entry type is
* the same as the type being assigned . To add new types simply
* add a line with the following format :
*
* IF_ASSIGN ( var , ent , type , id ) ;
*
* Where " type " is the trace type that includes the trace_entry
* as the " ent " item . And " id " is the trace identifier that is
* used in the trace_type enum .
*
* If the type can have more than one id , then use zero .
*/
# define trace_assign_type(var, ent) \
do { \
IF_ASSIGN ( var , ent , struct ftrace_entry , TRACE_FN ) ; \
IF_ASSIGN ( var , ent , struct ctx_switch_entry , 0 ) ; \
IF_ASSIGN ( var , ent , struct stack_entry , TRACE_STACK ) ; \
2008-11-22 13:28:47 +02:00
IF_ASSIGN ( var , ent , struct userstack_entry , TRACE_USER_STACK ) ; \
2008-10-01 10:52:51 -04:00
IF_ASSIGN ( var , ent , struct print_entry , TRACE_PRINT ) ; \
2009-03-12 18:24:49 +01:00
IF_ASSIGN ( var , ent , struct bprint_entry , TRACE_BPRINT ) ; \
2008-10-01 10:52:51 -04:00
IF_ASSIGN ( var , ent , struct special_entry , 0 ) ; \
IF_ASSIGN ( var , ent , struct trace_mmiotrace_rw , \
TRACE_MMIO_RW ) ; \
IF_ASSIGN ( var , ent , struct trace_mmiotrace_map , \
TRACE_MMIO_MAP ) ; \
2008-11-11 23:24:42 +01:00
IF_ASSIGN ( var , ent , struct trace_boot_call , TRACE_BOOT_CALL ) ; \
IF_ASSIGN ( var , ent , struct trace_boot_ret , TRACE_BOOT_RET ) ; \
2008-11-12 15:24:24 -05:00
IF_ASSIGN ( var , ent , struct trace_branch , TRACE_BRANCH ) ; \
2008-11-26 00:57:25 +01:00
IF_ASSIGN ( var , ent , struct ftrace_graph_ent_entry , \
TRACE_GRAPH_ENT ) ; \
IF_ASSIGN ( var , ent , struct ftrace_graph_ret_entry , \
TRACE_GRAPH_RET ) ; \
2008-12-11 13:53:26 +01:00
IF_ASSIGN ( var , ent , struct hw_branch_entry , TRACE_HW_BRANCHES ) ; \
2009-03-06 17:52:03 +01:00
IF_ASSIGN ( var , ent , struct trace_power , TRACE_POWER ) ; \
2008-12-29 13:42:23 -08:00
IF_ASSIGN ( var , ent , struct kmemtrace_alloc_entry , \
TRACE_KMEM_ALLOC ) ; \
IF_ASSIGN ( var , ent , struct kmemtrace_free_entry , \
TRACE_KMEM_FREE ) ; \
2009-03-13 15:42:11 +01:00
IF_ASSIGN ( var , ent , struct syscall_trace_enter , \
TRACE_SYSCALL_ENTER ) ; \
IF_ASSIGN ( var , ent , struct syscall_trace_exit , \
TRACE_SYSCALL_EXIT ) ; \
2008-10-01 10:52:51 -04:00
__ftrace_bad_type ( ) ; \
} while ( 0 )
2008-09-29 20:18:34 +02:00
/* Return values for print_line callback */
enum print_line_t {
TRACE_TYPE_PARTIAL_LINE = 0 , /* Retry after flushing the seq */
TRACE_TYPE_HANDLED = 1 ,
2009-02-06 18:30:44 +01:00
TRACE_TYPE_UNHANDLED = 2 , /* Relay to other output functions */
TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */
2008-09-29 20:18:34 +02:00
} ;
2008-11-17 19:23:42 +01:00
/*
* An option specific to a tracer . This is a boolean value .
* The bit is the bit index that sets its value on the
* flags value in struct tracer_flags .
*/
struct tracer_opt {
2009-03-06 17:52:03 +01:00
const char * name ; /* Will appear on the trace_options file */
u32 bit ; /* Mask assigned in val field in tracer_flags */
2008-11-17 19:23:42 +01:00
} ;
/*
* The set of specific options for a tracer . Your tracer
* have to set the initial value of the flags val .
*/
struct tracer_flags {
u32 val ;
2009-03-06 17:52:03 +01:00
struct tracer_opt * opts ;
2008-11-17 19:23:42 +01:00
} ;
/* Makes more easy to define a tracer opt */
# define TRACER_OPT(s, b) .name = #s, .bit = b
2009-01-08 10:03:56 -08:00
2009-02-11 02:25:00 +01:00
/**
* struct tracer - a specific tracer and its callbacks to interact with debugfs
* @ name : the name chosen to select it on the available_tracers file
* @ init : called when one switches to this tracer ( echo name > current_tracer )
* @ reset : called when one switches to another tracer
* @ start : called when tracing is unpaused ( echo 1 > tracing_enabled )
* @ stop : called when tracing is paused ( echo 0 > tracing_enabled )
* @ open : called when the trace file is opened
* @ pipe_open : called when the trace_pipe file is opened
* @ wait_pipe : override how the user waits for traces on trace_pipe
* @ close : called when the trace file is released
* @ read : override the default read callback on trace_pipe
* @ splice_read : override the default splice_read callback on trace_pipe
* @ selftest : selftest to run on boot ( see trace_selftest . c )
* @ print_headers : override the first lines that describe your columns
* @ print_line : callback that prints a trace
* @ set_flag : signals one of your private flags changed ( trace_options file )
* @ flags : your private flags
2008-05-12 21:20:42 +02:00
*/
struct tracer {
const char * name ;
2008-11-16 05:57:26 +01:00
int ( * init ) ( struct trace_array * tr ) ;
2008-05-12 21:20:42 +02:00
void ( * reset ) ( struct trace_array * tr ) ;
ftrace: restructure tracing start/stop infrastructure
Impact: change where tracing is started up and stopped
Currently, when a new tracer is selected via echo'ing a tracer name into
the current_tracer file, the startup is only done if tracing_enabled is
set to one. If tracing_enabled is changed to zero (by echo'ing 0 into
the tracing_enabled file) a full shutdown is performed.
The full startup and shutdown of a tracer can be expensive and the
user can lose out traces when echo'ing in 0 to the tracing_enabled file,
because the process takes too long. There can also be places that
the user would like to start and stop the tracer several times and
doing the full startup and shutdown of a tracer might be too expensive.
This patch performs the full startup and shutdown when a tracer is
selected. It also adds a way to do a quick start or stop of a tracer.
The quick version is just a flag that prevents the tracing from
taking place, but the overhead of the code is still there.
For example, the startup of a tracer may enable tracepoints, or enable
the function tracer. The stop and start will just set a flag to
have the tracer ignore the calls when the tracepoint or function trace
is called. The overhead of the tracer may still be present when
the tracer is stopped, but no tracing will occur. Setting the tracer
to the 'nop' tracer (or any other tracer) will perform the shutdown
of the tracer which will disable the tracepoint or disable the
function tracer.
The tracing_enabled file will simply start or stop tracing.
This change is all internal. The end result for the user should be the same
as before. If tracing_enabled is not set, no trace will happen.
If tracing_enabled is set, then the trace will happen. The tracing_enabled
variable is static between tracers. Enabling tracing_enabled and
going to another tracer will keep tracing_enabled enabled. Same
is true with disabling tracing_enabled.
This patch will now provide a fast start/stop method to the users
for enabling or disabling tracing.
Note: There were two methods to the struct tracer that were never
used: The methods start and stop. These were to be used as a hook
to the reading of the trace output, but ended up not being
necessary. These two methods are now used to enable the start
and stop of each tracer, in case the tracer needs to do more than
just not write into the buffer. For example, the irqsoff tracer
must stop recording max latencies when tracing is stopped.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 16:05:44 -05:00
void ( * start ) ( struct trace_array * tr ) ;
void ( * stop ) ( struct trace_array * tr ) ;
2008-05-12 21:20:42 +02:00
void ( * open ) ( struct trace_iterator * iter ) ;
2008-05-12 21:21:01 +02:00
void ( * pipe_open ) ( struct trace_iterator * iter ) ;
2009-02-11 02:25:00 +01:00
void ( * wait_pipe ) ( struct trace_iterator * iter ) ;
2008-05-12 21:20:42 +02:00
void ( * close ) ( struct trace_iterator * iter ) ;
2008-05-12 21:21:01 +02:00
ssize_t ( * read ) ( struct trace_iterator * iter ,
struct file * filp , char __user * ubuf ,
size_t cnt , loff_t * ppos ) ;
2009-02-09 08:15:56 +02:00
ssize_t ( * splice_read ) ( struct trace_iterator * iter ,
struct file * filp ,
loff_t * ppos ,
struct pipe_inode_info * pipe ,
size_t len ,
unsigned int flags ) ;
2008-05-12 21:20:44 +02:00
# ifdef CONFIG_FTRACE_STARTUP_TEST
int ( * selftest ) ( struct tracer * trace ,
struct trace_array * tr ) ;
# endif
2008-11-25 09:12:31 +01:00
void ( * print_header ) ( struct seq_file * m ) ;
2008-09-29 20:18:34 +02:00
enum print_line_t ( * print_line ) ( struct trace_iterator * iter ) ;
2008-11-17 19:23:42 +01:00
/* If you handled the flag setting, return 0 */
int ( * set_flag ) ( u32 old_flags , u32 bit , int set ) ;
2008-05-12 21:20:42 +02:00
struct tracer * next ;
int print_max ;
2009-03-06 17:52:03 +01:00
struct tracer_flags * flags ;
2009-01-08 10:03:56 -08:00
struct tracer_stat * stats ;
2008-05-12 21:20:42 +02:00
} ;
2008-05-12 21:20:46 +02:00
struct trace_seq {
unsigned char buffer [ PAGE_SIZE ] ;
unsigned int len ;
2008-05-12 21:21:02 +02:00
unsigned int readpos ;
2008-05-12 21:20:46 +02:00
} ;
2009-03-02 14:04:40 -05:00
static inline void
trace_seq_init ( struct trace_seq * s )
{
s - > len = 0 ;
s - > readpos = 0 ;
}
tracing/core: introduce per cpu tracing files
Impact: split up tracing output per cpu
Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:
- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.
- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...
- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).
The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.
The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.
So this patch creates a new directory: /debug/tracing/per_cpu/.
Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.
Which means if you have two cpus, you will have:
trace0
trace1
trace_pipe0
trace_pipe1
And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.
The original all-in-one cpu trace file are still available on
their original place.
Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.
Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.
Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 03:22:28 +01:00
# define TRACE_PIPE_ALL_CPU -1
2008-05-12 21:20:42 +02:00
/*
* Trace iterator - used by printout routines who present trace
* results to users and which routines might sleep , etc :
*/
struct trace_iterator {
struct trace_array * tr ;
struct tracer * trace ;
2008-05-12 21:21:01 +02:00
void * private ;
tracing/core: introduce per cpu tracing files
Impact: split up tracing output per cpu
Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:
- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.
- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...
- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).
The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.
The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.
So this patch creates a new directory: /debug/tracing/per_cpu/.
Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.
Which means if you have two cpus, you will have:
trace0
trace1
trace_pipe0
trace_pipe1
And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.
The original all-in-one cpu trace file are still available on
their original place.
Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.
Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.
Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 03:22:28 +01:00
int cpu_file ;
tracing/core: make the read callbacks reentrants
Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.
Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.
Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.
The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.
The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.
Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 06:13:16 +01:00
struct mutex mutex ;
struct ring_buffer_iter * buffer_iter [ NR_CPUS ] ;
2008-05-12 21:20:45 +02:00
2008-05-12 21:21:01 +02:00
/* The below is zeroed out in pipe_read */
struct trace_seq seq ;
2008-05-12 21:20:42 +02:00
struct trace_entry * ent ;
2008-05-12 21:20:45 +02:00
int cpu ;
2008-09-29 23:02:41 -04:00
u64 ts ;
2008-05-12 21:20:45 +02:00
2008-05-12 21:20:42 +02:00
unsigned long iter_flags ;
loff_t pos ;
2008-05-12 21:20:43 +02:00
long idx ;
2008-11-07 22:36:02 -05:00
2009-01-01 10:12:23 +10:30
cpumask_var_t started ;
2008-05-12 21:20:42 +02:00
} ;
2009-02-05 18:02:00 -02:00
int tracer_init ( struct tracer * t , struct trace_array * tr ) ;
ftrace: restructure tracing start/stop infrastructure
Impact: change where tracing is started up and stopped
Currently, when a new tracer is selected via echo'ing a tracer name into
the current_tracer file, the startup is only done if tracing_enabled is
set to one. If tracing_enabled is changed to zero (by echo'ing 0 into
the tracing_enabled file) a full shutdown is performed.
The full startup and shutdown of a tracer can be expensive and the
user can lose out traces when echo'ing in 0 to the tracing_enabled file,
because the process takes too long. There can also be places that
the user would like to start and stop the tracer several times and
doing the full startup and shutdown of a tracer might be too expensive.
This patch performs the full startup and shutdown when a tracer is
selected. It also adds a way to do a quick start or stop of a tracer.
The quick version is just a flag that prevents the tracing from
taking place, but the overhead of the code is still there.
For example, the startup of a tracer may enable tracepoints, or enable
the function tracer. The stop and start will just set a flag to
have the tracer ignore the calls when the tracepoint or function trace
is called. The overhead of the tracer may still be present when
the tracer is stopped, but no tracing will occur. Setting the tracer
to the 'nop' tracer (or any other tracer) will perform the shutdown
of the tracer which will disable the tracepoint or disable the
function tracer.
The tracing_enabled file will simply start or stop tracing.
This change is all internal. The end result for the user should be the same
as before. If tracing_enabled is not set, no trace will happen.
If tracing_enabled is set, then the trace will happen. The tracing_enabled
variable is static between tracers. Enabling tracing_enabled and
going to another tracer will keep tracing_enabled enabled. Same
is true with disabling tracing_enabled.
This patch will now provide a fast start/stop method to the users
for enabling or disabling tracing.
Note: There were two methods to the struct tracer that were never
used: The methods start and stop. These were to be used as a hook
to the reading of the trace output, but ended up not being
necessary. These two methods are now used to enable the start
and stop of each tracer, in case the tracer needs to do more than
just not write into the buffer. For example, the irqsoff tracer
must stop recording max latencies when tracing is stopped.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 16:05:44 -05:00
int tracing_is_enabled ( void ) ;
2008-09-16 21:56:41 +03:00
void trace_wake_up ( void ) ;
2008-09-29 23:02:41 -04:00
void tracing_reset ( struct trace_array * tr , int cpu ) ;
2008-12-19 12:08:39 +02:00
void tracing_reset_online_cpus ( struct trace_array * tr ) ;
2008-05-12 21:20:42 +02:00
int tracing_open_generic ( struct inode * inode , struct file * filp ) ;
struct dentry * tracing_init_dentry ( void ) ;
2008-05-12 21:20:49 +02:00
void init_tracer_sysprof_debugfs ( struct dentry * d_tracer ) ;
tracing: Introduce trace_buffer_{lock_reserve,unlock_commit}
Impact: new API
These new functions do what previously was being open coded, reducing
the number of details ftrace plugin writers have to worry about.
It also standardizes the handling of stacktrace, userstacktrace and
other trace options we may introduce in the future.
With this patch, for instance, the blk tracer (and some others already
in the tree) can use the "userstacktrace" /d/tracing/trace_options
facility.
$ codiff /tmp/vmlinux.before /tmp/vmlinux.after
linux-2.6-tip/kernel/trace/trace.c:
trace_vprintk | -5
trace_graph_return | -22
trace_graph_entry | -26
trace_function | -45
__ftrace_trace_stack | -27
ftrace_trace_userstack | -29
tracing_sched_switch_trace | -66
tracing_stop | +1
trace_seq_to_user | -1
ftrace_trace_special | -63
ftrace_special | +1
tracing_sched_wakeup_trace | -70
tracing_reset_online_cpus | -1
13 functions changed, 2 bytes added, 355 bytes removed, diff: -353
linux-2.6-tip/block/blktrace.c:
__blk_add_trace | -58
1 function changed, 58 bytes removed, diff: -58
linux-2.6-tip/kernel/trace/trace.c:
trace_buffer_lock_reserve | +88
trace_buffer_unlock_commit | +86
2 functions changed, 174 bytes added, diff: +174
/tmp/vmlinux.after:
16 functions changed, 176 bytes added, 413 bytes removed, diff: -237
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05 16:14:13 -02:00
struct ring_buffer_event ;
struct ring_buffer_event * trace_buffer_lock_reserve ( struct trace_array * tr ,
unsigned char type ,
unsigned long len ,
unsigned long flags ,
int pc ) ;
void trace_buffer_unlock_commit ( struct trace_array * tr ,
struct ring_buffer_event * event ,
unsigned long flags , int pc ) ;
2009-02-27 19:38:04 -05:00
struct ring_buffer_event *
trace_current_buffer_lock_reserve ( unsigned char type , unsigned long len ,
unsigned long flags , int pc ) ;
void trace_current_buffer_unlock_commit ( struct ring_buffer_event * event ,
unsigned long flags , int pc ) ;
2008-09-16 21:56:41 +03:00
struct trace_entry * tracing_get_trace_entry ( struct trace_array * tr ,
struct trace_array_cpu * data ) ;
2009-02-02 20:29:21 -02:00
struct trace_entry * trace_find_next_entry ( struct trace_iterator * iter ,
int * ent_cpu , u64 * ent_ts ) ;
2008-09-16 21:56:41 +03:00
void tracing_generic_entry_update ( struct trace_entry * entry ,
2008-10-01 13:14:09 -04:00
unsigned long flags ,
int pc ) ;
2008-09-16 21:56:41 +03:00
2009-02-11 02:25:00 +01:00
void default_wait_pipe ( struct trace_iterator * iter ) ;
void poll_wait_pipe ( struct trace_iterator * iter ) ;
2008-05-12 21:20:42 +02:00
void ftrace ( struct trace_array * tr ,
struct trace_array_cpu * data ,
unsigned long ip ,
unsigned long parent_ip ,
2008-10-01 13:14:09 -04:00
unsigned long flags , int pc ) ;
2008-05-12 21:20:42 +02:00
void tracing_sched_switch_trace ( struct trace_array * tr ,
struct task_struct * prev ,
struct task_struct * next ,
2008-10-01 13:14:09 -04:00
unsigned long flags , int pc ) ;
2008-05-12 21:20:42 +02:00
void tracing_record_cmdline ( struct task_struct * tsk ) ;
2008-05-12 21:20:51 +02:00
void tracing_sched_wakeup_trace ( struct trace_array * tr ,
struct task_struct * wakee ,
struct task_struct * cur ,
2008-10-01 13:14:09 -04:00
unsigned long flags , int pc ) ;
2008-05-12 21:20:47 +02:00
void trace_special ( struct trace_array * tr ,
struct trace_array_cpu * data ,
unsigned long arg1 ,
unsigned long arg2 ,
2008-10-01 13:14:09 -04:00
unsigned long arg3 , int pc ) ;
2008-05-12 21:20:49 +02:00
void trace_function ( struct trace_array * tr ,
unsigned long ip ,
unsigned long parent_ip ,
2008-10-01 13:14:09 -04:00
unsigned long flags , int pc ) ;
2008-05-12 21:20:42 +02:00
2008-11-26 00:57:25 +01:00
void trace_graph_return ( struct ftrace_graph_ret * trace ) ;
2008-12-02 23:50:05 -05:00
int trace_graph_entry ( struct ftrace_graph_ent * trace ) ;
2008-11-25 09:24:15 +01:00
2008-05-22 11:49:22 -04:00
void tracing_start_cmdline_record ( void ) ;
void tracing_stop_cmdline_record ( void ) ;
2008-11-07 22:36:02 -05:00
void tracing_sched_switch_assign_trace ( struct trace_array * tr ) ;
void tracing_stop_sched_switch_record ( void ) ;
void tracing_start_sched_switch_record ( void ) ;
2008-05-12 21:20:42 +02:00
int register_tracer ( struct tracer * type ) ;
void unregister_tracer ( struct tracer * type ) ;
extern unsigned long nsecs_to_usecs ( unsigned long nsecs ) ;
extern unsigned long tracing_max_latency ;
extern unsigned long tracing_thresh ;
void update_max_tr ( struct trace_array * tr , struct task_struct * tsk , int cpu ) ;
void update_max_tr_single ( struct trace_array * tr ,
struct task_struct * tsk , int cpu ) ;
2009-01-15 19:12:40 -05:00
void __trace_stack ( struct trace_array * tr ,
unsigned long flags ,
int skip , int pc ) ;
2008-05-12 21:20:51 +02:00
extern cycle_t ftrace_now ( int cpu ) ;
2008-05-12 21:20:42 +02:00
# ifdef CONFIG_CONTEXT_SWITCH_TRACER
typedef void
( * tracer_switch_func_t ) ( void * private ,
2008-05-12 21:21:10 +02:00
void * __rq ,
2008-05-12 21:20:42 +02:00
struct task_struct * prev ,
struct task_struct * next ) ;
struct tracer_switch_ops {
tracer_switch_func_t func ;
void * private ;
struct tracer_switch_ops * next ;
} ;
# endif /* CONFIG_CONTEXT_SWITCH_TRACER */
2009-03-16 19:20:15 -04:00
extern void trace_find_cmdline ( int pid , char comm [ ] ) ;
2008-12-29 13:02:17 +01:00
2008-05-12 21:20:42 +02:00
# ifdef CONFIG_DYNAMIC_FTRACE
extern unsigned long ftrace_update_tot_cnt ;
2008-05-12 21:20:54 +02:00
# define DYN_FTRACE_TEST_NAME trace_selftest_dynamic_test_func
extern int DYN_FTRACE_TEST_NAME ( void ) ;
2008-05-12 21:20:42 +02:00
# endif
2008-05-12 21:20:44 +02:00
# ifdef CONFIG_FTRACE_STARTUP_TEST
extern int trace_selftest_startup_function ( struct tracer * trace ,
struct trace_array * tr ) ;
2009-02-07 21:33:57 +01:00
extern int trace_selftest_startup_function_graph ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-05-12 21:20:44 +02:00
extern int trace_selftest_startup_irqsoff ( struct tracer * trace ,
struct trace_array * tr ) ;
extern int trace_selftest_startup_preemptoff ( struct tracer * trace ,
struct trace_array * tr ) ;
extern int trace_selftest_startup_preemptirqsoff ( struct tracer * trace ,
struct trace_array * tr ) ;
extern int trace_selftest_startup_wakeup ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-09-19 03:06:43 -07:00
extern int trace_selftest_startup_nop ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-05-12 21:20:44 +02:00
extern int trace_selftest_startup_sched_switch ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-05-12 21:20:47 +02:00
extern int trace_selftest_startup_sysprof ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-11-12 15:24:24 -05:00
extern int trace_selftest_startup_branch ( struct tracer * trace ,
struct trace_array * tr ) ;
2008-05-12 21:20:44 +02:00
# endif /* CONFIG_FTRACE_STARTUP_TEST */
2008-05-12 21:20:45 +02:00
extern void * head_page ( struct trace_array_cpu * data ) ;
2008-05-23 21:37:28 +02:00
extern long ns2usecs ( cycle_t nsec ) ;
2008-12-03 23:45:11 +01:00
extern int
2009-03-19 14:03:53 -04:00
trace_vbprintk ( unsigned long ip , const char * fmt , va_list args ) ;
2009-03-12 18:24:49 +01:00
extern int
2009-03-19 14:03:53 -04:00
trace_vprintk ( unsigned long ip , const char * fmt , va_list args ) ;
2008-05-12 21:20:45 +02:00
2008-05-12 21:20:52 +02:00
extern unsigned long trace_flags ;
2008-11-11 07:14:25 +01:00
/* Standard output formatting function used for function return traces */
2008-11-25 21:07:04 +01:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
extern enum print_line_t print_graph_function ( struct trace_iterator * iter ) ;
2008-12-03 15:36:57 -05:00
# ifdef CONFIG_DYNAMIC_FTRACE
/* TODO: make this variable */
# define FTRACE_GRAPH_MAX_FUNCS 32
extern int ftrace_graph_count ;
extern unsigned long ftrace_graph_funcs [ FTRACE_GRAPH_MAX_FUNCS ] ;
static inline int ftrace_graph_addr ( unsigned long addr )
{
int i ;
if ( ! ftrace_graph_count | | test_tsk_trace_graph ( current ) )
return 1 ;
for ( i = 0 ; i < ftrace_graph_count ; i + + ) {
if ( addr = = ftrace_graph_funcs [ i ] )
return 1 ;
}
return 0 ;
}
2008-11-11 07:14:25 +01:00
# else
2008-12-03 15:36:57 -05:00
static inline int ftrace_trace_addr ( unsigned long addr )
{
2008-12-04 09:18:28 +01:00
return 1 ;
}
static inline int ftrace_graph_addr ( unsigned long addr )
{
return 1 ;
2008-12-03 15:36:57 -05:00
}
# endif /* CONFIG_DYNAMIC_FTRACE */
# else /* CONFIG_FUNCTION_GRAPH_TRACER */
2008-11-11 07:14:25 +01:00
static inline enum print_line_t
2008-11-25 21:07:04 +01:00
print_graph_function ( struct trace_iterator * iter )
2008-11-11 07:14:25 +01:00
{
return TRACE_TYPE_UNHANDLED ;
}
2008-12-03 15:36:57 -05:00
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2008-11-11 07:14:25 +01:00
2008-12-04 00:26:40 -05:00
extern struct pid * ftrace_pid_trace ;
2008-12-03 15:36:59 -05:00
static inline int ftrace_trace_task ( struct task_struct * task )
{
2008-12-04 23:49:47 +01:00
if ( ! ftrace_pid_trace )
2008-12-03 15:36:59 -05:00
return 1 ;
return test_tsk_trace_trace ( task ) ;
}
2008-05-12 21:21:00 +02:00
/*
* trace_iterator_flags is an enumeration that defines bit
* positions into trace_flags that controls the output .
*
* NOTE : These bits must match the trace_options array in
* trace . c .
*/
2008-05-12 21:20:52 +02:00
enum trace_iterator_flags {
TRACE_ITER_PRINT_PARENT = 0x01 ,
TRACE_ITER_SYM_OFFSET = 0x02 ,
TRACE_ITER_SYM_ADDR = 0x04 ,
TRACE_ITER_VERBOSE = 0x08 ,
TRACE_ITER_RAW = 0x10 ,
TRACE_ITER_HEX = 0x20 ,
TRACE_ITER_BIN = 0x40 ,
TRACE_ITER_BLOCK = 0x80 ,
TRACE_ITER_STACKTRACE = 0x100 ,
2008-05-12 21:20:52 +02:00
TRACE_ITER_SCHED_TREE = 0x200 ,
2008-09-04 10:24:14 +02:00
TRACE_ITER_PRINTK = 0x400 ,
ftrace: function tracer with irqs disabled
Impact: disable interrupts during trace entry creation (as opposed to preempt)
To help with performance, I set the ftracer to not disable interrupts,
and only to disable preemption. If an interrupt occurred, it would not
be traced, because the function tracer protects itself from recursion.
This may be faster, but the trace output might miss some traces.
This patch makes the fuction trace disable interrupts, but it also
adds a runtime feature to disable preemption instead. It does this by
having two different tracer functions. When the function tracer is
enabled, it will check to see which version is requested (irqs disabled
or preemption disabled). Then it will use the corresponding function
as the tracer.
Irq disabling is the default behaviour, but if the user wants better
performance, with the chance of missing traces, then they can choose
the preempt disabled version.
Running hackbench 3 times with the irqs disabled and 3 times with
the preempt disabled function tracer yielded:
tracing type times entries recorded
------------ -------- ----------------
irq disabled 43.393 166433066
43.282 166172618
43.298 166256704
preempt disabled 38.969 159871710
38.943 159972935
39.325 161056510
Average:
irqs disabled: 43.324 166287462
preempt disabled: 39.079 160300385
preempt is 10.8 percent faster than irqs disabled.
I wrote a patch to count function trace recursion and reran hackbench.
With irq disabled: 1,150 times the function tracer did not trace due to
recursion.
with preempt disabled: 5,117,718 times.
The thousand times with irq disabled could be due to NMIs, or simply a case
where it called a function that was not protected by notrace.
But we also see that a large amount of the trace is lost with the
preempt version.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03 23:15:57 -05:00
TRACE_ITER_PREEMPTONLY = 0x800 ,
2008-11-12 15:24:24 -05:00
TRACE_ITER_BRANCH = 0x1000 ,
2008-11-12 17:52:38 -05:00
TRACE_ITER_ANNOTATE = 0x2000 ,
2008-11-22 13:28:48 +02:00
TRACE_ITER_USERSTACKTRACE = 0x4000 ,
2008-12-13 20:18:13 +01:00
TRACE_ITER_SYM_USEROBJ = 0x8000 ,
2009-02-02 20:29:21 -02:00
TRACE_ITER_PRINTK_MSGONLY = 0x10000 ,
2009-03-04 20:34:24 -05:00
TRACE_ITER_CONTEXT_INFO = 0x20000 , /* Print pid/cpu/time */
TRACE_ITER_LATENCY_FMT = 0x40000 ,
2009-03-17 18:09:55 -04:00
TRACE_ITER_GLOBAL_CLK = 0x80000 ,
2008-05-12 21:20:52 +02:00
} ;
2008-11-11 07:14:25 +01:00
/*
* TRACE_ITER_SYM_MASK masks the options in trace_flags that
* control the output of kernel symbols .
*/
# define TRACE_ITER_SYM_MASK \
( TRACE_ITER_PRINT_PARENT | TRACE_ITER_SYM_OFFSET | TRACE_ITER_SYM_ADDR )
2008-09-21 20:16:30 +02:00
extern struct tracer nop_trace ;
2008-11-03 23:15:55 -05:00
/**
* ftrace_preempt_disable - disable preemption scheduler safe
*
* When tracing can happen inside the scheduler , there exists
* cases that the tracing might happen before the need_resched
* flag is checked . If this happens and the tracer calls
* preempt_enable ( after a disable ) , a schedule might take place
* causing an infinite recursion .
*
2009-02-06 17:33:27 +08:00
* To prevent this , we read the need_resched flag before
2008-11-03 23:15:55 -05:00
* disabling preemption . When we want to enable preemption we
* check the flag , if it is set , then we call preempt_enable_no_resched .
* Otherwise , we call preempt_enable .
*
2009-02-06 17:33:27 +08:00
* The rational for doing the above is that if need_resched is set
2008-11-03 23:15:55 -05:00
* and we have yet to reschedule , we are either in an atomic location
* ( where we do not need to check for scheduling ) or we are inside
* the scheduler and do not want to resched .
*/
static inline int ftrace_preempt_disable ( void )
{
int resched ;
resched = need_resched ( ) ;
preempt_disable_notrace ( ) ;
return resched ;
}
/**
* ftrace_preempt_enable - enable preemption scheduler safe
* @ resched : the return value from ftrace_preempt_disable
*
* This is a scheduler safe way to enable preemption and not miss
* any preemption checks . The disabled saved the state of preemption .
2009-02-06 17:33:27 +08:00
* If resched is set , then we are either inside an atomic or
2008-11-03 23:15:55 -05:00
* are inside the scheduler ( we would have already scheduled
* otherwise ) . In this case , we do not want to call normal
* preempt_enable , but preempt_enable_no_resched instead .
*/
static inline void ftrace_preempt_enable ( int resched )
{
if ( resched )
preempt_enable_no_resched_notrace ( ) ;
else
preempt_enable_notrace ( ) ;
}
2008-11-12 15:24:24 -05:00
# ifdef CONFIG_BRANCH_TRACER
2008-11-12 15:24:24 -05:00
extern int enable_branch_tracing ( struct trace_array * tr ) ;
extern void disable_branch_tracing ( void ) ;
static inline int trace_branch_enable ( struct trace_array * tr )
2008-11-12 00:14:40 -05:00
{
2008-11-12 15:24:24 -05:00
if ( trace_flags & TRACE_ITER_BRANCH )
return enable_branch_tracing ( tr ) ;
2008-11-12 00:14:40 -05:00
return 0 ;
}
2008-11-12 15:24:24 -05:00
static inline void trace_branch_disable ( void )
2008-11-12 00:14:40 -05:00
{
/* due to races, always disable */
2008-11-12 15:24:24 -05:00
disable_branch_tracing ( ) ;
2008-11-12 00:14:40 -05:00
}
# else
2008-11-12 15:24:24 -05:00
static inline int trace_branch_enable ( struct trace_array * tr )
2008-11-12 00:14:40 -05:00
{
return 0 ;
}
2008-11-12 15:24:24 -05:00
static inline void trace_branch_disable ( void )
2008-11-12 00:14:40 -05:00
{
}
2008-11-12 15:24:24 -05:00
# endif /* CONFIG_BRANCH_TRACER */
2008-11-12 00:14:40 -05:00
2009-03-11 14:33:00 -04:00
/* set ring buffers to default size if not already done so */
int tracing_update_buffers ( void ) ;
2009-02-28 02:41:25 -05:00
/* trace event type bit fields, not numeric */
enum {
TRACE_EVENT_TYPE_PRINTF = 1 ,
TRACE_EVENT_TYPE_RAW = 2 ,
} ;
2009-03-22 03:30:39 -05:00
struct ftrace_event_field {
struct list_head link ;
char * name ;
char * type ;
int offset ;
int size ;
} ;
tracing: add raw trace point recording infrastructure
Impact: lower overhead tracing
The current event tracer can automatically pick up trace points
that are registered with the TRACE_FORMAT macro. But it required
a printf format string and parsing. Although, this adds the ability
to get guaranteed information like task names and such, it took
a hit in overhead processing. This processing can add about 500-1000
nanoseconds overhead, but in some cases that too is considered
too much and we want to shave off as much from this overhead as
possible.
Tom Zanussi recently posted tracing patches to lkml that are based
on a nice idea about capturing the data via C structs using
STRUCT_ENTER, STRUCT_EXIT type of macros.
I liked that method very much, but did not like the implementation
that required a developer to add data/code in several disjoint
locations.
This patch extends the event_tracer macros to do a similar "raw C"
approach that Tom Zanussi did. But instead of having the developers
needing to tweak a bunch of code all over the place, they can do it
all in one macro - preferably placed near the code that it is
tracing. That makes it much more likely that tracepoints will be
maintained on an ongoing basis by the code they modify.
The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
a developer may still utilize the more low level DECLARE_TRACE macros
if they don't care about getting their traces automatically in the event
tracer.)
They can also use the existing TRACE_FORMAT if they don't need to code
the tracepoint in C, but just want to use the convenience of printf.
So if the developer wants to "hardwire" a tracepoint in the fastest
possible way, and wants to acquire their data via a user space utility
in a raw binary format, or wants to see it in the trace output but not
sacrifice any performance, then they can implement the faster but
more complex TRACE_EVENT_FORMAT macro.
Here's what usage looks like:
TRACE_EVENT_FORMAT(name,
TPPROTO(proto),
TPARGS(args),
TPFMT(fmt, fmt_args),
TRACE_STUCT(
TRACE_FIELD(type1, item1, assign1)
TRACE_FIELD(type2, item2, assign2)
[...]
),
TPRAWFMT(raw_fmt)
);
Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
uses.
name: is the unique identifier of the trace point
proto: The proto type that the trace point uses
args: the args in the proto type
fmt: printf format to use with the event printf tracer
fmt_args: the printf argments to match fmt
TRACE_STRUCT starts the ability to create a structure.
Each item in the structure is defined with a TRACE_FIELD
TRACE_FIELD(type, item, assign)
type: the C type of item.
item: the name of the item in the stucture
assign: what to assign the item in the trace point callback
raw_fmt is a way to pretty print the struct. It must match
the order of the items are added in TRACE_STUCT
An example of this would be:
TRACE_EVENT_FORMAT(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success),
TPFMT("task %s:%d %s",
p->comm, p->pid, success?"succeeded":"failed"),
TRACE_STRUCT(
TRACE_FIELD(pid_t, pid, p->pid)
TRACE_FIELD(int, success, success)
),
TPRAWFMT("task %d success=%d")
);
This creates us a unique struct of:
struct {
pid_t pid;
int success;
};
And the way the call back would assign these values would be:
entry->pid = p->pid;
entry->success = success;
The nice part about this is that the creation of the assignent is done
via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
created, the developer will then have a faster method to record
into the ring buffer. They do not need to worry about the tracer itself.
The developer would only need to touch the files in include/trace/*.h
Again, I would like to give special thanks to Tom Zanussi for this
nice idea.
Idea-from: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 19:12:30 -05:00
struct ftrace_event_call {
2009-03-22 03:30:39 -05:00
char * name ;
char * system ;
struct dentry * dir ;
int enabled ;
int ( * regfunc ) ( void ) ;
void ( * unregfunc ) ( void ) ;
int id ;
int ( * raw_init ) ( void ) ;
int ( * show_format ) ( struct trace_seq * s ) ;
int ( * define_fields ) ( void ) ;
struct list_head fields ;
2009-03-22 03:31:04 -05:00
struct filter_pred * * preds ;
2009-03-19 20:26:15 +01:00
# ifdef CONFIG_EVENT_PROFILE
atomic_t profile_count ;
int ( * profile_enable ) ( struct ftrace_event_call * ) ;
void ( * profile_disable ) ( struct ftrace_event_call * ) ;
# endif
tracing: add raw trace point recording infrastructure
Impact: lower overhead tracing
The current event tracer can automatically pick up trace points
that are registered with the TRACE_FORMAT macro. But it required
a printf format string and parsing. Although, this adds the ability
to get guaranteed information like task names and such, it took
a hit in overhead processing. This processing can add about 500-1000
nanoseconds overhead, but in some cases that too is considered
too much and we want to shave off as much from this overhead as
possible.
Tom Zanussi recently posted tracing patches to lkml that are based
on a nice idea about capturing the data via C structs using
STRUCT_ENTER, STRUCT_EXIT type of macros.
I liked that method very much, but did not like the implementation
that required a developer to add data/code in several disjoint
locations.
This patch extends the event_tracer macros to do a similar "raw C"
approach that Tom Zanussi did. But instead of having the developers
needing to tweak a bunch of code all over the place, they can do it
all in one macro - preferably placed near the code that it is
tracing. That makes it much more likely that tracepoints will be
maintained on an ongoing basis by the code they modify.
The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
a developer may still utilize the more low level DECLARE_TRACE macros
if they don't care about getting their traces automatically in the event
tracer.)
They can also use the existing TRACE_FORMAT if they don't need to code
the tracepoint in C, but just want to use the convenience of printf.
So if the developer wants to "hardwire" a tracepoint in the fastest
possible way, and wants to acquire their data via a user space utility
in a raw binary format, or wants to see it in the trace output but not
sacrifice any performance, then they can implement the faster but
more complex TRACE_EVENT_FORMAT macro.
Here's what usage looks like:
TRACE_EVENT_FORMAT(name,
TPPROTO(proto),
TPARGS(args),
TPFMT(fmt, fmt_args),
TRACE_STUCT(
TRACE_FIELD(type1, item1, assign1)
TRACE_FIELD(type2, item2, assign2)
[...]
),
TPRAWFMT(raw_fmt)
);
Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
uses.
name: is the unique identifier of the trace point
proto: The proto type that the trace point uses
args: the args in the proto type
fmt: printf format to use with the event printf tracer
fmt_args: the printf argments to match fmt
TRACE_STRUCT starts the ability to create a structure.
Each item in the structure is defined with a TRACE_FIELD
TRACE_FIELD(type, item, assign)
type: the C type of item.
item: the name of the item in the stucture
assign: what to assign the item in the trace point callback
raw_fmt is a way to pretty print the struct. It must match
the order of the items are added in TRACE_STUCT
An example of this would be:
TRACE_EVENT_FORMAT(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success),
TPFMT("task %s:%d %s",
p->comm, p->pid, success?"succeeded":"failed"),
TRACE_STRUCT(
TRACE_FIELD(pid_t, pid, p->pid)
TRACE_FIELD(int, success, success)
),
TPRAWFMT("task %d success=%d")
);
This creates us a unique struct of:
struct {
pid_t pid;
int success;
};
And the way the call back would assign these values would be:
entry->pid = p->pid;
entry->success = success;
The nice part about this is that the creation of the assignent is done
via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
created, the developer will then have a faster method to record
into the ring buffer. They do not need to worry about the tracer itself.
The developer would only need to touch the files in include/trace/*.h
Again, I would like to give special thanks to Tom Zanussi for this
nice idea.
Idea-from: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 19:12:30 -05:00
} ;
2009-03-22 03:31:17 -05:00
struct event_subsystem {
struct list_head list ;
const char * name ;
struct dentry * entry ;
struct filter_pred * * preds ;
} ;
# define events_for_each(event) \
for ( event = __start_ftrace_events ; \
( unsigned long ) event < ( unsigned long ) __stop_ftrace_events ; \
event + + )
2009-03-22 03:31:04 -05:00
# define MAX_FILTER_PRED 8
struct filter_pred ;
typedef int ( * filter_pred_fn_t ) ( struct filter_pred * pred , void * event ) ;
struct filter_pred {
filter_pred_fn_t fn ;
u64 val ;
char * str_val ;
int str_len ;
char * field_name ;
int offset ;
int not ;
int or ;
int compound ;
int clear ;
} ;
2009-03-22 03:30:39 -05:00
int trace_define_field ( struct ftrace_event_call * call , char * type ,
char * name , int offset , int size ) ;
2009-03-22 03:31:04 -05:00
extern void filter_free_pred ( struct filter_pred * pred ) ;
extern int filter_print_preds ( struct filter_pred * * preds , char * buf ) ;
extern int filter_parse ( char * * pbuf , struct filter_pred * pred ) ;
extern int filter_add_pred ( struct ftrace_event_call * call ,
struct filter_pred * pred ) ;
extern void filter_free_preds ( struct ftrace_event_call * call ) ;
extern int filter_match_preds ( struct ftrace_event_call * call , void * rec ) ;
2009-03-22 03:31:17 -05:00
extern void filter_free_subsystem_preds ( struct event_subsystem * system ) ;
extern int filter_add_subsystem_pred ( struct event_subsystem * system ,
struct filter_pred * pred ) ;
2009-03-22 03:31:04 -05:00
tracing: add raw trace point recording infrastructure
Impact: lower overhead tracing
The current event tracer can automatically pick up trace points
that are registered with the TRACE_FORMAT macro. But it required
a printf format string and parsing. Although, this adds the ability
to get guaranteed information like task names and such, it took
a hit in overhead processing. This processing can add about 500-1000
nanoseconds overhead, but in some cases that too is considered
too much and we want to shave off as much from this overhead as
possible.
Tom Zanussi recently posted tracing patches to lkml that are based
on a nice idea about capturing the data via C structs using
STRUCT_ENTER, STRUCT_EXIT type of macros.
I liked that method very much, but did not like the implementation
that required a developer to add data/code in several disjoint
locations.
This patch extends the event_tracer macros to do a similar "raw C"
approach that Tom Zanussi did. But instead of having the developers
needing to tweak a bunch of code all over the place, they can do it
all in one macro - preferably placed near the code that it is
tracing. That makes it much more likely that tracepoints will be
maintained on an ongoing basis by the code they modify.
The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
a developer may still utilize the more low level DECLARE_TRACE macros
if they don't care about getting their traces automatically in the event
tracer.)
They can also use the existing TRACE_FORMAT if they don't need to code
the tracepoint in C, but just want to use the convenience of printf.
So if the developer wants to "hardwire" a tracepoint in the fastest
possible way, and wants to acquire their data via a user space utility
in a raw binary format, or wants to see it in the trace output but not
sacrifice any performance, then they can implement the faster but
more complex TRACE_EVENT_FORMAT macro.
Here's what usage looks like:
TRACE_EVENT_FORMAT(name,
TPPROTO(proto),
TPARGS(args),
TPFMT(fmt, fmt_args),
TRACE_STUCT(
TRACE_FIELD(type1, item1, assign1)
TRACE_FIELD(type2, item2, assign2)
[...]
),
TPRAWFMT(raw_fmt)
);
Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
uses.
name: is the unique identifier of the trace point
proto: The proto type that the trace point uses
args: the args in the proto type
fmt: printf format to use with the event printf tracer
fmt_args: the printf argments to match fmt
TRACE_STRUCT starts the ability to create a structure.
Each item in the structure is defined with a TRACE_FIELD
TRACE_FIELD(type, item, assign)
type: the C type of item.
item: the name of the item in the stucture
assign: what to assign the item in the trace point callback
raw_fmt is a way to pretty print the struct. It must match
the order of the items are added in TRACE_STUCT
An example of this would be:
TRACE_EVENT_FORMAT(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success),
TPFMT("task %s:%d %s",
p->comm, p->pid, success?"succeeded":"failed"),
TRACE_STRUCT(
TRACE_FIELD(pid_t, pid, p->pid)
TRACE_FIELD(int, success, success)
),
TPRAWFMT("task %d success=%d")
);
This creates us a unique struct of:
struct {
pid_t pid;
int success;
};
And the way the call back would assign these values would be:
entry->pid = p->pid;
entry->success = success;
The nice part about this is that the creation of the assignent is done
via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
created, the developer will then have a faster method to record
into the ring buffer. They do not need to worry about the tracer itself.
The developer would only need to touch the files in include/trace/*.h
Again, I would like to give special thanks to Tom Zanussi for this
nice idea.
Idea-from: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-27 19:12:30 -05:00
void event_trace_printk ( unsigned long ip , const char * fmt , . . . ) ;
extern struct ftrace_event_call __start_ftrace_events [ ] ;
extern struct ftrace_event_call __stop_ftrace_events [ ] ;
2009-03-19 20:26:15 +01:00
# define for_each_event(event) \
for ( event = __start_ftrace_events ; \
( unsigned long ) event < ( unsigned long ) __stop_ftrace_events ; \
event + + )
2009-03-12 14:19:25 -04:00
extern const char * __start___trace_bprintk_fmt [ ] ;
extern const char * __stop___trace_bprintk_fmt [ ] ;
2009-03-13 00:12:52 -04:00
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
* constant . Even with the outer if statement optimizing out .
*/
2009-03-12 14:19:25 -04:00
# define event_trace_printk(ip, fmt, args...) \
do { \
__trace_printk_check_format ( fmt , # # args ) ; \
tracing_record_cmdline ( current ) ; \
if ( __builtin_constant_p ( fmt ) ) { \
static const char * trace_printk_fmt \
__attribute__ ( ( section ( " __trace_printk_fmt " ) ) ) = \
__builtin_constant_p ( fmt ) ? fmt : NULL ; \
\
__trace_bprintk ( ip , trace_printk_fmt , # # args ) ; \
} else \
__trace_printk ( ip , fmt , # # args ) ; \
} while ( 0 )
2008-05-12 21:20:42 +02:00
# endif /* _LINUX_KERNEL_TRACE_H */