2019-05-19 15:07:45 +03:00
# SPDX-License-Identifier: GPL-2.0-only
2008-05-12 23:20:42 +04:00
#
2008-10-07 03:06:12 +04:00
# Architectures that offer an FUNCTION_TRACER implementation should
# select HAVE_FUNCTION_TRACER:
2008-05-12 23:20:42 +04:00
#
2008-09-21 22:12:14 +04:00
2008-11-23 13:39:08 +03:00
config USER_STACKTRACE_SUPPORT
bool
2008-09-21 22:12:14 +04:00
config NOP_TRACER
bool
2008-10-07 03:06:12 +04:00
config HAVE_FUNCTION_TRACER
2008-05-12 23:20:42 +04:00
bool
2009-09-15 04:10:15 +04:00
help
2018-05-08 21:14:57 +03:00
See Documentation/trace/ftrace-design.rst
2008-05-12 23:20:42 +04:00
2008-11-25 23:07:04 +03:00
config HAVE_FUNCTION_GRAPH_TRACER
2008-11-11 09:14:25 +03:00
bool
2009-09-15 04:10:15 +04:00
help
2018-05-08 21:14:57 +03:00
See Documentation/trace/ftrace-design.rst
2008-11-11 09:14:25 +03:00
2008-05-17 08:01:36 +04:00
config HAVE_DYNAMIC_FTRACE
bool
2009-09-15 04:10:15 +04:00
help
2018-05-08 21:14:57 +03:00
See Documentation/trace/ftrace-design.rst
2008-05-17 08:01:36 +04:00
2012-09-28 12:15:17 +04:00
config HAVE_DYNAMIC_FTRACE_WITH_REGS
bool
2019-11-08 21:07:06 +03:00
config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
bool
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset to of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section.
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it into back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function. The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worst, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note, A well documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 23:45:07 +04:00
config HAVE_FTRACE_MCOUNT_RECORD
bool
2009-09-15 04:10:15 +04:00
help
2018-05-08 21:14:57 +03:00
See Documentation/trace/ftrace-design.rst
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset to of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section.
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it into back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function. The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worst, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note, A well documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 23:45:07 +04:00
2009-08-25 01:43:11 +04:00
config HAVE_SYSCALL_TRACEPOINTS
2009-03-07 07:52:59 +03:00
bool
2009-09-15 04:10:15 +04:00
help
2018-05-08 21:14:57 +03:00
See Documentation/trace/ftrace-design.rst
2009-03-07 07:52:59 +03:00
2011-02-09 21:15:59 +03:00
config HAVE_FENTRY
bool
help
Arch supports the gcc options -pg with -mfentry
2018-08-06 16:17:46 +03:00
config HAVE_NOP_MCOUNT
bool
help
Arch supports the gcc options -pg with -mrecord-mcount and -nop-mcount
2010-10-15 07:32:44 +04:00
config HAVE_C_RECORDMCOUNT
2010-10-14 01:12:30 +04:00
bool
help
C version of recordmcount available?
2008-05-12 23:20:42 +04:00
config TRACER_MAX_TRACE
bool
trace: Stop compiling in trace_clock unconditionally
Commit 56449f437 "tracing: make the trace clocks available generally",
in April 2009, made trace_clock available unconditionally, since
CONFIG_X86_DS used it too.
Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code",
in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split
out from CONFIG_TRACING for general use) has a dependency on trace_clock. So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.
Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-03 06:45:14 +04:00
config TRACE_CLOCK
bool
tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..14]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data pay load.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, accept
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, where as consumer producer will
throw away new data if the consumer catches up with the
producer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to provide that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at a next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snap shot of a running trace on just one
cpu. Having a backup buffer, to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enabl_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 07:02:38 +04:00
config RING_BUFFER
bool
trace: Stop compiling in trace_clock unconditionally
Commit 56449f437 "tracing: make the trace clocks available generally",
in April 2009, made trace_clock available unconditionally, since
CONFIG_X86_DS used it too.
Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code",
in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split
out from CONFIG_TRACING for general use) has a dependency on trace_clock. So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.
Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-03 06:45:14 +04:00
select TRACE_CLOCK
2013-05-03 19:16:18 +04:00
select IRQ_WORK
tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..14]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data pay load.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, accept
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, where as consumer producer will
throw away new data if the consumer catches up with the
producer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to provide that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at a next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snap shot of a running trace on just one
cpu. Having a backup buffer, to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enabl_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 07:02:38 +04:00
2009-04-08 12:14:01 +04:00
config EVENT_TRACING
2009-05-25 14:11:59 +04:00
select CONTEXT_SWITCH_TRACER
2019-11-20 16:38:07 +03:00
select GLOB
2009-05-25 14:11:59 +04:00
bool
config CONTEXT_SWITCH_TRACER
2009-04-08 12:14:01 +04:00
bool
2009-09-04 22:24:40 +04:00
config RING_BUFFER_ALLOW_SWAP
bool
help
Allow the use of ring_buffer_swap_cpu.
Adds a very slight overhead to tracing when enabled.
2018-07-31 01:24:23 +03:00
config PREEMPTIRQ_TRACEPOINTS
bool
depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
select TRACING
default y
help
Create preempt/irq toggle tracepoints if needed, so that other parts
of the kernel can use them to generate or add hooks to them.
2009-05-28 23:50:13 +04:00
# All tracer options should select GENERIC_TRACER. For those options that are
# enabled by all tracers (context switch and event tracer) they select TRACING.
# This allows those options to appear when no other tracer is selected. But the
# options do not appear when something else selects it. We need the two options
# GENERIC_TRACER and TRACING to avoid circular dependencies to accomplish the
2009-12-21 23:01:17 +03:00
# hiding of the automatic options.
2009-05-28 23:50:13 +04:00
2008-05-12 23:20:42 +04:00
config TRACING
bool
tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..14]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data pay load.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, accept
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, where as consumer producer will
throw away new data if the consumer catches up with the
producer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to provide that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at a next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snap shot of a running trace on just one
cpu. Having a backup buffer, to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enabl_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 07:02:38 +04:00
select RING_BUFFER
2008-10-31 22:50:41 +03:00
select STACKTRACE if STACKTRACE_SUPPORT
2008-07-23 16:15:22 +04:00
select TRACEPOINTS
2008-10-29 18:15:57 +03:00
select NOP_TRACER
2009-03-06 19:21:49 +03:00
select BINARY_PRINTF
2009-04-08 12:14:01 +04:00
select EVENT_TRACING
trace: Stop compiling in trace_clock unconditionally
Commit 56449f437 "tracing: make the trace clocks available generally",
in April 2009, made trace_clock available unconditionally, since
CONFIG_X86_DS used it too.
Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code",
in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split
out from CONFIG_TRACING for general use) has a dependency on trace_clock. So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.
Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-03 06:45:14 +04:00
select TRACE_CLOCK
2008-05-12 23:20:42 +04:00
2009-05-28 23:50:13 +04:00
config GENERIC_TRACER
bool
select TRACING
2009-03-05 23:19:55 +03:00
#
# Minimum requirements an architecture has to meet for us to
# be able to offer generic tracing facilities:
#
config TRACING_SUPPORT
bool
2018-05-02 14:29:48 +03:00
depends on TRACE_IRQFLAGS_SUPPORT
2009-03-05 23:19:55 +03:00
depends on STACKTRACE_SUPPORT
2009-03-06 04:40:53 +03:00
default y
2009-03-05 23:19:55 +03:00
if TRACING_SUPPORT
2009-04-20 18:47:36 +04:00
menuconfig FTRACE
bool "Tracers"
2009-05-07 20:49:27 +04:00
default y if DEBUG_KERNEL
2009-04-20 18:47:36 +04:00
help
2009-12-21 23:01:17 +03:00
Enable the kernel tracing infrastructure.
2009-04-20 18:47:36 +04:00
if FTRACE
2008-10-21 18:31:18 +04:00
2020-01-30 00:30:30 +03:00
config BOOTTIME_TRACING
bool "Boot-time Tracing support"
2020-02-20 15:18:33 +03:00
depends on TRACING
select BOOT_CONFIG
2020-01-30 00:30:30 +03:00
help
Enable developer to setup ftrace subsystem via supplemental
kernel cmdline at boot time for debugging (tracing) driver
initialization and boot process.
2008-10-07 03:06:12 +04:00
config FUNCTION_TRACER
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
The output of the function_trace file is as follows
"echo noverbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
swapper-0 0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper-0 0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
Or with verbose turned on:
"echo verbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
The "trace" file is not affected by the verbose mode, but is by the symonly.
echo "nosymonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f>
[ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b>
[ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
[ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb>
[ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910>
[ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7>
[ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
echo "symonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 23:20:42 +04:00
bool "Kernel Function Tracer"
2008-10-07 03:06:12 +04:00
depends on HAVE_FUNCTION_TRACER
2009-02-19 06:06:18 +03:00
select KALLSYMS
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2008-05-12 23:20:42 +04:00
select CONTEXT_SWITCH_TRACER
2017-04-06 17:28:12 +03:00
select GLOB
2019-07-27 00:19:38 +03:00
select TASKS_RCU if PREEMPTION
2020-04-03 22:10:28 +03:00
select TASKS_RUDE_RCU
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
The output of the function_trace file is as follows
"echo noverbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
swapper-0 0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper-0 0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
Or with verbose turned on:
"echo verbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
The "trace" file is not affected by the verbose mode, but is by the symonly.
echo "nosymonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f>
[ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b>
[ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
[ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb>
[ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910>
[ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7>
[ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
echo "symonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 23:20:42 +04:00
help
Enable the kernel to trace every kernel function. This is done
by using a compiler feature to insert a small, 5-byte No-Operation
2009-12-21 23:01:17 +03:00
instruction at the beginning of every kernel function, which NOP
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
The output of the function_trace file is as follows
"echo noverbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
swapper-0 0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper-0 0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
Or with verbose turned on:
"echo verbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
The "trace" file is not affected by the verbose mode, but is by the symonly.
echo "nosymonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f>
[ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b>
[ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
[ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb>
[ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910>
[ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7>
[ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
echo "symonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 23:20:42 +04:00
sequence is then dynamically patched into a tracer call when
tracing is enabled by the administrator. If it's runtime disabled
(the bootup default), then the overhead of the instructions is very
small and not measurable even in micro-benchmarks.
2008-05-12 23:20:42 +04:00
2008-11-25 23:07:04 +03:00
config FUNCTION_GRAPH_TRACER
bool "Kernel Function Graph Tracer"
depends on HAVE_FUNCTION_GRAPH_TRACER
2008-11-11 09:14:25 +03:00
depends on FUNCTION_TRACER
function-graph: disable when both x86_32 and optimize for size are configured
On x86_32, when optimize for size is set, gcc may align the frame pointer
and make a copy of the the return address inside the stack frame.
The return address that is located in the stack frame may not be
the one used to return to the calling function. This will break the
function graph tracer.
The function graph tracer replaces the return address with a jump to a hook
function that can trace the exit of the function. If it only replaces
a copy, then the hook will not be called when the function returns.
Worse yet, when the parent function returns, the function graph tracer
will return back to the location of the child function which will
easily crash the kernel with weird results.
To see the problem, when i386 is compiled with -Os we get:
c106be03: 57 push %edi
c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi
c106be08: 83 e4 e0 and $0xffffffe0,%esp
c106be0b: ff 77 fc pushl 0xfffffffc(%edi)
c106be0e: 55 push %ebp
c106be0f: 89 e5 mov %esp,%ebp
c106be11: 57 push %edi
c106be12: 56 push %esi
c106be13: 53 push %ebx
c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp
c106be1a: e8 f5 57 fb ff call c1021614 <mcount>
When it is compiled with -O2 instead we get:
c10896f0: 55 push %ebp
c10896f1: 89 e5 mov %esp,%ebp
c10896f3: 83 ec 28 sub $0x28,%esp
c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp)
c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp)
c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp)
c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount>
The compile with -Os will align the stack pointer then set up the
frame pointer (%ebp), and it copies the return address back into
the stack frame. The change to the return address in mcount is done
to the copy and not the real place holder of the return address.
Then compile with -O2 sets up the frame pointer first, this makes
the change to the return address by mcount affect where the function
will jump on exit.
Reported-by: Jake Edge <jake@lwn.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 20:53:21 +04:00
depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
2008-12-03 12:33:58 +03:00
default y
2008-11-11 09:14:25 +03:00
help
2008-11-25 23:07:04 +03:00
Enable the kernel to trace a function at both its return
and its entry.
2009-01-26 13:12:25 +03:00
Its first purpose is to trace the duration of functions and
draw a call graph for each thread with some information like
2009-12-21 23:01:17 +03:00
the return value. This is done by setting the current return
2009-01-26 13:12:25 +03:00
address on the current task structure into a stack of calls.
2008-11-11 09:14:25 +03:00
2020-01-30 00:19:10 +03:00
config DYNAMIC_FTRACE
bool "enable/disable function tracing dynamically"
depends on FUNCTION_TRACER
depends on HAVE_DYNAMIC_FTRACE
default y
help
This option will modify all the calls to function tracing
dynamically (will patch them out of the binary image and
replace them with a No-Op instruction) on boot up. During
compile time, a table is made of all the locations that ftrace
can function trace, and this table is linked into the kernel
image. When this is enabled, functions can be individually
enabled, and the functions not enabled will not affect
performance of the system.
See the files in /sys/kernel/debug/tracing:
available_filter_functions
set_ftrace_filter
set_ftrace_notrace
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but
otherwise has native performance as long as no tracing is active.
config DYNAMIC_FTRACE_WITH_REGS
def_bool y
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_REGS
config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
def_bool y
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER
default n
help
This option enables the kernel function profiler. A file is created
in debugfs called function_profile_enabled which defaults to zero.
When a 1 is echoed into this file profiling begins, and when a
zero is entered, profiling stops. A "functions" file is created in
the trace_stat directory; this file shows the list of functions that
have been hit and their counters.
If in doubt, say N.
config STACK_TRACER
bool "Trace max stack"
depends on HAVE_FUNCTION_TRACER
select FUNCTION_TRACER
select STACKTRACE
select KALLSYMS
help
This special tracer records the maximum stack footprint of the
kernel and displays it in /sys/kernel/debug/tracing/stack_trace.
This tracer works by hooking into every function call that the
kernel executes, and keeping a maximum stack depth value and
stack-trace saved. If this is configured with DYNAMIC_FTRACE
then it will not have any overhead while the stack tracer
is disabled.
To enable the stack tracer on bootup, pass in 'stacktrace'
on the kernel command line.
The stack tracer can also be enabled or disabled via the
sysctl kernel.stack_tracer_enabled
Say N if unsure.
2018-07-31 01:24:23 +03:00
config TRACE_PREEMPT_TOGGLE
bool
help
Enables hooks which will be called when preemption is first disabled,
and last enabled.
2009-03-20 19:50:56 +03:00
2008-05-12 23:20:42 +04:00
config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer"
default n
depends on TRACE_IRQFLAGS_SUPPORT
2010-07-14 04:56:20 +04:00
depends on !ARCH_USES_GETTIMEOFFSET
2008-05-12 23:20:42 +04:00
select TRACE_IRQFLAGS
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2008-05-12 23:20:42 +04:00
select TRACER_MAX_TRACE
2009-09-04 22:24:40 +04:00
select RING_BUFFER_ALLOW_SWAP
2013-03-05 16:30:24 +04:00
select TRACER_SNAPSHOT
2013-03-05 23:50:23 +04:00
select TRACER_SNAPSHOT_PER_CPU_SWAP
2008-05-12 23:20:42 +04:00
help
This option measures the time spent in irqs-off critical
sections, with microsecond accuracy.
The default measurement method is a maximum search, which is
disabled by default and can be runtime (re-)started
via:
2009-06-02 10:01:37 +04:00
echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
2008-05-12 23:20:42 +04:00
2009-12-21 23:01:17 +03:00
(Note that kernel size and overhead increase with this option
2008-05-12 23:20:42 +04:00
enabled. This option and the preempt-off timing option can be
used together or separately.)
config PREEMPT_TRACER
bool "Preemption-off Latency Tracer"
default n
2010-07-14 04:56:20 +04:00
depends on !ARCH_USES_GETTIMEOFFSET
2019-07-27 00:19:40 +03:00
depends on PREEMPTION
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2008-05-12 23:20:42 +04:00
select TRACER_MAX_TRACE
2009-09-04 22:24:40 +04:00
select RING_BUFFER_ALLOW_SWAP
2013-03-05 16:30:24 +04:00
select TRACER_SNAPSHOT
2013-03-05 23:50:23 +04:00
select TRACER_SNAPSHOT_PER_CPU_SWAP
2018-07-31 01:24:23 +03:00
select TRACE_PREEMPT_TOGGLE
2008-05-12 23:20:42 +04:00
help
2009-12-21 23:01:17 +03:00
This option measures the time spent in preemption-off critical
2008-05-12 23:20:42 +04:00
sections, with microsecond accuracy.
The default measurement method is a maximum search, which is
disabled by default and can be runtime (re-)started
via:
2009-06-02 10:01:37 +04:00
echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
2008-05-12 23:20:42 +04:00
2009-12-21 23:01:17 +03:00
(Note that kernel size and overhead increase with this option
2008-05-12 23:20:42 +04:00
enabled. This option and the irqs-off timing option can be
used together or separately.)
2008-05-12 23:20:42 +04:00
config SCHED_TRACER
bool "Scheduling Latency Tracer"
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2008-05-12 23:20:42 +04:00
select CONTEXT_SWITCH_TRACER
select TRACER_MAX_TRACE
2013-03-05 16:30:24 +04:00
select TRACER_SNAPSHOT
2008-05-12 23:20:42 +04:00
help
This tracer tracks the latency of the highest priority task
to be scheduled in, starting from the point it has woken up.
2016-06-23 19:45:36 +03:00
config HWLAT_TRACER
bool "Tracer to detect hardware latencies (like SMIs)"
select GENERIC_TRACER
help
This tracer, when enabled will create one or more kernel threads,
2017-06-13 14:06:59 +03:00
depending on what the cpumask file is set to, which each thread
2016-06-23 19:45:36 +03:00
spinning in a loop looking for interruptions caused by
something other than the kernel. For example, if a
System Management Interrupt (SMI) takes a noticeable amount of
time, this tracer will detect it. This is useful for testing
if a system is reliable for Real Time tasks.
Some files are created in the tracing directory when this
is enabled:
hwlat_detector/width - time in usecs for how long to spin for
hwlat_detector/window - time in usecs between the start of each
iteration
A kernel thread is created that will spin with interrupts disabled
2017-06-13 14:06:59 +03:00
for "width" microseconds in every "window" cycle. It will not spin
2016-06-23 19:45:36 +03:00
for "window - width" microseconds, where the system can
continue to operate.
The output will appear in the trace and trace_pipe files.
When the tracer is not running, it has no affect on the system,
but when it is running, it can cause the system to be
periodically non responsive. Do not run this tracer on a
production system.
To enable this tracer, echo in "hwlat" into the current_tracer
file. Every time a latency is greater than tracing_thresh, it will
be recorded into the ring buffer.
2020-01-30 00:26:45 +03:00
config MMIOTRACE
bool "Memory mapped IO tracing"
depends on HAVE_MMIOTRACE_SUPPORT && PCI
select GENERIC_TRACER
help
Mmiotrace traces Memory Mapped I/O access and is meant for
debugging and reverse engineering. It is called from the ioremap
implementation and works via page faults. Tracing is disabled by
default and can be enabled at run-time.
See Documentation/trace/mmiotrace.rst.
If you are not helping to develop drivers, say N.
2009-05-29 00:31:21 +04:00
config ENABLE_DEFAULT_TRACERS
bool "Trace process context switches and events"
2009-05-28 23:50:13 +04:00
depends on !GENERIC_TRACER
2009-02-24 18:21:36 +03:00
select TRACING
help
2009-12-21 23:01:17 +03:00
This tracer hooks to various trace points in the kernel,
2009-02-24 18:21:36 +03:00
allowing the user to pick and choose which trace point they
2009-05-29 00:31:21 +04:00
want to trace. It also includes the sched_switch tracer plugin.
2009-04-20 18:59:34 +04:00
2009-03-07 07:52:59 +03:00
config FTRACE_SYSCALLS
bool "Trace syscalls"
2009-08-25 01:43:11 +04:00
depends on HAVE_SYSCALL_TRACEPOINTS
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2009-03-16 00:10:38 +03:00
select KALLSYMS
2009-03-07 07:52:59 +03:00
help
Basic tracer to catch the syscall entry and exit events.
2012-12-26 06:53:00 +04:00
config TRACER_SNAPSHOT
bool "Create a snapshot trace buffer"
select TRACER_MAX_TRACE
help
Allow tracing users to take snapshot of the current buffer using the
ftrace interface, e.g.:
echo 1 > /sys/kernel/debug/tracing/snapshot
cat snapshot
2013-03-05 23:50:23 +04:00
config TRACER_SNAPSHOT_PER_CPU_SWAP
2019-11-20 16:38:07 +03:00
bool "Allow snapshot to swap per CPU"
2013-03-05 23:50:23 +04:00
depends on TRACER_SNAPSHOT
select RING_BUFFER_ALLOW_SWAP
help
Allow doing a snapshot of a single CPU buffer instead of a
full swap (all buffers). If this is set, then the following is
allowed:
echo 1 > /sys/kernel/debug/tracing/per_cpu/cpu2/snapshot
After which, only the tracing buffer for CPU 2 was swapped with
the main tracing buffer, and the other CPU buffers remain the same.
When this is enabled, this adds a little more overhead to the
trace recording, as it needs to add some checks to synchronize
recording with swaps. But this does not affect the performance
of the overall system. This is enabled by default when the preempt
or irq latency tracers are enabled, as those need to swap as well
and already adds the overhead (plus a lot more).
2008-11-12 23:24:24 +03:00
config TRACE_BRANCH_PROFILING
2009-04-20 18:27:58 +04:00
bool
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2009-04-20 18:27:58 +04:00
choice
prompt "Branch Profiling"
default BRANCH_PROFILE_NONE
help
The branch profiling is a software profiler. It will add hooks
into the C conditionals to test which path a branch takes.
The likely/unlikely profiler only looks at the conditions that
are annotated with a likely or unlikely macro.
2009-12-21 23:01:17 +03:00
The "all branch" profiler will profile every if-statement in the
2009-04-20 18:27:58 +04:00
kernel. This profiler will also enable the likely/unlikely
2009-12-21 23:01:17 +03:00
profiler.
2009-04-20 18:27:58 +04:00
2009-12-21 23:01:17 +03:00
Either of the above profilers adds a bit of overhead to the system.
If unsure, choose "No branch profiling".
2009-04-20 18:27:58 +04:00
config BRANCH_PROFILE_NONE
bool "No branch profiling"
help
2009-12-21 23:01:17 +03:00
No branch profiling. Branch profiling adds a bit of overhead.
Only enable it if you want to analyse the branching behavior.
Otherwise keep it disabled.
2009-04-20 18:27:58 +04:00
config PROFILE_ANNOTATED_BRANCHES
bool "Trace likely/unlikely profiler"
select TRACE_BRANCH_PROFILING
2008-11-12 08:14:39 +03:00
help
2012-04-17 19:01:21 +04:00
This tracer profiles all likely and unlikely macros
2008-11-12 08:14:39 +03:00
in the kernel. It will display the results in:
2011-03-17 03:17:08 +03:00
/sys/kernel/debug/tracing/trace_stat/branch_annotated
2008-11-12 08:14:39 +03:00
2009-12-21 23:01:17 +03:00
Note: this will add a significant overhead; only turn this
2008-11-12 08:14:39 +03:00
on if you need to profile the system's use of these macros.
2008-11-21 09:30:54 +03:00
config PROFILE_ALL_BRANCHES
2018-01-15 22:07:27 +03:00
bool "Profile all if conditionals" if !FORTIFY_SOURCE
2009-04-20 18:27:58 +04:00
select TRACE_BRANCH_PROFILING
2008-11-21 09:30:54 +03:00
help
This tracer profiles all branch conditions. Every if ()
taken in the kernel is recorded whether it hit or miss.
The results will be displayed in:
2011-03-17 03:17:08 +03:00
/sys/kernel/debug/tracing/trace_stat/branch_all
2008-11-21 09:30:54 +03:00
2009-04-20 18:27:58 +04:00
This option also enables the likely/unlikely profiler.
2008-11-21 09:30:54 +03:00
This configuration, when enabled, will impose a great overhead
on the system. This should only be enabled when the system
2009-12-21 23:01:17 +03:00
is to be analyzed in much detail.
2009-04-20 18:27:58 +04:00
endchoice
2008-11-21 09:30:54 +03:00
2008-11-12 23:24:24 +03:00
config TRACING_BRANCHES
2008-11-12 08:14:40 +03:00
bool
help
Selected by tracers that will trace the likely and unlikely
conditions. This prevents the tracers themselves from being
profiled. Profiling the tracing infrastructure can only happen
when the likelys and unlikelys are not being traced.
2008-11-12 23:24:24 +03:00
config BRANCH_TRACER
2008-11-12 08:14:40 +03:00
bool "Trace likely/unlikely instances"
2008-11-12 23:24:24 +03:00
depends on TRACE_BRANCH_PROFILING
select TRACING_BRANCHES
2008-11-12 08:14:40 +03:00
help
This traces the events of likely and unlikely condition
calls in the kernel. The difference between this and the
"Trace likely/unlikely profiler" is that this is not a
histogram of the callers, but actually places the calling
events into a running trace buffer to see when and where the
events happened, as well as their results.
Say N if unsure.
2009-02-07 22:46:45 +03:00
config BLK_DEV_IO_TRACE
2009-12-21 23:01:17 +03:00
bool "Support for tracing block IO actions"
2009-02-07 22:46:45 +03:00
depends on SYSFS
2009-02-09 14:06:54 +03:00
depends on BLOCK
2009-02-07 22:46:45 +03:00
select RELAY
select DEBUG_FS
select TRACEPOINTS
2009-05-28 23:50:13 +04:00
select GENERIC_TRACER
2009-02-07 22:46:45 +03:00
select STACKTRACE
help
Say Y here if you want to be able to trace the block layer actions
on a given queue. Tracing allows you to see any traffic happening
on a block device queue. For more information (and the userspace
support tools needed), fetch the blktrace tools from:
git://git.kernel.dk/blktrace.git
Tracing also is possible using the ftrace interface, e.g.:
echo 1 > /sys/block/sda/sda1/trace/enable
echo blk > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace_pipe
If unsure, say N.
2008-12-30 00:42:23 +03:00
2017-02-16 09:00:50 +03:00
config KPROBE_EVENTS
2009-08-14 00:35:11 +04:00
depends on KPROBES
2010-02-10 19:25:17 +03:00
depends on HAVE_REGS_AND_STACK_ACCESS_API
2009-11-04 03:12:47 +03:00
bool "Enable kprobes-based dynamic events"
2009-08-14 00:35:11 +04:00
select TRACING
2012-04-09 13:11:44 +04:00
select PROBE_EVENTS
2018-11-05 12:02:36 +03:00
select DYNAMIC_EVENTS
2009-11-04 03:12:47 +03:00
default y
2009-08-14 00:35:11 +04:00
help
2009-12-21 23:01:17 +03:00
This allows the user to add tracing events (similar to tracepoints)
on the fly via the ftrace interface. See
2018-05-08 21:14:57 +03:00
Documentation/trace/kprobetrace.rst for more details.
2009-11-04 03:12:47 +03:00
Those events can be inserted wherever kprobes can probe, and record
various register and memory values.
2009-12-21 23:01:17 +03:00
This option is also required by perf-probe subcommand of perf tools.
If you want to use perf tools, this option is strongly recommended.
2009-08-14 00:35:11 +04:00
2018-07-30 13:20:14 +03:00
config KPROBE_EVENTS_ON_NOTRACE
bool "Do NOT protect notrace function from kprobe events"
depends on KPROBE_EVENTS
depends on KPROBES_ON_FTRACE
default n
help
This is only for the developers who want to debug ftrace itself
using kprobe events.
If kprobes can use ftrace instead of breakpoint, ftrace related
functions are protected from kprobe-events to prevent an infinit
recursion or any unexpected execution path which leads to a kernel
crash.
This option disables such protection and allows you to put kprobe
events on ftrace functions for debugging ftrace by itself.
Note that this might let you shoot yourself in the foot.
If unsure, say N.
2017-02-16 09:00:50 +03:00
config UPROBE_EVENTS
2012-04-11 14:30:43 +04:00
bool "Enable uprobes-based dynamic events"
depends on ARCH_SUPPORTS_UPROBES
depends on MMU
2014-03-07 19:32:22 +04:00
depends on PERF_EVENTS
2012-04-11 14:30:43 +04:00
select UPROBES
select PROBE_EVENTS
2018-11-05 12:03:04 +03:00
select DYNAMIC_EVENTS
2012-04-11 14:30:43 +04:00
select TRACING
2017-03-16 18:42:02 +03:00
default y
2012-04-11 14:30:43 +04:00
help
This allows the user to add tracing events on top of userspace
dynamic events (similar to tracepoints) on the fly via the trace
events interface. Those events can be inserted wherever uprobes
can probe, and record various registers.
This option is required if you plan to use perf-probe subcommand
of perf tools on user space applications.
2015-04-02 16:51:39 +03:00
config BPF_EVENTS
depends on BPF_SYSCALL
2017-02-16 09:00:50 +03:00
depends on (KPROBE_EVENTS || UPROBE_EVENTS) && PERF_EVENTS
2015-04-02 16:51:39 +03:00
bool
default y
help
2019-08-21 02:08:57 +03:00
This allows the user to attach BPF programs to kprobe, uprobe, and
tracepoint events.
2015-04-02 16:51:39 +03:00
2018-11-05 12:02:08 +03:00
config DYNAMIC_EVENTS
def_bool n
2012-04-09 13:11:44 +04:00
config PROBE_EVENTS
def_bool n
2017-12-11 19:36:48 +03:00
config BPF_KPROBE_OVERRIDE
bool "Enable BPF programs to override a kprobed function"
depends on BPF_EVENTS
2018-01-12 20:55:03 +03:00
depends on FUNCTION_ERROR_INJECTION
2017-12-11 19:36:48 +03:00
default n
help
Allows BPF to override the execution of a probed function and
set a different return value. This is used for error injection.
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset to of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section.
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it into back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function. The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worst, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note, A well documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 23:45:07 +04:00
config FTRACE_MCOUNT_RECORD
def_bool y
depends on DYNAMIC_FTRACE
depends on HAVE_FTRACE_MCOUNT_RECORD
2015-12-10 21:50:50 +03:00
config TRACING_MAP
bool
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
help
tracing_map is a special-purpose lock-free map for tracing,
separated out as a stand-alone facility in order to allow it
to be shared between multiple tracers. It isn't meant to be
generally used outside of that context, and is normally
selected by tracers that use it.
2020-05-28 22:32:37 +03:00
config SYNTH_EVENTS
bool "Synthetic trace events"
select TRACING
select DYNAMIC_EVENTS
default n
help
Synthetic events are user-defined trace events that can be
used to combine data from other trace events or in fact any
data source. Synthetic events can be generated indirectly
via the trace() action of histogram triggers or directly
by way of an in-kernel API.
See Documentation/trace/events.rst or
Documentation/trace/histogram.rst for details and examples.
If in doubt, say N.
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03 21:54:42 +03:00
config HIST_TRIGGERS
bool "Histogram triggers"
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
select TRACING_MAP
2016-07-03 16:51:34 +03:00
select TRACING
2018-11-05 12:03:33 +03:00
select DYNAMIC_EVENTS
2020-05-28 22:32:37 +03:00
select SYNTH_EVENTS
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03 21:54:42 +03:00
default n
help
Hist triggers allow one or more arbitrary trace event fields
to be aggregated into hash tables and dumped to stdout by
reading a debugfs/tracefs file. They're useful for
gathering quick and dirty (though precise) summaries of
event activity as an initial guide for further investigation
using more advanced tools.
2018-01-16 05:52:10 +03:00
Inter-event tracing of quantities such as latencies is also
supported using hist triggers under this option.
2018-06-26 12:49:11 +03:00
See Documentation/trace/histogram.rst.
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03 21:54:42 +03:00
If in doubt, say N.
tracing: Introduce trace event injection
We have been trying to use rasdaemon to monitor hardware errors like
correctable memory errors. rasdaemon uses trace events to monitor
various hardware errors. In order to test it, we have to inject some
hardware errors, unfortunately not all of them provide error
injections. MCE does provide a way to inject MCE errors, but errors
like PCI error and devlink error don't, it is not easy to add error
injection to each of them. Instead, it is relatively easier to just
allow users to inject trace events in a generic way so that all trace
events can be injected.
This patch introduces trace event injection, where a new 'inject' is
added to each tracepoint directory. Users could write into this file
with key=value pairs to specify the value of each fields of the trace
event, all unspecified fields are set to zero values by default.
For example, for the net/net_dev_queue tracepoint, we can inject:
INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
echo "" > $INJECT
echo "name='test'" > $INJECT
echo "name='test' len=1024" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
<...>-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
Triggers could be triggered as usual too:
echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
echo "len=1025" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
bash-614 [001] .N.1 284.236349: <stack trace>
=> event_inject_write
=> vfs_write
=> ksys_write
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
The only thing that can't be injected is string pointers as they
require constant string pointers, this can't be done at run time.
Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-30 07:52:18 +03:00
config TRACE_EVENT_INJECT
bool "Trace event injection"
depends on TRACING
help
Allow user-space to inject a specific trace event into the ring
buffer. This is mainly used for testing purpose.
If unsure, say N.
2014-05-30 06:49:07 +04:00
config TRACEPOINT_BENCHMARK
2019-11-20 16:38:07 +03:00
bool "Add tracepoint that benchmarks tracepoints"
2014-05-30 06:49:07 +04:00
help
This option creates the tracepoint "benchmark:benchmark_event".
When the tracepoint is enabled, it kicks off a kernel thread that
goes into an infinite loop (calling cond_sched() to let other tasks
run), and calls the tracepoint. Each iteration will record the time
it took to write to the tracepoint and the next iteration that
data will be passed to the tracepoint itself. That is, the tracepoint
will report the time it took to do the previous tracepoint.
The string written to the tracepoint is a static string of 128 bytes
to keep the time the same. The initial string is simply a write of
"START". The second string records the cold cache time of the first
write which is not added to the rest of the calculations.
As it is a tight loop, it benchmarks as hot cache. That's fine because
we care most about hot paths that are probably in cache already.
An example of the output:
START
first=3672 [COLD CACHED]
last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666
2009-05-06 06:47:18 +04:00
config RING_BUFFER_BENCHMARK
tristate "Ring buffer benchmark stress tester"
depends on RING_BUFFER
help
2009-12-21 23:01:17 +03:00
This option creates a test to stress the ring buffer and benchmark it.
It creates its own ring buffer such that it will not interfere with
2009-05-06 06:47:18 +04:00
any other users of the ring buffer (such as ftrace). It then creates
a producer and consumer that will run for 10 seconds and sleep for
10 seconds. Each interval it will print out the number of events
it recorded and give a rough estimate of how long each iteration took.
It does not disable interrupts or raise its priority, so it may be
affected by processes that are running.
2009-12-21 23:01:17 +03:00
If unsure, say N.
2009-05-06 06:47:18 +04:00
2020-01-30 00:30:30 +03:00
config TRACE_EVAL_MAP_FILE
bool "Show eval mappings for trace events"
depends on TRACING
help
The "print fmt" of the trace events will show the enum/sizeof names
instead of their values. This can cause problems for user space tools
that use this string to parse the raw data as user space does not know
how to convert the string to its value.
To fix this, there's a special macro in the kernel that can be used
to convert an enum/sizeof into its value. If this macro is used, then
the print fmt strings will be converted to their values.
If something does not get converted properly, this option can be
used to show what enums/sizeof the kernel tried to convert.
This option is for debugging the conversions. A file is created
in the tracing directory called "eval_map" that will show the
names matched with their values and what trace event system they
belong too.
Normally, the mapping of the strings to values will be freed after
boot up or module load. With this option, they will not be freed, as
they are needed for the "eval_map" file. Enabling this option will
increase the memory footprint of the running kernel.
If unsure, say N.
2020-11-06 05:32:46 +03:00
config FTRACE_RECORD_RECURSION
bool "Record functions that recurse in function tracing"
depends on FUNCTION_TRACER
help
All callbacks that attach to the function tracing have some sort
of protection against recursion. Even though the protection exists,
it adds overhead. This option will create a file in the tracefs
file system called "recursed_functions" that will list the functions
that triggered a recursion.
This will add more overhead to cases that have recursion.
If unsure, say N
config FTRACE_RECORD_RECURSION_SIZE
int "Max number of recursed functions to record"
default 128
depends on FTRACE_RECORD_RECURSION
help
This defines the limit of number of functions that can be
listed in the "recursed_functions" file, that lists all
the functions that caused a recursion to happen.
This file can be reset, but the limit can not change in
size at runtime.
2020-11-02 22:43:10 +03:00
config RING_BUFFER_RECORD_RECURSION
bool "Record functions that recurse in the ring buffer"
depends on FTRACE_RECORD_RECURSION
# default y, because it is coupled with FTRACE_RECORD_RECURSION
default y
help
The ring buffer has its own internal recursion. Although when
recursion happens it wont cause harm because of the protection,
but it does cause an unwanted overhead. Enabling this option will
place where recursion was detected into the ftrace "recursed_functions"
file.
This will add more overhead to cases that have recursion.
2020-01-30 00:30:30 +03:00
config GCOV_PROFILE_FTRACE
bool "Enable GCOV profiling on ftrace subsystem"
depends on GCOV_KERNEL
help
Enable GCOV profiling on ftrace subsystem for checking
which functions/lines are tested.
If unsure, say N.
Note that on a kernel compiled with this config, ftrace will
run significantly slower.
config FTRACE_SELFTEST
bool
config FTRACE_STARTUP_TEST
bool "Perform a startup test on ftrace"
depends on GENERIC_TRACER
select FTRACE_SELFTEST
help
This option performs a series of startup tests on ftrace. On bootup
a series of tests are made to verify that the tracer is
functioning properly. It will do tests on all the configured
tracers of ftrace.
config EVENT_TRACE_STARTUP_TEST
bool "Run selftest on trace events"
depends on FTRACE_STARTUP_TEST
default y
help
This option performs a test on all trace events in the system.
It basically just enables each event and runs some code that
will trigger events (not necessarily the event it enables)
This may take some time run as there are a lot of events.
config EVENT_TRACE_TEST_SYSCALLS
bool "Run selftest on syscall events"
depends on EVENT_TRACE_STARTUP_TEST
help
This option will also enable testing every syscall event.
It only enables the event and disables it and runs various loads
with the event enabled. This adds a bit more time for kernel boot
up since it runs this on every system call defined.
TBD - enable a way to actually call the syscalls as we test their
events
2013-03-15 19:32:53 +04:00
config RING_BUFFER_STARTUP_TEST
bool "Ring buffer startup self test"
depends on RING_BUFFER
help
2019-11-20 16:38:07 +03:00
Run a simple self test on the ring buffer on boot up. Late in the
2013-03-15 19:32:53 +04:00
kernel boot sequence, the test will start that kicks off
a thread per cpu. Each thread will write various size events
into the ring buffer. Another thread is created to send IPIs
to each of the threads, where the IPI handler will also write
to the ring buffer, to test/stress the nesting ability.
If any anomalies are discovered, a warning will be displayed
and all ring buffers will be disabled.
The test runs for 10 seconds. This will slow your boot time
by at least 10 more seconds.
At the end of the test, statics and more checks are done.
It will output the stats of each per cpu buffer. What
was written, the sizes, what was read, what was lost, and
other similar details.
If unsure, say N
2020-01-30 00:23:04 +03:00
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
help
This is a dumb module for testing mmiotrace. It is very dangerous
as it will write garbage to IO memory starting at a given address.
However, it should be safe to use on e.g. unused portion of VRAM.
Say N, unless you absolutely know what you are doing.
2018-07-13 00:36:11 +03:00
config PREEMPTIRQ_DELAY_TEST
2020-01-30 00:23:04 +03:00
tristate "Test module to create a preempt / IRQ disable delay thread to test latency tracers"
2018-07-13 00:36:11 +03:00
depends on m
help
Select this option to build a test module that can help test latency
tracers by executing a preempt or irq disable section with a user
configurable delay. The module busy waits for the duration of the
critical section.
2019-10-09 01:08:22 +03:00
For example, the following invocation generates a burst of three
irq-disabled critical sections for 500us:
modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3
2018-07-13 00:36:11 +03:00
If unsure, say N
2020-01-29 21:59:28 +03:00
config SYNTH_EVENT_GEN_TEST
tristate "Test module for in-kernel synthetic event generation"
2020-05-28 22:32:37 +03:00
depends on SYNTH_EVENTS
2020-01-29 21:59:28 +03:00
help
This option creates a test module to check the base
functionality of in-kernel synthetic event definition and
generation.
To test, insert the module, and then check the trace buffer
for the generated sample events.
If unsure, say N.
2020-01-29 21:59:31 +03:00
config KPROBE_EVENT_GEN_TEST
tristate "Test module for in-kernel kprobe event generation"
depends on KPROBE_EVENTS
help
This option creates a test module to check the base
functionality of in-kernel kprobe event definition.
To test, insert the module, and then check the trace buffer
for the generated kprobe events.
If unsure, say N.
2020-04-03 22:31:21 +03:00
config HIST_TRIGGERS_DEBUG
bool "Hist trigger debug support"
depends on HIST_TRIGGERS
help
Add "hist_debug" file for each event, which when read will
dump out a bunch of internal details about the hist triggers
defined on that event.
The hist_debug file serves a couple of purposes:
- Helps developers verify that nothing is broken.
- Provides educational information to support the details
of the hist trigger internals as described by
Documentation/trace/histogram-design.rst.
The hist_debug output only covers the data structures
related to the histogram definitions themselves and doesn't
display the internals of map buckets or variable values of
running histograms.
If unsure, say N.
2009-04-20 18:47:36 +04:00
endif # FTRACE
2009-03-05 23:19:55 +03:00
endif # TRACING_SUPPORT