2018-08-16 11:23:53 -04:00
// SPDX-License-Identifier: GPL-2.0
2008-05-12 21:20:42 +02:00
/*
* Infrastructure for profiling code inserted by 'gcc -pg'.
*
* Copyright (C) 2007-2008 Steven Rostedt <srostedt@redhat.com>
* Copyright (C) 2004-2008 Ingo Molnar <mingo@redhat.com>
*
* Originally ported from the -rt patch by:
* Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
*
* Based on code in the latency_tracer, that is:
*
* Copyright (C) 2004-2006 Ingo Molnar
2012-12-06 10:39:54 +01:00
* Copyright (C) 2004 Nadia Yvette Chambers
2008-05-12 21:20:42 +02:00
*/
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs, so all functions that might be called via NMI must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
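The call-to-jmp replacement described in this commit message can be modeled in user space on a plain byte buffer. The sketch below assumes the usual x86 encodings (`call rel32` is opcode 0xE8 plus a 4-byte displacement, short `jmp rel8` is opcode 0xEB plus a 1-byte displacement). The helper names `patch_call_to_jmp` and `patch_jmp_to_call` are hypothetical, and the sketch does not model stop_machine or live kernel text.

```c
#include <stdint.h>
#include <string.h>

#define CALL_INSN_SIZE 5      /* x86: 0xE8 + rel32 */
#define OP_CALL        0xE8
#define OP_JMP_SHORT   0xEB   /* x86: 0xEB + rel8 */

/*
 * Replace a 5-byte "call rel32" at site[] with "jmp +3" so execution
 * skips the rest of the old instruction. The three trailing bytes are
 * left in place; they are never executed. Returns 0 on success, -1 if
 * the site does not hold a call.
 */
static int patch_call_to_jmp(uint8_t *site)
{
	if (site[0] != OP_CALL)
		return -1;
	site[0] = OP_JMP_SHORT;
	site[1] = CALL_INSN_SIZE - 2;	/* displacement counted from the end of the 2-byte jmp */
	return 0;
}

/*
 * Restore the call. rel32 is the displacement to the tracer entry,
 * which the caller must supply (hypothetical parameter).
 */
static int patch_jmp_to_call(uint8_t *site, int32_t rel32)
{
	if (site[0] != OP_JMP_SHORT || site[1] != CALL_INSN_SIZE - 2)
		return -1;
	site[0] = OP_CALL;
	memcpy(&site[1], &rel32, sizeof(rel32));
	return 0;
}
```

Both helpers check the current bytes before writing, so a site that is already in the requested state is reported rather than silently overwritten.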
2008-05-12 21:20:42 +02:00
# include <linux/stop_machine.h>
# include <linux/clocksource.h>
2017-02-08 18:51:36 +01:00
# include <linux/sched/task.h>
2008-05-12 21:20:42 +02:00
# include <linux/kallsyms.h>
2019-10-11 17:22:50 -04:00
# include <linux/security.h>
2008-05-12 21:20:43 +02:00
# include <linux/seq_file.h>
2015-01-20 12:13:40 -05:00
# include <linux/tracefs.h>
2008-05-12 21:20:42 +02:00
# include <linux/hardirq.h>
2008-02-23 16:55:50 +01:00
# include <linux/kthread.h>
2008-05-12 21:20:43 +02:00
# include <linux/uaccess.h>
2011-12-16 19:27:42 -05:00
# include <linux/bsearch.h>
2011-05-26 17:53:52 -04:00
# include <linux/module.h>
2008-02-23 16:55:50 +01:00
# include <linux/ftrace.h>
2008-05-12 21:20:43 +02:00
# include <linux/sysctl.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
# include <linux/slab.h>
2008-05-12 21:20:43 +02:00
# include <linux/ctype.h>
2011-12-16 17:06:45 -05:00
# include <linux/sort.h>
2008-05-12 21:20:42 +02:00
# include <linux/list.h>
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have their data matching data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-14 15:29:06 -05:00
# include <linux/hash.h>
2010-03-05 15:03:25 -08:00
# include <linux/rcupdate.h>
2019-02-24 01:50:20 +09:00
# include <linux/kprobes.h>
2008-05-12 21:20:42 +02:00
2009-04-14 19:39:12 -04:00
# include <trace/events/sched.h>
2009-03-24 01:10:15 -04:00
2017-04-03 12:57:35 -04:00
# include <asm/sections.h>
2009-05-28 13:37:24 -04:00
# include <asm/setup.h>
2008-06-21 23:47:27 +05:30
2018-11-15 12:32:38 -05:00
# include "ftrace_internal.h"
2009-03-23 23:12:58 -04:00
# include "trace_output.h"
2009-03-20 12:50:56 -04:00
# include "trace_stat.h"
2008-05-12 21:20:42 +02:00
2023-01-24 09:56:53 -05:00
/* Flags that do not get reset */
2023-05-02 21:32:33 -04:00
# define FTRACE_NOCLEAR_FLAGS (FTRACE_FL_DISABLED | FTRACE_FL_TOUCHED | \
FTRACE_FL_MODIFIED )
2023-01-24 09:56:53 -05:00
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, its call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in available_filter_functions and to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions does
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function would be used for it,
which will either create duplicate names, or if the previous function was
not traced, it would incorrectly be listed in available_filter_functions
as a function that can be traced.
This became an issue with BPF[1] as there is tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The failing case was a function that was marked notrace but
was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.
The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.
In the meantime, this is a workaround. Add a FTRACE_MCOUNT_MAX_OFFSET
macro that if defined, ftrace will ignore any function that has its call
to fentry/mcount that has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.
If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).
A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:
__ftrace_invalid_address___<invalid-offset>
(showing the offset that caused it to be invalid).
This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of functions via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).
The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).
[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
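The offset test described in this commit message can be illustrated with a toy symbol table. This is a minimal sketch, not the kernel's implementation: `struct sym`, `lookup_sym`, `mcount_loc_is_valid` and `MAX_OFFSET` are hypothetical stand-ins for kallsyms and FTRACE_MCOUNT_MAX_OFFSET. It shows how a lookup that only knows symbol start addresses attributes a stray call location to the preceding symbol, and how a maximum offset rejects it.

```c
#include <stdbool.h>
#include <stddef.h>

struct sym {
	unsigned long addr;	/* start address of the symbol */
	const char *name;
};

/* Toy stand-in for FTRACE_MCOUNT_MAX_OFFSET; 0 means the call must sit at the symbol start. */
#define MAX_OFFSET 0UL

/*
 * Return the last symbol whose start is <= addr, the way a lookup that
 * only knows symbol starts would. The table must be sorted by address.
 */
static const struct sym *lookup_sym(const struct sym *tab, size_t n,
				    unsigned long addr)
{
	const struct sym *best = NULL;

	for (size_t i = 0; i < n && tab[i].addr <= addr; i++)
		best = &tab[i];
	return best;
}

/* A call location is accepted only if it lies within MAX_OFFSET of its symbol. */
static bool mcount_loc_is_valid(const struct sym *tab, size_t n,
				unsigned long loc)
{
	const struct sym *s = lookup_sym(tab, n, loc);

	return s && loc - s->addr <= MAX_OFFSET;
}
```

With a table holding symbols at 0x1000 and 0x2000, a location at 0x1800 resolves to the first symbol but fails the offset check, which mirrors the overridden weak function case.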
2022-05-26 14:19:12 -04:00
# define FTRACE_INVALID_FUNCTION "__ftrace_invalid_address__"
2008-10-23 09:33:03 -04:00
# define FTRACE_WARN_ON(cond) \
2011-04-29 10:36:31 -04:00
( { \
int ___r = cond ; \
if ( WARN_ON ( ___r ) ) \
2008-10-23 09:33:03 -04:00
ftrace_kill ( ) ; \
2011-04-29 10:36:31 -04:00
___r ; \
} )
2008-10-23 09:33:03 -04:00
# define FTRACE_WARN_ON_ONCE(cond) \
2011-04-29 10:36:31 -04:00
( { \
int ___r = cond ; \
if ( WARN_ON_ONCE ( ___r ) ) \
2008-10-23 09:33:03 -04:00
ftrace_kill ( ) ; \
2011-04-29 10:36:31 -04:00
___r ; \
} )
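The two macros above evaluate the condition once, warn, shut ftrace down if the condition is true, and yield the condition's value so the caller can branch on it. The user-space sketch below shows that shape with mock helpers; `mock_warn_on`, `mock_ftrace_kill`, `MOCK_FTRACE_WARN_ON` and `update_record` are hypothetical names, and the code relies on the GNU C statement-expression extension (GCC and Clang).

```c
#include <stdio.h>

static int tracer_dead;		/* set once the mock "kill" has run */
static int warnings;		/* number of warnings emitted so far */

static int mock_warn_on(int cond)
{
	if (cond) {
		warnings++;
		fprintf(stderr, "warning: unexpected condition\n");
	}
	return cond;
}

static void mock_ftrace_kill(void)
{
	tracer_dead = 1;
}

/* Same shape as FTRACE_WARN_ON: evaluate once, react, hand the value back. */
#define MOCK_FTRACE_WARN_ON(cond)		\
	({					\
		int ___r = (cond);		\
		if (mock_warn_on(___r))		\
			mock_ftrace_kill();	\
		___r;				\
	})

/* Typical caller: bail out when the check fires. */
static int update_record(int rec_is_bogus)
{
	if (MOCK_FTRACE_WARN_ON(rec_is_bogus))
		return -1;
	return 0;
}
```

Because the value is captured in a local before it is tested, an argument with side effects is evaluated exactly once.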
2008-10-23 09:33:03 -04:00
2009-02-16 15:28:00 -05:00
/* hash bits for specific function selection */
2011-05-02 17:34:47 -04:00
# define FTRACE_HASH_DEFAULT_BITS 10
# define FTRACE_HASH_MAX_BITS 12
2009-02-16 15:28:00 -05:00
2013-05-09 14:44:17 +09:00
# ifdef CONFIG_DYNAMIC_FTRACE
2014-08-15 17:23:02 -04:00
# define INIT_OPS_HASH(opsname) \
. func_hash = & opsname . local_hash , \
. local_hash . regex_lock = __MUTEX_INITIALIZER ( opsname . local_hash . regex_lock ) ,
2013-05-09 14:44:17 +09:00
# else
2014-08-15 17:23:02 -04:00
# define INIT_OPS_HASH(opsname)
2013-05-09 14:44:17 +09:00
# endif
2018-12-05 12:48:53 -05:00
enum {
FTRACE_MODIFY_ENABLE_FL = (1 << 0),
FTRACE_MODIFY_MAY_SLEEP_FL = (1 << 1),
} ;
2018-11-15 12:32:38 -05:00
struct ftrace_ops ftrace_list_end __read_mostly = {
2011-08-08 16:57:47 -04:00
. func = ftrace_stub ,
2020-11-05 21:32:45 -05:00
. flags = FTRACE_OPS_FL_STUB ,
2014-08-15 17:23:02 -04:00
INIT_OPS_HASH ( ftrace_list_end )
2011-08-08 16:57:47 -04:00
} ;
2008-05-12 21:20:48 +02:00
/* ftrace_enabled is a method to turn ftrace on or off */
int ftrace_enabled __read_mostly ;
2022-04-07 15:46:12 +08:00
static int __maybe_unused last_ftrace_enabled ;
2008-05-12 21:20:43 +02:00
2011-08-08 16:57:47 -04:00
/* Current function tracing op */
struct ftrace_ops * function_trace_op __read_mostly = & ftrace_list_end ;
2013-11-08 14:17:30 -05:00
/* What to set function_trace_op to */
static struct ftrace_ops * set_function_trace_op ;
2008-11-05 16:05:44 -05:00
2016-04-22 18:11:33 -04:00
static bool ftrace_pids_enabled ( struct ftrace_ops * ops )
2015-07-24 10:38:12 -04:00
{
2016-04-22 18:11:33 -04:00
struct trace_array * tr ;
if (!(ops->flags & FTRACE_OPS_FL_PID) || !ops->private)
return false;
tr = ops->private;
2020-03-19 23:19:06 -04:00
return tr->function_pids != NULL || tr->function_no_pids != NULL;
2015-07-24 10:38:12 -04:00
}
static void ftrace_update_trampoline ( struct ftrace_ops * ops ) ;
2008-05-12 21:20:48 +02:00
/*
* ftrace_disabled is set when an anomaly is discovered .
* ftrace_disabled is much stronger than ftrace_enabled .
*/
static int ftrace_disabled __read_mostly ;
2018-11-15 12:32:38 -05:00
DEFINE_MUTEX ( ftrace_lock ) ;
2008-05-12 21:20:43 +02:00
2018-11-15 12:32:38 -05:00
struct ftrace_ops __rcu * ftrace_ops_list __read_mostly = & ftrace_list_end ;
2008-05-12 21:20:42 +02:00
ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub ;
2018-11-15 12:32:38 -05:00
struct ftrace_ops global_ops ;
2008-05-12 21:20:42 +02:00
2022-05-21 13:11:31 +02:00
/* Defined by vmlinux.lds.h see the comment above arch_ftrace_ops_list_func for details */
2020-06-17 16:56:16 -04:00
void ftrace_ops_list_func ( unsigned long ip , unsigned long parent_ip ,
struct ftrace_ops * op , struct ftrace_regs * fregs ) ;
2011-05-04 09:27:52 -04:00
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and synchronize_rcu_tasks()) ensures
that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
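The per-call-site ops pointer described in this commit message can be modeled in user space. The sketch below is an illustration under stated assumptions, not the arm64 implementation: `struct call_site`, `shared_trampoline` and `site_set_ops` are hypothetical names, and C11 atomics stand in for the architecture's atomic load and store of the pointer literal.

```c
#include <stdatomic.h>

struct ops;
typedef void (*trace_fn)(unsigned long ip, struct ops *op);

struct ops {
	trace_fn func;
	unsigned long hits;
};

/* Hypothetical model of a patchable call site and its ops literal. */
struct call_site {
	unsigned long ip;
	_Atomic(struct ops *) ops;
};

static void nop_func(unsigned long ip, struct ops *op)
{
	(void)ip;
	(void)op;
}

static void count_func(unsigned long ip, struct ops *op)
{
	(void)ip;
	op->hits++;
}

/* Stand-in for ftrace_nop_ops: what a disabled site points at. */
static struct ops nop_ops = { .func = nop_func };

/*
 * Shared trampoline: load the pointer once, then use that same pointer
 * both to find the callback and as its argument, so func and op can
 * never be mismatched.
 */
static void shared_trampoline(struct call_site *site)
{
	struct ops *op = atomic_load(&site->ops);

	op->func(site->ip, op);
}

/* Retargeting a site is a single atomic pointer store; NULL disables it. */
static void site_set_ops(struct call_site *site, struct ops *op)
{
	atomic_store(&site->ops, op ? op : &nop_ops);
}
```

Pointing a disabled site at a no-op ops, rather than at NULL, keeps the trampoline free of a null check, which matches the role the commit gives to a stub ops.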
2023-01-23 13:45:56 +00:00
# ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
/*
* Stub used to invoke the list ops without requiring a separate trampoline .
*/
const struct ftrace_ops ftrace_list_ops = {
. func = ftrace_ops_list_func ,
. flags = FTRACE_OPS_FL_STUB ,
} ;
static void ftrace_ops_nop_func ( unsigned long ip , unsigned long parent_ip ,
struct ftrace_ops * op ,
struct ftrace_regs * fregs )
{
/* do nothing */
}
/*
* Stub used when a call site is disabled. May be called transiently by threads
* which have made it into ftrace_caller but haven't yet recovered the ops at
* the point the call site is disabled.
*/
const struct ftrace_ops ftrace_nop_ops = {
. func = ftrace_ops_nop_func ,
. flags = FTRACE_OPS_FL_STUB ,
} ;
# endif
2013-05-09 14:44:17 +09:00
static inline void ftrace_ops_init ( struct ftrace_ops * ops )
{
# ifdef CONFIG_DYNAMIC_FTRACE
if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED)) {
2014-08-15 17:23:02 -04:00
mutex_init(&ops->local_hash.regex_lock);
ops->func_hash = &ops->local_hash;
2013-05-09 14:44:17 +09:00
ops->flags |= FTRACE_OPS_FL_INITIALIZED;
}
# endif
}
2011-08-08 16:57:47 -04:00
static void ftrace_pid_func ( unsigned long ip , unsigned long parent_ip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * op , struct ftrace_regs * fregs )
2008-11-26 00:16:23 -05:00
{
2016-04-22 18:11:33 -04:00
struct trace_array *tr = op->private;
2020-03-19 23:40:40 -04:00
int pid ;
2016-04-22 18:11:33 -04:00
2020-03-19 23:40:40 -04:00
if ( tr ) {
pid = this_cpu_read(tr->array_buffer.data->ftrace_ignore_pid);
if (pid == FTRACE_PID_IGNORE)
return;
if (pid != FTRACE_PID_TRACE &&
pid != current->pid)
return;
}
2008-11-26 00:16:23 -05:00
2020-10-28 17:42:17 -04:00
op->saved_func(ip, parent_ip, op, fregs);
2008-11-26 00:16:23 -05:00
}
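The filtering decision in ftrace_pid_func() can be isolated as a pure function. This is a minimal sketch: the sentinel values `PID_IGNORE` and `PID_TRACE` are illustrative placeholders rather than the kernel's definitions, and `pid_filter_allows` is a hypothetical name. The `has_filter` argument models the `tr` pointer being non-NULL.

```c
#include <stdbool.h>

/* Illustrative sentinel values; the kernel defines its own constants. */
enum {
	PID_IGNORE = -1,	/* never trace on this CPU */
	PID_TRACE  = -2,	/* trace every task on this CPU */
};

/*
 * Mirror of the decision in ftrace_pid_func(): filter_pid is the
 * per-CPU value, current_pid the running task. has_filter is false
 * when no trace_array is attached (tr == NULL in the original).
 */
static bool pid_filter_allows(bool has_filter, int filter_pid, int current_pid)
{
	if (!has_filter)
		return true;
	if (filter_pid == PID_IGNORE)
		return false;
	if (filter_pid != PID_TRACE && filter_pid != current_pid)
		return false;
	return true;
}
```

When the function returns true, the original goes on to call the saved callback; otherwise it returns without tracing.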
2013-11-08 14:17:30 -05:00
static void ftrace_sync_ipi ( void * data )
{
/* Probably not needed, but do it anyway */
smp_rmb ( ) ;
}
2015-02-19 15:56:14 +01:00
static ftrace_func_t ftrace_ops_get_list_func ( struct ftrace_ops * ops )
{
/*
2022-10-25 15:39:23 +00:00
* If this is a dynamic or RCU ops , or we force list func ,
2015-02-19 15:56:14 +01:00
* then it needs to call the list anyway .
*/
2017-10-11 09:45:32 +02:00
if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_RCU) ||
FTRACE_FORCE_LIST_FUNC)
2015-02-19 15:56:14 +01:00
return ftrace_ops_list_func ;
return ftrace_ops_get_func ( ops ) ;
}
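The choice that update_ftrace_function() makes between the stub, a single ops callback, and the list function can be modeled on a sentinel-terminated list. The sketch below is a simplified user-space illustration: the flag values and the names `select_trace_func`, `list_end` and `sample_func` are hypothetical, and the FTRACE_FORCE_LIST_FUNC case is left out.

```c
struct ops;
typedef void (*trace_fn)(struct ops *op);

#define OPS_FL_DYNAMIC (1u << 0)	/* hypothetical flag values */
#define OPS_FL_RCU     (1u << 1)

struct ops {
	trace_fn func;
	unsigned int flags;
	struct ops *next;
};

static int stub_calls, list_calls, sample_calls;

static void stub_func(struct ops *op)   { (void)op; stub_calls++; }
static void list_func(struct ops *op)   { (void)op; list_calls++; /* would walk the list */ }
static void sample_func(struct ops *op) { (void)op; sample_calls++; }

/* Sentinel that terminates the list, in the role of ftrace_list_end. */
static struct ops list_end = { .func = stub_func };

/* Pick the callback that call sites should reach for a given list head. */
static trace_fn select_trace_func(struct ops *head)
{
	if (head == &list_end)
		return stub_func;	/* nothing registered */
	if (head->next == &list_end) {
		/* One ops: call it directly unless its flags demand the list. */
		if (head->flags & (OPS_FL_DYNAMIC | OPS_FL_RCU))
			return list_func;
		return head->func;
	}
	return list_func;		/* several ops: iterate over all of them */
}
```

Using a sentinel instead of NULL means the head can always be dereferenced, so the empty and single-entry cases are plain pointer comparisons.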
2011-05-03 22:49:52 -04:00
static void update_ftrace_function ( void )
{
ftrace_func_t func ;
2014-09-10 10:42:46 -04:00
/*
* Prepare the ftrace_ops that the arch callback will use.
* If there's only one ftrace_ops registered, the ftrace_ops_list
* will point to the ops we want.
*/
2017-06-07 16:12:51 +08:00
set_function_trace_op = rcu_dereference_protected ( ftrace_ops_list ,
lockdep_is_held ( & ftrace_lock ) ) ;
2014-09-10 10:42:46 -04:00
/* If there's no ftrace_ops registered, just call the stub function */
2017-06-07 16:12:51 +08:00
if (set_function_trace_op == &ftrace_list_end) {
2014-09-10 10:42:46 -04:00
func = ftrace_stub ;
2011-05-05 21:14:55 -04:00
/*
* If we are at the end of the list and this ops is
2012-07-20 11:04:44 -04:00
* recursion safe and not dynamic and the arch supports passing ops ,
* then have the mcount trampoline call the function directly .
2011-05-05 21:14:55 -04:00
*/
2017-06-07 16:12:51 +08:00
} else if ( rcu_dereference_protected ( ftrace_ops_list - > next ,
lockdep_is_held ( & ftrace_lock ) ) = = & ftrace_list_end ) {
2015-02-19 15:56:14 +01:00
func = ftrace_ops_get_list_func ( ftrace_ops_list ) ;
2014-09-10 10:42:46 -04:00
2011-08-08 16:57:47 -04:00
} else {
/* Just use the default ftrace_ops */
2013-11-08 14:17:30 -05:00
set_function_trace_op = & ftrace_list_end ;
2011-05-04 09:27:52 -04:00
func = ftrace_ops_list_func ;
2011-08-08 16:57:47 -04:00
}
2011-05-03 22:49:52 -04:00
2014-07-15 11:05:12 -04:00
update_function_graph_func ( ) ;
2013-11-08 14:17:30 -05:00
/* If there's no change, then do nothing more here */
if ( ftrace_trace_function = = func )
return ;
/*
* If we are using the list function , it doesn ' t care
* about the function_trace_ops .
*/
if ( func = = ftrace_ops_list_func ) {
ftrace_trace_function = func ;
/*
* Don ' t even bother setting function_trace_ops ,
* it would be racy to do so anyway .
*/
return ;
}
# ifndef CONFIG_DYNAMIC_FTRACE
/*
* For static tracing , we need to be a bit more careful .
* The function change takes effect immediately . Thus ,
2020-10-02 22:31:26 +08:00
* we need to coordinate the setting of the function_trace_ops
2013-11-08 14:17:30 -05:00
* with the setting of the ftrace_trace_function .
*
* Set the function to the list ops , which will call the
* function we want , albeit indirectly , but it handles the
* ftrace_ops and doesn ' t depend on function_trace_op .
*/
ftrace_trace_function = ftrace_ops_list_func ;
/*
* Make sure all CPUs see this . Yes this is slow , but static
* tracing is slow and nasty to have enabled .
*/
2020-04-03 12:10:28 -07:00
synchronize_rcu_tasks_rude ( ) ;
2013-11-08 14:17:30 -05:00
/* Now all cpus are using the list ops. */
function_trace_op = set_function_trace_op ;
/* Make sure the function_trace_op is visible on all CPUs */
smp_wmb ( ) ;
/* Nasty way to force a rmb on all cpus */
smp_call_function ( ftrace_sync_ipi , NULL , 1 ) ;
/* OK, we are all set to update the ftrace_trace_function now! */
# endif /* !CONFIG_DYNAMIC_FTRACE */
2011-04-27 21:43:36 -04:00
ftrace_trace_function = func ;
}
2017-06-07 16:12:51 +08:00
static void add_ftrace_ops ( struct ftrace_ops __rcu * * list ,
struct ftrace_ops * ops )
2008-05-12 21:20:42 +02:00
{
2017-06-07 16:12:51 +08:00
rcu_assign_pointer ( ops - > next , * list ) ;
2008-05-12 21:20:42 +02:00
/*
2011-05-04 09:27:52 -04:00
* We are entering ops into the list but another
2008-05-12 21:20:42 +02:00
* CPU might be walking that list . We need to make sure
* the ops - > next pointer is valid before another CPU sees
2011-05-04 09:27:52 -04:00
* the ops pointer included into the list .
2008-05-12 21:20:42 +02:00
*/
2011-05-03 22:49:52 -04:00
rcu_assign_pointer ( * list , ops ) ;
2008-05-12 21:20:42 +02:00
}
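/*
 * Illustration only, not part of ftrace: a minimal user-space sketch of the
 * ordering add_ftrace_ops() relies on. The new entry's next pointer is set
 * before the entry becomes reachable from the list head. C11 atomics stand in
 * for rcu_assign_pointer() here, which is an assumption of this sketch; the
 * names struct node, list_end, list_head, push_front and count_nodes are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops. */
struct node {
	int id;
	struct node *_Atomic next;
};

/* Sentinel that terminates the list, playing the role of ftrace_list_end. */
struct node list_end = { .id = -1 };

/* An empty list points at the sentinel rather than at NULL. */
struct node *_Atomic list_head = &list_end;

/* Insert n at the front so that concurrent readers never see a bad next. */
void push_front(struct node *_Atomic *list, struct node *n)
{
	assert(n != NULL && n != &list_end);

	/* Step 1: n is still private, so a relaxed store is enough. */
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(list, memory_order_relaxed),
			      memory_order_relaxed);

	/* Step 2: the release store publishes n with n->next already valid. */
	atomic_store_explicit(list, n, memory_order_release);
}

/* Reader side: walk until the sentinel, using acquire loads. */
int count_nodes(struct node *_Atomic *list)
{
	int count = 0;
	struct node *p = atomic_load_explicit(list, memory_order_acquire);

	while (p != &list_end) {
		count++;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return count;
}
```

/*
 * As in the kernel code, writers are assumed to be serialized by a lock
 * (ftrace_lock there); only readers run concurrently with push_front().
 */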
2017-06-07 16:12:51 +08:00
static int remove_ftrace_ops ( struct ftrace_ops __rcu * * list ,
struct ftrace_ops * ops )
2008-05-12 21:20:42 +02:00
{
struct ftrace_ops * * p ;
/*
2008-05-12 21:20:42 +02:00
* If we are removing the last function , then simply point
* to the ftrace_stub .
2008-05-12 21:20:42 +02:00
*/
2017-06-07 16:12:51 +08:00
if ( rcu_dereference_protected ( * list ,
lockdep_is_held ( & ftrace_lock ) ) = = ops & &
rcu_dereference_protected ( ops - > next ,
lockdep_is_held ( & ftrace_lock ) ) = = & ftrace_list_end ) {
2011-05-03 22:49:52 -04:00
* list = & ftrace_list_end ;
2009-02-14 01:42:44 -05:00
return 0 ;
2008-05-12 21:20:42 +02:00
}
2011-05-03 22:49:52 -04:00
for ( p = list ; * p ! = & ftrace_list_end ; p = & ( * p ) - > next )
2008-05-12 21:20:42 +02:00
if ( * p = = ops )
break ;
2009-02-14 01:42:44 -05:00
if ( * p ! = ops )
return - 1 ;
2008-05-12 21:20:42 +02:00
* p = ( * p ) - > next ;
2011-05-03 22:49:52 -04:00
return 0 ;
}
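/*
 * Illustration only, not part of ftrace: a user-space sketch of the
 * pointer-to-pointer walk used by remove_ftrace_ops(). Keeping a pointer to
 * the link that points at the current entry lets the head and interior
 * entries be unlinked by the same assignment. The names struct node,
 * list_end and remove_node are hypothetical, and the sketch ignores the RCU
 * annotations of the real code.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops. */
struct node {
	int id;
	struct node *next;
};

/* Sentinel that terminates the list, playing the role of ftrace_list_end. */
struct node list_end = { .id = -1, .next = NULL };

/* Unlink n from *list. Returns 0 on success, -1 if n is not on the list. */
int remove_node(struct node **list, struct node *n)
{
	struct node **p;

	assert(list != NULL && n != NULL && n != &list_end);

	/* p always points at the link that leads to the current entry. */
	for (p = list; *p != &list_end; p = &(*p)->next)
		if (*p == n)
			break;

	/* Reached the sentinel without finding n. */
	if (*p != n)
		return -1;

	/* Bypass n: works for the head and for interior entries alike. */
	*p = (*p)->next;
	return 0;
}
```

/*
 * The kernel function additionally special-cases removal of the only
 * remaining entry by pointing the list straight at ftrace_list_end; the
 * generic walk above reaches the same end state.
 */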
2008-05-12 21:20:42 +02:00
ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-02 23:23:31 -04:00
static void ftrace_update_trampoline ( struct ftrace_ops * ops ) ;
2018-11-15 12:32:38 -05:00
int __register_ftrace_function ( struct ftrace_ops * ops )
2011-05-03 22:49:52 -04:00
{
2014-01-10 16:17:45 -05:00
if ( ops - > flags & FTRACE_OPS_FL_DELETED )
return - EINVAL ;
2011-05-04 09:27:52 -04:00
if ( WARN_ON ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return - EBUSY ;
2012-09-28 17:15:17 +09:00
# ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
2012-04-30 16:20:23 -04:00
/*
* If the ftrace_ops specifies SAVE_REGS , then it only can be used
* if the arch supports it , or SAVE_REGS_IF_SUPPORTED is also set .
* Setting SAVE_REGS_IF_SUPPORTED makes SAVE_REGS irrelevant .
*/
if ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS & &
! ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED ) )
return - EINVAL ;
if ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED )
ops - > flags | = FTRACE_OPS_FL_SAVE_REGS ;
# endif
2019-10-16 13:33:13 +02:00
if ( ! ftrace_enabled & & ( ops - > flags & FTRACE_OPS_FL_PERMANENT ) )
return - EBUSY ;
2012-04-30 16:20:23 -04:00
2021-11-08 18:33:51 -08:00
if ( ! is_kernel_core_data ( ( unsigned long ) ops ) )
2011-05-05 21:14:55 -04:00
ops - > flags | = FTRACE_OPS_FL_DYNAMIC ;
2015-11-30 17:23:39 -05:00
add_ftrace_ops ( & ftrace_ops_list , ops ) ;
2011-05-04 09:27:52 -04:00
2015-07-24 10:38:12 -04:00
/* Always save the function, and reset at unregistering */
ops - > saved_func = ops - > func ;
2016-04-22 18:11:33 -04:00
if ( ftrace_pids_enabled ( ops ) )
2015-07-24 10:38:12 -04:00
ops - > func = ftrace_pid_func ;
2014-07-02 23:23:31 -04:00
ftrace_update_trampoline ( ops ) ;
2011-05-03 22:49:52 -04:00
if ( ftrace_enabled )
update_ftrace_function ( ) ;
return 0 ;
}
2018-11-15 12:32:38 -05:00
int __unregister_ftrace_function ( struct ftrace_ops * ops )
2011-05-03 22:49:52 -04:00
{
int ret ;
2011-05-04 09:27:52 -04:00
if ( WARN_ON ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) ) )
return - EBUSY ;
2015-11-30 17:23:39 -05:00
ret = remove_ftrace_ops ( & ftrace_ops_list , ops ) ;
2011-05-04 09:27:52 -04:00
2011-05-03 22:49:52 -04:00
if ( ret < 0 )
return ret ;
2011-05-04 09:27:52 -04:00
2011-04-27 21:43:36 -04:00
if ( ftrace_enabled )
update_ftrace_function ( ) ;
2008-05-12 21:20:42 +02:00
2015-07-24 10:38:12 -04:00
ops - > func = ops - > saved_func ;
2009-02-14 01:42:44 -05:00
return 0 ;
2008-05-12 21:20:42 +02:00
}
2008-11-26 00:16:23 -05:00
static void ftrace_update_pid_func ( void )
{
2015-07-24 10:38:12 -04:00
struct ftrace_ops * op ;
2011-04-27 21:43:36 -04:00
/* Only do something if we are tracing something */
2008-11-26 00:16:23 -05:00
if ( ftrace_trace_function = = ftrace_stub )
2009-03-06 15:29:04 +09:00
return ;
2008-11-26 00:16:23 -05:00
2015-07-24 10:38:12 -04:00
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( op - > flags & FTRACE_OPS_FL_PID ) {
2016-04-22 18:11:33 -04:00
op - > func = ftrace_pids_enabled ( op ) ?
ftrace_pid_func : op - > saved_func ;
2015-07-24 10:38:12 -04:00
ftrace_update_trampoline ( op ) ;
}
} while_for_each_ftrace_op ( op ) ;
2011-04-27 21:43:36 -04:00
update_ftrace_function ( ) ;
2008-11-26 00:16:23 -05:00
}
2009-03-23 17:12:36 -04:00
# ifdef CONFIG_FUNCTION_PROFILER
struct ftrace_profile {
struct hlist_node node ;
unsigned long ip ;
unsigned long counter ;
2009-03-23 23:12:58 -04:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
unsigned long long time ;
2010-04-26 14:02:05 -04:00
unsigned long long time_squared ;
2009-03-23 23:12:58 -04:00
# endif
2009-02-16 15:28:00 -05:00
} ;
2009-03-23 17:12:36 -04:00
struct ftrace_profile_page {
struct ftrace_profile_page * next ;
unsigned long index ;
struct ftrace_profile records [ ] ;
2008-05-12 21:20:43 +02:00
} ;
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat {
atomic_t disabled ;
struct hlist_head * hash ;
struct ftrace_profile_page * pages ;
struct ftrace_profile_page * start ;
struct tracer_stat stat ;
} ;
2009-03-23 17:12:36 -04:00
# define PROFILE_RECORDS_SIZE \
( PAGE_SIZE - offsetof ( struct ftrace_profile_page , records ) )
2008-05-12 21:20:43 +02:00
2009-03-23 17:12:36 -04:00
# define PROFILES_PER_PAGE \
( PROFILE_RECORDS_SIZE / sizeof ( struct ftrace_profile ) )
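/*
 * Illustration only, not part of ftrace: a user-space sketch of how the two
 * macros above size a profile page. The page header sits in front of a
 * flexible array, so the space left for records is the page size minus the
 * offset of that array. SKETCH_PAGE_SIZE, struct profile, struct
 * profile_page and pages_needed are hypothetical names, and the 4096-byte
 * page is an assumption; the real value is architecture dependent.
 */

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical record; two pointers stand in for struct hlist_node. */
struct profile {
	void *hnode[2];
	unsigned long ip;
	unsigned long counter;
};

/* Page header followed by as many records as fit in the page. */
struct profile_page {
	struct profile_page *next;
	unsigned long index;
	struct profile records[];
};

/* Bytes available for records once the header is accounted for. */
#define RECORDS_SIZE \
	(SKETCH_PAGE_SIZE - offsetof(struct profile_page, records))

/* Whole records per page; any remainder bytes are left unused. */
#define PER_PAGE (RECORDS_SIZE / sizeof(struct profile))

/* Pages needed for a given number of functions, rounding up. */
unsigned long pages_needed(unsigned long functions)
{
	assert(PER_PAGE > 0);
	return (functions + PER_PAGE - 1) / PER_PAGE;
}
```

/*
 * pages_needed() mirrors the DIV_ROUND_UP ( functions , PROFILES_PER_PAGE )
 * computation used when the profile pages are allocated.
 */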
2008-05-12 21:20:42 +02:00
2009-03-25 13:26:41 -04:00
static int ftrace_profile_enabled __read_mostly ;
/* ftrace_profile_lock - synchronize the enable and disable of the profiler */
2009-03-20 12:50:56 -04:00
static DEFINE_MUTEX ( ftrace_profile_lock ) ;
2009-03-24 20:50:39 -04:00
static DEFINE_PER_CPU ( struct ftrace_profile_stat , ftrace_profile_stats ) ;
2009-03-23 17:12:36 -04:00
2013-04-10 08:55:50 +09:00
# define FTRACE_PROFILE_HASH_BITS 10
# define FTRACE_PROFILE_HASH_SIZE (1 << FTRACE_PROFILE_HASH_BITS)
2009-03-23 17:12:36 -04:00
2009-03-20 12:50:56 -04:00
static void *
function_stat_next ( void * v , int idx )
{
2009-03-23 17:12:36 -04:00
struct ftrace_profile * rec = v ;
struct ftrace_profile_page * pg ;
2009-03-20 12:50:56 -04:00
2009-03-23 17:12:36 -04:00
pg = ( struct ftrace_profile_page * ) ( ( unsigned long ) rec & PAGE_MASK ) ;
2009-03-20 12:50:56 -04:00
again :
2009-06-26 11:15:37 +08:00
if ( idx ! = 0 )
rec + + ;
2009-03-20 12:50:56 -04:00
if ( ( void * ) rec > = ( void * ) & pg - > records [ pg - > index ] ) {
pg = pg - > next ;
if ( ! pg )
return NULL ;
rec = & pg - > records [ 0 ] ;
2009-03-23 17:12:36 -04:00
if ( ! rec - > counter )
goto again ;
2009-03-20 12:50:56 -04:00
}
return rec ;
}
static void * function_stat_start ( struct tracer_stat * trace )
{
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat * stat =
container_of ( trace , struct ftrace_profile_stat , stat ) ;
if ( ! stat | | ! stat - > start )
return NULL ;
return function_stat_next ( & stat - > start - > records [ 0 ] , 0 ) ;
2009-03-20 12:50:56 -04:00
}
2009-03-23 23:12:58 -04:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* function graph compares on total time */
2019-10-07 16:56:56 +03:00
static int function_stat_cmp ( const void * p1 , const void * p2 )
2009-03-23 23:12:58 -04:00
{
2019-10-07 16:56:56 +03:00
const struct ftrace_profile * a = p1 ;
const struct ftrace_profile * b = p2 ;
2009-03-23 23:12:58 -04:00
if ( a - > time < b - > time )
return - 1 ;
if ( a - > time > b - > time )
return 1 ;
else
return 0 ;
}
# else
/* without function graph, compare on hit count */
2019-10-07 16:56:56 +03:00
static int function_stat_cmp ( const void * p1 , const void * p2 )
2009-03-20 12:50:56 -04:00
{
2019-10-07 16:56:56 +03:00
const struct ftrace_profile * a = p1 ;
const struct ftrace_profile * b = p2 ;
2009-03-20 12:50:56 -04:00
if ( a - > counter < b - > counter )
return - 1 ;
if ( a - > counter > b - > counter )
return 1 ;
else
return 0 ;
}
2009-03-23 23:12:58 -04:00
# endif
2009-03-20 12:50:56 -04:00
static int function_stat_headers ( struct seq_file * m )
{
2009-03-23 23:12:58 -04:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2014-11-08 21:42:10 +01:00
seq_puts ( m , " Function "
" Hit Time Avg s^2 \n "
" -------- "
" --- ---- --- --- \n " ) ;
2009-03-23 23:12:58 -04:00
# else
2014-11-08 21:42:10 +01:00
seq_puts ( m , " Function Hit \n "
" -------- --- \n " ) ;
2009-03-23 23:12:58 -04:00
# endif
2009-03-20 12:50:56 -04:00
return 0 ;
}
static int function_stat_show ( struct seq_file * m , void * v )
{
2009-03-23 17:12:36 -04:00
struct ftrace_profile * rec = v ;
2009-03-20 12:50:56 -04:00
char str [ KSYM_SYMBOL_LEN ] ;
2010-08-23 16:50:12 +08:00
int ret = 0 ;
2009-03-23 23:12:58 -04:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2009-03-25 21:00:47 -04:00
static struct trace_seq s ;
unsigned long long avg ;
2010-04-26 14:02:05 -04:00
unsigned long long stddev ;
2009-03-23 23:12:58 -04:00
# endif
2010-08-23 16:50:12 +08:00
mutex_lock ( & ftrace_profile_lock ) ;
/* we raced with ftrace_profile_reset() */
if ( unlikely ( rec - > counter = = 0 ) ) {
ret = - EBUSY ;
goto out ;
}
2009-03-20 12:50:56 -04:00
2015-06-22 16:58:08 +05:30
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2020-01-03 11:02:48 +08:00
avg = div64_ul ( rec - > time , rec - > counter ) ;
2015-06-22 16:58:08 +05:30
if ( tracing_thresh & & ( avg < tracing_thresh ) )
goto out ;
# endif
2009-03-20 12:50:56 -04:00
kallsyms_lookup ( rec - > ip , NULL , NULL , NULL , str ) ;
2009-03-23 23:12:58 -04:00
seq_printf ( m , " %-30.30s %10lu " , str , rec - > counter ) ;
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2014-11-08 21:42:10 +01:00
seq_puts ( m , " " ) ;
2009-03-25 21:00:47 -04:00
2010-04-26 14:02:05 -04:00
/* Sample variance (s^2) */
if ( rec - > counter < = 1 )
stddev = 0 ;
else {
2013-06-12 12:03:18 +02:00
/*
* Apply Welford ' s method :
* s ^ 2 = 1 / ( n * ( n - 1 ) ) * ( n * \ Sum ( x_i ) ^ 2 - ( \ Sum x_i ) ^ 2 )
*/
stddev = rec - > counter * rec - > time_squared -
rec - > time * rec - > time ;
2010-04-26 14:02:05 -04:00
/*
* Divide by only 1000 for the ns ^ 2 - > us ^ 2 conversion .
* trace_print_graph_duration will divide by 1000 again .
*/
2020-01-03 11:02:48 +08:00
stddev = div64_ul ( stddev ,
rec - > counter * ( rec - > counter - 1 ) * 1000 ) ;
2010-04-26 14:02:05 -04:00
}
2009-03-25 21:00:47 -04:00
trace_seq_init ( & s ) ;
trace_print_graph_duration ( rec - > time , & s ) ;
trace_seq_puts ( & s , " " ) ;
trace_print_graph_duration ( avg , & s ) ;
2010-04-26 14:02:05 -04:00
trace_seq_puts ( & s , " " ) ;
trace_print_graph_duration ( stddev , & s ) ;
2009-03-23 23:12:58 -04:00
trace_print_seq ( m , & s ) ;
# endif
seq_putc ( m , ' \n ' ) ;
2010-08-23 16:50:12 +08:00
out :
mutex_unlock ( & ftrace_profile_lock ) ;
2009-03-20 12:50:56 -04:00
2010-08-23 16:50:12 +08:00
return ret ;
2009-03-20 12:50:56 -04:00
}
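/*
 * Illustration only, not part of ftrace: a user-space sketch of the s^2
 * computation in function_stat_show(). It evaluates the sum-of-squares
 * identity quoted in the comment there,
 *   s^2 = ( n * Sum(x_i^2) - (Sum x_i)^2 ) / ( n * (n - 1) ),
 * from the hit count, the running total and the running total of squares.
 * The name sample_variance is hypothetical, integer division truncates, and
 * the sketch assumes the products do not overflow 64 bits. The extra
 * division by 1000 for unit conversion is left out.
 */

```c
#include <assert.h>
#include <stdint.h>

/* n: number of samples, sum: Sum x_i, sum_sq: Sum x_i^2. */
uint64_t sample_variance(uint64_t n, uint64_t sum, uint64_t sum_sq)
{
	/* With zero or one sample the variance is reported as 0. */
	if (n <= 1)
		return 0;

	/* Holds for any real samples (Cauchy-Schwarz), absent overflow. */
	assert(n * sum_sq >= sum * sum);

	return (n * sum_sq - sum * sum) / (n * (n - 1));
}
```

/*
 * For the samples 1, 2, 3 this takes n = 3, sum = 6 and sum_sq = 14, giving
 * (42 - 36) / 6 = 1.
 */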
2009-03-24 20:50:39 -04:00
static void ftrace_profile_reset ( struct ftrace_profile_stat * stat )
2009-03-20 12:50:56 -04:00
{
2009-03-23 17:12:36 -04:00
struct ftrace_profile_page * pg ;
2009-03-20 12:50:56 -04:00
2009-03-24 20:50:39 -04:00
pg = stat - > pages = stat - > start ;
2009-03-20 12:50:56 -04:00
2009-03-23 17:12:36 -04:00
while ( pg ) {
memset ( pg - > records , 0 , PROFILE_RECORDS_SIZE ) ;
pg - > index = 0 ;
pg = pg - > next ;
2009-03-20 12:50:56 -04:00
}
2009-03-24 20:50:39 -04:00
memset ( stat - > hash , 0 ,
2009-03-23 17:12:36 -04:00
FTRACE_PROFILE_HASH_SIZE * sizeof ( struct hlist_head ) ) ;
}
2009-03-20 12:50:56 -04:00
2021-10-19 18:48:54 +08:00
static int ftrace_profile_pages_init ( struct ftrace_profile_stat * stat )
2009-03-23 17:12:36 -04:00
{
struct ftrace_profile_page * pg ;
2009-03-25 20:06:34 -04:00
int functions ;
int pages ;
2009-03-23 17:12:36 -04:00
int i ;
2009-03-20 12:50:56 -04:00
2009-03-23 17:12:36 -04:00
/* If we already allocated, do nothing */
2009-03-24 20:50:39 -04:00
if ( stat - > pages )
2009-03-23 17:12:36 -04:00
return 0 ;
2009-03-20 12:50:56 -04:00
2009-03-24 20:50:39 -04:00
stat - > pages = ( void * ) get_zeroed_page ( GFP_KERNEL ) ;
if ( ! stat - > pages )
2009-03-23 17:12:36 -04:00
return - ENOMEM ;
2009-03-20 12:50:56 -04:00
2009-03-25 20:06:34 -04:00
# ifdef CONFIG_DYNAMIC_FTRACE
functions = ftrace_update_tot_cnt ;
# else
/*
* We do not know the number of functions that exist because
* dynamic tracing is what counts them . With past experience
* we have around 20 K functions . That should be more than enough .
* It is highly unlikely we will execute every function in
* the kernel .
*/
functions = 20000 ;
# endif
2009-03-24 20:50:39 -04:00
pg = stat - > start = stat - > pages ;
2009-03-20 12:50:56 -04:00
2009-03-25 20:06:34 -04:00
pages = DIV_ROUND_UP ( functions , PROFILES_PER_PAGE ) ;
2013-04-01 21:46:24 +09:00
for ( i = 1 ; i < pages ; i + + ) {
2009-03-23 17:12:36 -04:00
pg - > next = ( void * ) get_zeroed_page ( GFP_KERNEL ) ;
if ( ! pg - > next )
2009-03-25 20:06:34 -04:00
goto out_free ;
2009-03-23 17:12:36 -04:00
pg = pg - > next ;
}
return 0 ;
2009-03-25 20:06:34 -04:00
out_free :
pg = stat - > start ;
while ( pg ) {
unsigned long tmp = ( unsigned long ) pg ;
pg = pg - > next ;
free_page ( tmp ) ;
}
stat - > pages = NULL ;
stat - > start = NULL ;
return - ENOMEM ;
2009-03-20 12:50:56 -04:00
}
2009-03-24 20:50:39 -04:00
static int ftrace_profile_init_cpu ( int cpu )
2009-03-20 12:50:56 -04:00
{
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat * stat ;
2009-03-23 17:12:36 -04:00
int size ;
2009-03-20 12:50:56 -04:00
2009-03-24 20:50:39 -04:00
stat = & per_cpu ( ftrace_profile_stats , cpu ) ;
if ( stat - > hash ) {
2009-03-23 17:12:36 -04:00
/* If the profile is already created, simply reset it */
2009-03-24 20:50:39 -04:00
ftrace_profile_reset ( stat ) ;
2009-03-23 17:12:36 -04:00
return 0 ;
}
2009-03-20 12:50:56 -04:00
2009-03-23 17:12:36 -04:00
/*
* We are profiling all functions , but usually only a few thousand
* functions are hit . We ' ll make a hash of 1024 items .
*/
size = FTRACE_PROFILE_HASH_SIZE ;
2009-03-20 12:50:56 -04:00
treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:
kzalloc(a * b, gfp)
with:
kcalloc(a * b, gfp)
as well as handling cases of:
kzalloc(a * b * c, gfp)
with:
kzalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kzalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kzalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2-factor products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
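The point of the conversion above is that kcalloc() and array3_size() check the multiplication, whereas an open-coded product passed to kzalloc() can silently wrap. The following is a minimal user-space sketch of that idea, not the kernel implementation: checked_calloc() and checked_array3_size() are hypothetical names, and the GCC/Clang builtin __builtin_mul_overflow() is assumed to be available.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for kcalloc(): refuse the request when
 * count * size would wrap, instead of allocating a short buffer.
 */
static void *checked_calloc(size_t count, size_t size)
{
	size_t bytes;
	void *p;

	if (__builtin_mul_overflow(count, size, &bytes))
		return NULL;	/* the open-coded product would have wrapped */

	p = malloc(bytes);
	if (p)
		memset(p, 0, bytes);
	return p;
}

/*
 * Hypothetical stand-in for array3_size(): saturate to SIZE_MAX on
 * overflow so that the allocator sees an impossible size and fails.
 */
static size_t checked_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

With these helpers, checked_calloc(SIZE_MAX, 2) returns NULL rather than a buffer sized by the wrapped product, which is the failure mode the two-argument form is meant to rule out.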
2018-06-12 14:03:40 -07:00
stat - > hash = kcalloc ( size , sizeof ( struct hlist_head ) , GFP_KERNEL ) ;
2009-03-23 17:12:36 -04:00
2009-03-24 20:50:39 -04:00
if ( ! stat - > hash )
2009-03-23 17:12:36 -04:00
return - ENOMEM ;
2009-03-25 20:06:34 -04:00
/* Preallocate the function profiling pages */
2009-03-24 20:50:39 -04:00
if ( ftrace_profile_pages_init ( stat ) < 0 ) {
kfree ( stat - > hash ) ;
stat - > hash = NULL ;
2009-03-23 17:12:36 -04:00
return - ENOMEM ;
}
return 0 ;
2009-03-20 12:50:56 -04:00
}
2009-03-24 20:50:39 -04:00
static int ftrace_profile_init ( void )
{
int cpu ;
int ret = 0 ;
2013-12-16 15:20:01 +08:00
for_each_possible_cpu ( cpu ) {
2009-03-24 20:50:39 -04:00
ret = ftrace_profile_init_cpu ( cpu ) ;
if ( ret )
break ;
}
return ret ;
}
2009-03-23 17:12:36 -04:00
/* interrupts must be disabled */
2009-03-24 20:50:39 -04:00
static struct ftrace_profile *
ftrace_find_profiled_func ( struct ftrace_profile_stat * stat , unsigned long ip )
2009-03-20 12:50:56 -04:00
{
2009-03-23 17:12:36 -04:00
struct ftrace_profile * rec ;
2009-03-20 12:50:56 -04:00
struct hlist_head * hhd ;
unsigned long key ;
2013-04-10 08:55:50 +09:00
key = hash_long ( ip , FTRACE_PROFILE_HASH_BITS ) ;
2009-03-24 20:50:39 -04:00
hhd = & stat - > hash [ key ] ;
2009-03-20 12:50:56 -04:00
if ( hlist_empty ( hhd ) )
return NULL ;
2013-05-28 14:38:43 -04:00
hlist_for_each_entry_rcu_notrace ( rec , hhd , node ) {
2009-03-20 12:50:56 -04:00
if ( rec - > ip = = ip )
2009-03-23 17:12:36 -04:00
return rec ;
}
return NULL ;
}
2009-03-24 20:50:39 -04:00
static void ftrace_add_profile ( struct ftrace_profile_stat * stat ,
struct ftrace_profile * rec )
2009-03-23 17:12:36 -04:00
{
unsigned long key ;
2013-04-10 08:55:50 +09:00
key = hash_long ( rec - > ip , FTRACE_PROFILE_HASH_BITS ) ;
2009-03-24 20:50:39 -04:00
hlist_add_head_rcu ( & rec - > node , & stat - > hash [ key ] ) ;
2009-03-23 17:12:36 -04:00
}
2009-03-25 20:06:34 -04:00
/*
* The memory is already allocated , this simply finds a new record to use .
*/
2009-03-23 17:12:36 -04:00
static struct ftrace_profile *
2009-03-25 20:06:34 -04:00
ftrace_profile_alloc ( struct ftrace_profile_stat * stat , unsigned long ip )
2009-03-23 17:12:36 -04:00
{
struct ftrace_profile * rec = NULL ;
2009-03-25 20:06:34 -04:00
/* prevent recursion (from NMIs) */
2009-03-24 20:50:39 -04:00
if ( atomic_inc_return ( & stat - > disabled ) ! = 1 )
2009-03-23 17:12:36 -04:00
goto out ;
/*
2009-03-25 20:06:34 -04:00
* Try to find the function again since an NMI
* could have added it
2009-03-23 17:12:36 -04:00
*/
2009-03-24 20:50:39 -04:00
rec = ftrace_find_profiled_func ( stat , ip ) ;
2009-03-23 17:12:36 -04:00
if ( rec )
2009-03-24 20:50:39 -04:00
goto out ;
2009-03-23 17:12:36 -04:00
2009-03-24 20:50:39 -04:00
if ( stat - > pages - > index = = PROFILES_PER_PAGE ) {
if ( ! stat - > pages - > next )
goto out ;
stat - > pages = stat - > pages - > next ;
2009-03-20 12:50:56 -04:00
}
2009-03-23 17:12:36 -04:00
2009-03-24 20:50:39 -04:00
rec = & stat - > pages - > records [ stat - > pages - > index + + ] ;
2009-03-23 17:12:36 -04:00
rec - > ip = ip ;
2009-03-24 20:50:39 -04:00
ftrace_add_profile ( stat , rec ) ;
2009-03-23 17:12:36 -04:00
2009-03-20 12:50:56 -04:00
out :
2009-03-24 20:50:39 -04:00
atomic_dec ( & stat - > disabled ) ;
2009-03-20 12:50:56 -04:00
return rec ;
}
static void
2011-08-08 16:57:47 -04:00
function_profile_call ( unsigned long ip , unsigned long parent_ip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * ops , struct ftrace_regs * fregs )
2009-03-20 12:50:56 -04:00
{
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat * stat ;
2009-03-23 17:12:36 -04:00
struct ftrace_profile * rec ;
2009-03-20 12:50:56 -04:00
unsigned long flags ;
if ( ! ftrace_profile_enabled )
return ;
local_irq_save ( flags ) ;
2009-03-24 20:50:39 -04:00
2014-04-29 14:17:40 -05:00
stat = this_cpu_ptr ( & ftrace_profile_stats ) ;
2009-06-01 21:51:28 -04:00
if ( ! stat - > hash | | ! ftrace_profile_enabled )
2009-03-24 20:50:39 -04:00
goto out ;
rec = ftrace_find_profiled_func ( stat , ip ) ;
2009-03-23 17:12:36 -04:00
if ( ! rec ) {
2009-03-25 20:06:34 -04:00
rec = ftrace_profile_alloc ( stat , ip ) ;
2009-03-23 17:12:36 -04:00
if ( ! rec )
goto out ;
}
2009-03-20 12:50:56 -04:00
rec - > counter + + ;
out :
local_irq_restore ( flags ) ;
}
2009-03-23 23:12:58 -04:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2018-11-15 12:35:13 -05:00
static bool fgraph_graph_time = true ;
void ftrace_graph_graph_time_control ( bool enable )
{
fgraph_graph_time = enable ;
}
2009-03-23 23:12:58 -04:00
static int profile_graph_entry ( struct ftrace_graph_ent * trace )
{
2018-11-19 20:54:08 -05:00
struct ftrace_ret_stack * ret_stack ;
2016-08-31 11:55:29 +09:00
2011-08-09 12:50:46 -04:00
function_profile_call ( trace - > func , 0 , NULL , NULL ) ;
2016-08-31 11:55:29 +09:00
2017-08-17 16:37:25 -04:00
/* If function graph is shutting down, ret_stack can be NULL */
if ( ! current - > ret_stack )
return 0 ;
2018-11-19 20:54:08 -05:00
ret_stack = ftrace_graph_get_ret_stack ( current , 0 ) ;
if ( ret_stack )
ret_stack - > subtime = 0 ;
2016-08-31 11:55:29 +09:00
2009-03-23 23:12:58 -04:00
return 1 ;
}
static void profile_graph_return ( struct ftrace_graph_ret * trace )
{
2018-11-19 20:54:08 -05:00
struct ftrace_ret_stack * ret_stack ;
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat * stat ;
2009-03-24 23:17:58 -04:00
unsigned long long calltime ;
2009-03-23 23:12:58 -04:00
struct ftrace_profile * rec ;
2009-03-24 20:50:39 -04:00
unsigned long flags ;
2009-03-23 23:12:58 -04:00
local_irq_save ( flags ) ;
2014-04-29 14:17:40 -05:00
stat = this_cpu_ptr ( & ftrace_profile_stats ) ;
2009-06-01 21:51:28 -04:00
if ( ! stat - > hash | | ! ftrace_profile_enabled )
2009-03-24 20:50:39 -04:00
goto out ;
2010-04-27 21:04:24 -04:00
/* If the calltime was zero'd ignore it */
if ( ! trace - > calltime )
goto out ;
2009-03-24 23:17:58 -04:00
calltime = trace - > rettime - trace - > calltime ;
2015-09-29 19:06:50 -04:00
if ( ! fgraph_graph_time ) {
2009-03-24 23:17:58 -04:00
/* Append this call time to the parent time to subtract */
2018-11-19 20:54:08 -05:00
ret_stack = ftrace_graph_get_ret_stack ( current , 1 ) ;
if ( ret_stack )
ret_stack - > subtime + = calltime ;
2009-03-24 23:17:58 -04:00
2018-11-19 20:54:08 -05:00
ret_stack = ftrace_graph_get_ret_stack ( current , 0 ) ;
if ( ret_stack & & ret_stack - > subtime < calltime )
calltime - = ret_stack - > subtime ;
2009-03-24 23:17:58 -04:00
else
calltime = 0 ;
}
2009-03-24 20:50:39 -04:00
rec = ftrace_find_profiled_func ( stat , trace - > func ) ;
2010-04-26 14:02:05 -04:00
if ( rec ) {
2009-03-24 23:17:58 -04:00
rec - > time + = calltime ;
2010-04-26 14:02:05 -04:00
rec - > time_squared + = calltime * calltime ;
}
2009-03-24 23:17:58 -04:00
2009-03-24 20:50:39 -04:00
out :
2009-03-23 23:12:58 -04:00
local_irq_restore ( flags ) ;
}
2018-11-15 14:06:47 -05:00
static struct fgraph_ops fprofiler_ops = {
. entryfunc = & profile_graph_entry ,
. retfunc = & profile_graph_return ,
} ;
2009-03-23 23:12:58 -04:00
static int register_ftrace_profiler ( void )
{
2018-11-15 14:06:47 -05:00
return register_ftrace_graph ( & fprofiler_ops ) ;
2009-03-23 23:12:58 -04:00
}
static void unregister_ftrace_profiler ( void )
{
2018-11-15 14:06:47 -05:00
unregister_ftrace_graph ( & fprofiler_ops ) ;
2009-03-23 23:12:58 -04:00
}
# else
2011-05-31 20:51:55 +01:00
static struct ftrace_ops ftrace_profile_ops __read_mostly = {
2009-03-25 13:26:41 -04:00
. func = function_profile_call ,
2020-11-05 21:32:45 -05:00
. flags = FTRACE_OPS_FL_INITIALIZED ,
2014-08-15 17:23:02 -04:00
INIT_OPS_HASH ( ftrace_profile_ops )
2009-03-20 12:50:56 -04:00
} ;
2009-03-23 23:12:58 -04:00
static int register_ftrace_profiler ( void )
{
return register_ftrace_function ( & ftrace_profile_ops ) ;
}
static void unregister_ftrace_profiler ( void )
{
unregister_ftrace_function ( & ftrace_profile_ops ) ;
}
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2009-03-20 12:50:56 -04:00
static ssize_t
ftrace_profile_write ( struct file * filp , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
unsigned long val ;
int ret ;
2011-06-07 21:58:27 +02:00
ret = kstrtoul_from_user ( ubuf , cnt , 10 , & val ) ;
if ( ret )
2009-03-20 12:50:56 -04:00
return ret ;
val = ! ! val ;
mutex_lock ( & ftrace_profile_lock ) ;
if ( ftrace_profile_enabled ^ val ) {
if ( val ) {
2009-03-23 17:12:36 -04:00
ret = ftrace_profile_init ( ) ;
if ( ret < 0 ) {
cnt = ret ;
goto out ;
}
2009-03-23 23:12:58 -04:00
ret = register_ftrace_profiler ( ) ;
if ( ret < 0 ) {
cnt = ret ;
goto out ;
}
2009-03-20 12:50:56 -04:00
ftrace_profile_enabled = 1 ;
} else {
ftrace_profile_enabled = 0 ;
2009-06-01 21:51:28 -04:00
/*
* unregister_ftrace_profiler calls stop_machine
2018-11-06 18:44:52 -08:00
* so this acts like a synchronize_rcu .
2009-06-01 21:51:28 -04:00
*/
2009-03-23 23:12:58 -04:00
unregister_ftrace_profiler ( ) ;
2009-03-20 12:50:56 -04:00
}
}
2009-03-23 17:12:36 -04:00
out :
2009-03-20 12:50:56 -04:00
mutex_unlock ( & ftrace_profile_lock ) ;
2009-10-23 19:36:16 -04:00
* ppos + = cnt ;
2009-03-20 12:50:56 -04:00
return cnt ;
}
2009-03-23 17:12:36 -04:00
static ssize_t
ftrace_profile_read ( struct file * filp , char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
2009-03-25 13:26:41 -04:00
char buf [ 64 ] ; /* big enough to hold a number */
2009-03-23 17:12:36 -04:00
int r ;
r = sprintf ( buf , " %u \n " , ftrace_profile_enabled ) ;
return simple_read_from_buffer ( ubuf , cnt , ppos , buf , r ) ;
}
2009-03-20 12:50:56 -04:00
static const struct file_operations ftrace_profile_fops = {
. open = tracing_open_generic ,
. read = ftrace_profile_read ,
. write = ftrace_profile_write ,
llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// neither read nor write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
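The selection rules in the semantic patch can be read as a priority list. The sketch below restates them as a plain C decision function; it is a simplification for illustration only. struct fops_facts, enum llseek_choice and pick_llseek() are hypothetical names, and the strict ordering is an assumption that collapses the rule dependencies into a single chain.

```c
#include <stdbool.h>

/* Hypothetical summary of what the semantic patch detects per fops. */
struct fops_facts {
	bool has_llseek;		/* .llseek is already set */
	bool open_is_nonseekable;	/* open calls nonseekable_open() */
	bool read_is_seq_read;		/* .read = seq_read */
	bool has_readdir;		/* .readdir is present */
	bool read_touches_pos;		/* read handler uses its loff_t * */
	bool write_touches_pos;		/* write handler uses its loff_t * */
};

enum llseek_choice {
	KEEP_EXISTING,
	NO_LLSEEK,
	SEQ_LSEEK,
	DEFAULT_LLSEEK,
	NOOP_LLSEEK,
};

/* Walk the rules in the order the commit message lists them. */
static enum llseek_choice pick_llseek(const struct fops_facts *f)
{
	if (f->has_llseek)
		return KEEP_EXISTING;
	if (f->open_is_nonseekable)
		return NO_LLSEEK;
	if (f->read_is_seq_read)
		return SEQ_LSEEK;
	if (f->has_readdir || f->read_touches_pos || f->write_touches_pos)
		return DEFAULT_LLSEEK;
	return NOOP_LLSEEK;	/* offset is provably ignored */
}
```

For ftrace_profile_fops, the write handler updates *ppos and the read handler passes ppos to simple_read_from_buffer(), so both "touches pos" facts are true and the sketch yields DEFAULT_LLSEEK, matching the .llseek = default_llseek line this commit added.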
2010-08-15 18:52:59 +02:00
. llseek = default_llseek ,
2009-03-20 12:50:56 -04:00
} ;
2009-03-24 20:50:39 -04:00
/* used to initialize the real stat files */
static struct tracer_stat function_stats __initdata = {
2009-03-25 13:26:41 -04:00
. name = " functions " ,
. stat_start = function_stat_start ,
. stat_next = function_stat_next ,
. stat_cmp = function_stat_cmp ,
. stat_headers = function_stat_headers ,
. stat_show = function_stat_show
2009-03-24 20:50:39 -04:00
} ;
2015-01-20 12:13:40 -05:00
static __init void ftrace_profile_tracefs ( struct dentry * d_tracer )
2009-03-20 12:50:56 -04:00
{
2009-03-24 20:50:39 -04:00
struct ftrace_profile_stat * stat ;
char * name ;
2009-03-20 12:50:56 -04:00
int ret ;
2009-03-24 20:50:39 -04:00
int cpu ;
for_each_possible_cpu ( cpu ) {
stat = & per_cpu ( ftrace_profile_stats , cpu ) ;
2016-03-15 22:12:34 +08:00
name = kasprintf ( GFP_KERNEL , " function%d " , cpu ) ;
2009-03-24 20:50:39 -04:00
if ( ! name ) {
/*
* The files created are permanent , if something happens
* we still do not free memory .
*/
WARN ( 1 ,
" Could not allocate stat file for cpu %d \n " ,
cpu ) ;
return ;
}
stat - > stat = function_stats ;
stat - > stat . name = name ;
ret = register_stat_tracer ( & stat - > stat ) ;
if ( ret ) {
WARN ( 1 ,
" Could not register function stat for cpu %d \n " ,
cpu ) ;
kfree ( name ) ;
return ;
}
2009-03-20 12:50:56 -04:00
}
2022-01-14 21:10:52 +08:00
trace_create_file ( " function_profile_enabled " ,
TRACE_MODE_WRITE , d_tracer , NULL ,
& ftrace_profile_fops ) ;
2009-03-20 12:50:56 -04:00
}
# else /* CONFIG_FUNCTION_PROFILER */
2015-01-20 12:13:40 -05:00
static __init void ftrace_profile_tracefs ( struct dentry * d_tracer )
2009-03-20 12:50:56 -04:00
{
}
# endif /* CONFIG_FUNCTION_PROFILER */
2009-03-23 17:12:36 -04:00
# ifdef CONFIG_DYNAMIC_FTRACE
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different from the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller.
We only have one place to modify the jump to ftrace_graph_caller, and
that jump needs to happen after the registers are restored.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
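The condition for setting the first flag can be sketched in a few lines. This is a toy model under stated assumptions, not the kernel's record handling: struct toy_rec, struct toy_ops, TOY_FL_TRAMP and both helpers are hypothetical, and the bit value is arbitrary.

```c
#include <stddef.h>

#define TOY_FL_TRAMP	(1UL << 0)	/* hypothetical bit, not the kernel value */

struct toy_ops {
	unsigned long trampoline;	/* 0 when the ops has no trampoline of its own */
};

struct toy_rec {
	unsigned long flags;
	unsigned int count;		/* number of ops attached to this call site */
};

/* Set the flag only when exactly one ops uses the record and it has a trampoline. */
static void toy_update_tramp(struct toy_rec *rec, const struct toy_ops *ops)
{
	if (rec->count == 1 && ops && ops->trampoline)
		rec->flags |= TOY_FL_TRAMP;
	else
		rec->flags &= ~TOY_FL_TRAMP;
}

/* Pick where the call site should jump: the private trampoline or the generic caller. */
static unsigned long toy_call_target(const struct toy_rec *rec,
				     const struct toy_ops *ops,
				     unsigned long generic_caller)
{
	if (rec->flags & TOY_FL_TRAMP)
		return ops->trampoline;
	return generic_caller;
}
```

Once a second ops attaches to the same record, the count is no longer one, the flag is cleared, and the call site falls back to the generic caller that loops over all registered ops.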
2014-05-06 21:56:17 -04:00
static struct ftrace_ops * removed_ops ;
2014-08-05 17:19:38 -04:00
/*
* Set when doing a global update , like enabling all recs or disabling them .
* It is not set when just updating a single ftrace_ops .
*/
static bool update_all_ops ;
2009-03-23 17:12:36 -04:00
# ifndef CONFIG_FTRACE_MCOUNT_RECORD
# error Dynamic ftrace depends on MCOUNT_RECORD
# endif
2017-04-18 14:50:39 -04:00
struct ftrace_func_probe {
struct ftrace_probe_ops * probe_ops ;
struct ftrace_ops ops ;
struct trace_array * tr ;
struct list_head list ;
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change lays the groundwork for it.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
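The place holder life cycle described above can be illustrated with a pair of toy callbacks. This is a sketch under assumptions: toy_init() and toy_free() are hypothetical and take fewer arguments than the real probe_ops callbacks, and struct toy_state stands in for whatever a probe would keep per instance.

```c
#include <stdlib.h>

/* Hypothetical per-probe state hung off the place holder. */
struct toy_state {
	unsigned long attached;	/* functions currently attached to the probe */
};

/*
 * Called once per matching function. On the first call *data is NULL and
 * the state is allocated; later calls see the pointer stored last time.
 */
static int toy_init(unsigned long ip, void *init_data, void **data)
{
	struct toy_state *st = *data;

	(void)ip;
	(void)init_data;

	if (!st) {
		st = calloc(1, sizeof(*st));
		if (!st)
			return -1;
		*data = st;
	}
	st->attached++;
	return 0;
}

/*
 * A non-zero ip means one function is leaving the probe; ip == 0 means no
 * functions remain and the place holder itself is about to go away.
 */
static void toy_free(unsigned long ip, void *data)
{
	struct toy_state *st = data;

	if (!st)
		return;
	if (ip) {
		st->attached--;
		return;
	}
	free(st);
}
```

The caller owns the place holder slot and only ever passes its address to the init callback, so the probe decides what the slot points to and when that memory is released.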
2017-04-19 22:39:44 -04:00
void * data ;
2017-04-18 14:50:39 -04:00
int ref ;
} ;
2011-05-02 17:34:47 -04:00
/*
* We make these constant because no one should touch them ,
* but they are used as the default " empty hash " , to avoid allocating
* it all the time . These are in a read only section such that if
* anyone does try to modify it , it will cause an exception .
*/
static const struct hlist_head empty_buckets [ 1 ] ;
static const struct ftrace_hash empty_hash = {
. buckets = ( struct hlist_head * ) empty_buckets ,
2011-04-29 20:59:51 -04:00
} ;
2011-05-02 17:34:47 -04:00
# define EMPTY_HASH ((struct ftrace_hash *)&empty_hash)
2009-03-23 17:12:36 -04:00
2018-11-15 12:32:38 -05:00
struct ftrace_ops global_ops = {
2014-08-15 17:23:02 -04:00
. func = ftrace_stub ,
. local_hash . notrace_hash = EMPTY_HASH ,
. local_hash . filter_hash = EMPTY_HASH ,
INIT_OPS_HASH ( global_ops )
2020-11-05 21:32:45 -05:00
. flags = FTRACE_OPS_FL_INITIALIZED |
2015-07-24 10:38:12 -04:00
FTRACE_OPS_FL_PID ,
2011-05-02 12:29:25 -04:00
} ;
2014-11-18 21:14:11 -05:00
/*
2021-03-23 18:49:35 +01:00
* Used by the stack unwinder to know about dynamic ftrace trampolines .
2014-11-18 21:14:11 -05:00
*/
2018-01-22 22:32:51 -05:00
struct ftrace_ops * ftrace_ops_trampoline ( unsigned long addr )
2014-11-18 21:14:11 -05:00
{
2018-01-22 22:32:51 -05:00
struct ftrace_ops * op = NULL ;
2014-11-18 21:14:11 -05:00
/*
* Some of the ops may be dynamically allocated ,
2018-11-06 18:44:52 -08:00
* they are freed after a synchronize_rcu ( ) .
2014-11-18 21:14:11 -05:00
*/
preempt_disable_notrace ( ) ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
/*
* This is to check for dynamically allocated trampolines .
* Trampolines that are in kernel text will have
* core_kernel_text ( ) return true .
*/
if ( op - > trampoline & & op - > trampoline_size )
if ( addr > = op - > trampoline & &
addr < op - > trampoline + op - > trampoline_size ) {
2018-01-22 22:32:51 -05:00
preempt_enable_notrace ( ) ;
return op ;
2014-11-18 21:14:11 -05:00
}
} while_for_each_ftrace_op ( op ) ;
preempt_enable_notrace ( ) ;
2018-01-22 22:32:51 -05:00
return NULL ;
}
/*
* This is used by __kernel_text_address ( ) to return true if the
* address is on a dynamically allocated trampoline that would
* not return true for either core_kernel_text ( ) or
* is_module_text_address ( ) .
*/
bool is_ftrace_trampoline ( unsigned long addr )
{
return ftrace_ops_trampoline ( addr ) ! = NULL ;
2014-11-18 21:14:11 -05:00
}
2009-03-23 17:12:36 -04:00
struct ftrace_page {
struct ftrace_page * next ;
2011-12-16 16:23:44 -05:00
struct dyn_ftrace * records ;
2009-03-23 17:12:36 -04:00
int index ;
2021-04-01 16:14:17 -04:00
int order ;
2009-03-23 17:12:36 -04:00
} ;
2011-12-16 16:23:44 -05:00
# define ENTRY_SIZE sizeof(struct dyn_ftrace)
# define ENTRIES_PER_PAGE (PAGE_SIZE / ENTRY_SIZE)
2009-03-23 17:12:36 -04:00
static struct ftrace_page * ftrace_pages_start ;
static struct ftrace_page * ftrace_pages ;
2017-02-01 12:19:33 -05:00
static __always_inline unsigned long
ftrace_hash_key ( struct ftrace_hash * hash , unsigned long ip )
{
if ( hash - > size_bits > 0 )
return hash_long ( ip , hash - > size_bits ) ;
return 0 ;
}
2017-02-01 15:37:07 -05:00
/* Only use this function if ftrace_hash_empty() has already been tested */
static __always_inline struct ftrace_func_entry *
__ftrace_lookup_ip ( struct ftrace_hash * hash , unsigned long ip )
2011-04-29 15:12:32 -04:00
{
unsigned long key ;
struct ftrace_func_entry * entry ;
struct hlist_head * hhd ;
2017-02-01 12:19:33 -05:00
key = ftrace_hash_key ( hash , ip ) ;
2011-04-29 15:12:32 -04:00
hhd = & hash - > buckets [ key ] ;
2013-05-28 14:38:43 -04:00
hlist_for_each_entry_rcu_notrace ( entry , hhd , hlist ) {
2011-04-29 15:12:32 -04:00
if ( entry - > ip = = ip )
return entry ;
}
return NULL ;
}
2017-02-01 15:37:07 -05:00
/**
* ftrace_lookup_ip - Test to see if an ip exists in an ftrace_hash
* @ hash : The hash to look at
* @ ip : The instruction pointer to test
*
* Search a given @ hash to see if a given instruction pointer ( @ ip )
* exists in it .
*
2024-02-22 21:48:33 -08:00
* Returns : the entry that holds the @ ip if found . NULL otherwise .
2017-02-01 15:37:07 -05:00
*/
struct ftrace_func_entry *
ftrace_lookup_ip ( struct ftrace_hash * hash , unsigned long ip )
{
if ( ftrace_hash_empty ( hash ) )
return NULL ;
return __ftrace_lookup_ip ( hash , ip ) ;
}
2011-05-02 17:34:47 -04:00
static void __add_hash_entry ( struct ftrace_hash * hash ,
struct ftrace_func_entry * entry )
2011-04-29 15:12:32 -04:00
{
struct hlist_head * hhd ;
unsigned long key ;
2017-02-01 12:19:33 -05:00
key = ftrace_hash_key ( hash , entry - > ip ) ;
2011-04-29 15:12:32 -04:00
hhd = & hash - > buckets [ key ] ;
hlist_add_head ( & entry - > hlist , hhd ) ;
hash - > count + + ;
2011-05-02 17:34:47 -04:00
}
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() where
if the number of new entries added is large enough to cause two
allocations in the loop:
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}
Where ftrace_add_rec_direct() has:
if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;
if (size < 32)
size = 32;
new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;
*free_hash = direct_functions;
direct_functions = new_hash;
}
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_function entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
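The copy-then-publish idea behind steps 3 to 6 can be sketched in user
space. This is a minimal illustration under stated assumptions, not the
kernel implementation: it uses C11 atomics in place of RCU, a flat array
in place of the hash, and the names ip_table, published, table_contains()
and table_add() are all hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat table standing in for the direct_functions hash. */
struct ip_table {
	size_t count;
	unsigned long ips[];
};

/* The only pointer readers ever look at. */
static _Atomic(struct ip_table *) published;

/* Reader: one acquire load yields a complete table (or NULL). */
static int table_contains(unsigned long ip)
{
	struct ip_table *t;
	size_t i;

	t = atomic_load_explicit(&published, memory_order_acquire);
	if (!t)
		return 0;
	for (i = 0; i < t->count; i++)
		if (t->ips[i] == ip)
			return 1;
	return 0;
}

/*
 * Writer: build a new table holding the existing entries plus the added
 * ones, and publish it only once it is complete. The old table is never
 * modified. On failure the published table is left untouched.
 * Returns 0 on success (old table handed back via *old_out), -1 on error.
 */
static int table_add(const unsigned long *add, size_t n,
		     struct ip_table **old_out)
{
	struct ip_table *old, *new_table;
	size_t old_count;

	old = atomic_load_explicit(&published, memory_order_relaxed);
	old_count = old ? old->count : 0;

	new_table = malloc(sizeof(*new_table) +
			   (old_count + n) * sizeof(unsigned long));
	if (!new_table)
		return -1;

	/* Copy (not move) the existing entries, then the added ones. */
	if (old_count)
		memcpy(new_table->ips, old->ips,
		       old_count * sizeof(unsigned long));
	if (n)
		memcpy(new_table->ips + old_count, add,
		       n * sizeof(unsigned long));
	new_table->count = old_count + n;

	/* Publish last, so readers see either the old or the new table. */
	atomic_store_explicit(&published, new_table, memory_order_release);
	*old_out = old;
	return 0;
}
```

In this sketch the caller owns the returned old table and must free it
only once no reader can still be using it; the kernel relies on an RCU
grace period for that guarantee, which this example does not model.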
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-29 11:51:34 -05:00
static struct ftrace_func_entry *
add_hash_entry(struct ftrace_hash *hash, unsigned long ip)
2011-05-02 17:34:47 -04:00
{
	struct ftrace_func_entry *entry;
	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
	if (!entry)
2023-12-29 11:51:34 -05:00
		return NULL;
2011-05-02 17:34:47 -04:00
	entry->ip = ip;
	__add_hash_entry(hash, entry);
2011-04-29 15:12:32 -04:00
2023-12-29 11:51:34 -05:00
	return entry;
2011-04-29 15:12:32 -04:00
}
static void
2011-05-02 17:34:47 -04:00
free_hash_entry(struct ftrace_hash *hash,
2011-04-29 15:12:32 -04:00
		struct ftrace_func_entry *entry)
{
	hlist_del(&entry->hlist);
	kfree(entry);
	hash->count--;
}
2011-05-02 17:34:47 -04:00
static void
remove_hash_entry(struct ftrace_hash *hash,
		  struct ftrace_func_entry *entry)
{
2017-04-04 21:31:28 -04:00
	hlist_del_rcu(&entry->hlist);
2011-05-02 17:34:47 -04:00
	hash->count--;
}
2011-04-29 15:12:32 -04:00
static void ftrace_hash_clear(struct ftrace_hash *hash)
{
	struct hlist_head *hhd;
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for-each-entry iterators were conceived
differently from the list ones, which look like this:
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only do
they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
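To see why the extra cursor is unnecessary, here is a user-space sketch of
an iterator that walks a singly linked bucket using only the entry
pointer. It is an illustration under assumptions, not the macros from
linux/list.h: hnode, hhead, item, entry_of(), entry_or_null() and
for_each_item() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical minimal node and head types. */
struct hnode {
	struct hnode *next;
};

struct hhead {
	struct hnode *first;
};

/* An entry embedding the node, as users of such lists typically do. */
struct item {
	unsigned long ip;
	struct hnode link;
};

/* Recover the containing entry from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same, but maps a NULL node to a NULL entry so the loop can stop. */
#define entry_or_null(ptr, type, member) \
	((ptr) ? entry_of(ptr, type, member) : NULL)

/* Iterate with the entry pointer alone; no separate node cursor. */
#define for_each_item(pos, head)					\
	for (pos = entry_or_null((head)->first, struct item, link);	\
	     pos;							\
	     pos = entry_or_null((pos)->link.next, struct item, link))

/* Insert an item at the front of the bucket. */
static void push_item(struct hhead *head, struct item *it)
{
	it->link.next = head->first;
	head->first = &it->link;
}

/* Example traversal: add up the ip field of every item. */
static unsigned long sum_ips(struct hhead *head)
{
	struct item *pos;
	unsigned long sum = 0;

	for_each_item(pos, head)
		sum += pos->ip;
	return sum;
}
```

The next node is always reachable through the current entry's embedded
member, so the loop needs no second variable, and the call site takes the
same shape as the list iterator.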
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter; these
were modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 17:06:00 -08:00
	struct hlist_node *tn;
2011-04-29 15:12:32 -04:00
	struct ftrace_func_entry *entry;
	int size = 1 << hash->size_bits;
	int i;
2011-05-02 17:34:47 -04:00
	if (!hash->count)
		return;
2011-04-29 15:12:32 -04:00
	for (i = 0; i < size; i++) {
		hhd = &hash->buckets[i];
2013-02-27 17:06:00 -08:00
		hlist_for_each_entry_safe(entry, tn, hhd, hlist)
2011-05-02 17:34:47 -04:00
			free_hash_entry(hash, entry);
2011-04-29 15:12:32 -04:00
	}
	FTRACE_WARN_ON(hash->count);
}
2017-06-23 15:26:26 -04:00
static void free_ftrace_mod(struct ftrace_mod_load *ftrace_mod)
{
	list_del(&ftrace_mod->list);
	kfree(ftrace_mod->module);
	kfree(ftrace_mod->func);
	kfree(ftrace_mod);
}
static void clear_ftrace_mod_list(struct list_head *head)
{
	struct ftrace_mod_load *p, *n;
	/* stack tracer isn't supported yet */
	if (!head)
		return;
	mutex_lock(&ftrace_lock);
	list_for_each_entry_safe(p, n, head, list)
		free_ftrace_mod(p);
	mutex_unlock(&ftrace_lock);
}
2011-05-02 17:34:47 -04:00
static void free_ftrace_hash(struct ftrace_hash *hash)
{
	if (!hash || hash == EMPTY_HASH)
		return;
	ftrace_hash_clear(hash);
	kfree(hash->buckets);
	kfree(hash);
}
2011-05-05 18:03:47 -04:00
static void __free_ftrace_hash_rcu(struct rcu_head *rcu)
{
	struct ftrace_hash *hash;
	hash = container_of(rcu, struct ftrace_hash, rcu);
	free_ftrace_hash(hash);
}
static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
{
	if (!hash || hash == EMPTY_HASH)
		return;
2018-11-06 18:44:52 -08:00
	call_rcu(&hash->rcu, __free_ftrace_hash_rcu);
2011-05-05 18:03:47 -04:00
}
2023-01-03 12:49:11 +00:00
/**
 * ftrace_free_filter - remove all filters for an ftrace_ops
2024-02-22 21:48:33 -08:00
 * @ops: the ops to remove the filters from
2023-01-03 12:49:11 +00:00
 */
ftrace, perf: Add filter support for function trace event
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:
perf record -e ftrace:function --filter="(ip == mm_*)" ls
The filter syntax is restricted to the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:
ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:
perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls
The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.
The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.
The right side of the '!=', '==' operators is list of functions
or regexp. to be added to filter separated by space.
The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.
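As a rough illustration of the syntax described above, the following
user-space sketch splits a filter string at each '||' and classifies every
clause by its first '==' or '!=' operator. It is not the kernel's or
perf's parser; count_clauses() and its parameters are hypothetical names,
and quoting and function-list handling are deliberately ignored.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the '==' clauses (which behave like
 * set_ftrace_filter entries) and the '!=' clauses (which behave like
 * set_ftrace_notrace entries) in a filter string whose clauses are
 * joined by '||'.
 */
static void count_clauses(const char *filter, int *n_filter, int *n_notrace)
{
	const char *p = filter;

	*n_filter = 0;
	*n_notrace = 0;

	while (p && *p) {
		/* The clause ends at the next '||' or at end of string. */
		const char *end = strstr(p, "||");
		size_t len = end ? (size_t)(end - p) : strlen(p);
		size_t i;

		/* Classify the clause by the first operator found in it. */
		for (i = 0; i + 1 < len; i++) {
			if (p[i] == '=' && p[i + 1] == '=') {
				(*n_filter)++;
				break;
			}
			if (p[i] == '!' && p[i + 1] == '=') {
				(*n_notrace)++;
				break;
			}
		}
		p = end ? end + 2 : NULL;
	}
}
```

For a string such as "ip == f1 || ip != f3" this reports one filter
clause and one notrace clause.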
Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-15 15:51:54 +01:00
void ftrace_free_filter(struct ftrace_ops *ops)
{
2013-05-09 14:44:17 +09:00
	ftrace_ops_init(ops);
2014-08-15 17:23:02 -04:00
	free_ftrace_hash(ops->func_hash->filter_hash);
	free_ftrace_hash(ops->func_hash->notrace_hash);
2012-02-15 15:51:54 +01:00
}
2023-01-03 12:49:11 +00:00
EXPORT_SYMBOL_GPL(ftrace_free_filter);
2012-02-15 15:51:54 +01:00
2011-05-02 17:34:47 -04:00
static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
{
	struct ftrace_hash *hash;
	int size;
	hash = kzalloc(sizeof(*hash), GFP_KERNEL);
	if (!hash)
		return NULL;
	size = 1 << size_bits;
2011-11-29 22:08:00 +01:00
	hash->buckets = kcalloc(size, sizeof(*hash->buckets), GFP_KERNEL);
2011-05-02 17:34:47 -04:00
	if (!hash->buckets) {
		kfree(hash);
		return NULL;
	}
	hash->size_bits = size_bits;
	return hash;
}
2017-06-23 15:26:26 -04:00
static int ftrace_add_mod(struct trace_array *tr,
			  const char *func, const char *module,
			  int enable)
{
	struct ftrace_mod_load *ftrace_mod;
	struct list_head *mod_head = enable ? &tr->mod_trace : &tr->mod_notrace;
	ftrace_mod = kzalloc(sizeof(*ftrace_mod), GFP_KERNEL);
	if (!ftrace_mod)
		return -ENOMEM;
2022-11-16 09:52:07 +08:00
	INIT_LIST_HEAD(&ftrace_mod->list);
2017-06-23 15:26:26 -04:00
	ftrace_mod->func = kstrdup(func, GFP_KERNEL);
	ftrace_mod->module = kstrdup(module, GFP_KERNEL);
	ftrace_mod->enable = enable;
	if (!ftrace_mod->func || !ftrace_mod->module)
		goto out_free;
	list_add(&ftrace_mod->list, mod_head);
	return 0;
 out_free:
	free_ftrace_mod(ftrace_mod);
	return -ENOMEM;
}
2011-05-02 17:34:47 -04:00
static struct ftrace_hash *
alloc_and_copy_ftrace_hash(int size_bits, struct ftrace_hash *hash)
{
	struct ftrace_func_entry *entry;
	struct ftrace_hash *new_hash;
	int size;
	int i;
	new_hash = alloc_ftrace_hash(size_bits);
	if (!new_hash)
		return NULL;
2017-06-26 11:47:31 -04:00
	if (hash)
		new_hash->flags = hash->flags;
2011-05-02 17:34:47 -04:00
	/* Empty hash? */
2011-12-19 19:07:36 -05:00
	if (ftrace_hash_empty(hash))
2011-05-02 17:34:47 -04:00
		return new_hash;
	size = 1 << hash->size_bits;
	for (i = 0; i < size; i++) {
2013-02-27 17:06:00 -08:00
		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
2023-12-29 11:51:34 -05:00
			if (add_hash_entry(new_hash, entry->ip) == NULL)
2011-05-02 17:34:47 -04:00
				goto free_hash;
		}
	}
	FTRACE_WARN_ON(new_hash->count != hash->count);
	return new_hash;
 free_hash:
	free_ftrace_hash(new_hash);
	return NULL;
}
2011-07-13 15:03:44 -04:00
static void
2014-08-18 13:21:08 -04:00
ftrace_hash_rec_disable_modify(struct ftrace_ops *ops, int filter_hash);
2011-07-13 15:03:44 -04:00
static void
2014-08-18 13:21:08 -04:00
ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, int filter_hash);
2011-07-13 15:03:44 -04:00
2014-11-21 05:25:16 -05:00
static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
				       struct ftrace_hash *new_hash);
2019-11-08 12:25:46 -05:00
static struct ftrace_hash *dup_hash(struct ftrace_hash *src, int size)
2011-05-02 17:34:47 -04:00
{
	struct ftrace_func_entry *entry;
2011-05-05 18:03:47 -04:00
	struct ftrace_hash *new_hash;
2019-11-08 12:25:46 -05:00
	struct hlist_head *hhd;
	struct hlist_node *tn;
2011-05-02 17:34:47 -04:00
	int bits = 0;
	int i;
	/*
2020-10-05 20:21:14 -04:00
	 * Use around half the size (max bit of it), but
	 * a minimum of 2 is fine (as size of 0 or 1 both give 1 for bits).
2011-05-02 17:34:47 -04:00
	 */
2020-10-05 20:21:14 -04:00
	bits = fls(size / 2);
2011-05-02 17:34:47 -04:00
	/* Don't allocate too much */
	if (bits > FTRACE_HASH_MAX_BITS)
		bits = FTRACE_HASH_MAX_BITS;
2011-05-05 18:03:47 -04:00
	new_hash = alloc_ftrace_hash(bits);
	if (!new_hash)
2017-01-20 11:44:45 +09:00
		return NULL;
2011-05-02 17:34:47 -04:00
2017-06-26 11:47:31 -04:00
	new_hash->flags = src->flags;
2011-05-02 17:34:47 -04:00
	size = 1 << src->size_bits;
	for (i = 0; i < size; i++) {
		hhd = &src->buckets[i];
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 17:06:00 -08:00
hlist_for_each_entry_safe ( entry , tn , hhd , hlist ) {
2011-05-02 17:34:47 -04:00
remove_hash_entry ( src , entry ) ;
2011-05-05 18:03:47 -04:00
__add_hash_entry ( new_hash , entry ) ;
2011-05-02 17:34:47 -04:00
}
}
2017-01-20 11:44:45 +09:00
return new_hash ;
}
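The sizing step in dup_hash() can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: sketch_fls() is a portable stand-in for the kernel's fls(), and SKETCH_HASH_MAX_BITS is an assumed cap chosen for illustration (the real FTRACE_HASH_MAX_BITS is defined elsewhere in this file).

```c
#include <assert.h>

/* Assumed cap for this sketch only; not taken from the code above. */
#define SKETCH_HASH_MAX_BITS 12

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int sketch_fls(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Mirrors the sizing in dup_hash(): bits from half the entry count, clamped. */
static int sketch_hash_bits(int count)
{
	int bits = sketch_fls((unsigned int)count / 2);

	if (bits > SKETCH_HASH_MAX_BITS)
		bits = SKETCH_HASH_MAX_BITS;
	return bits;
}
```

Under these assumptions, a count of 1024 gives 10 bits, and any count large enough to exceed the cap is clamped to SKETCH_HASH_MAX_BITS.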
2019-11-08 12:25:46 -05:00
static struct ftrace_hash *
__ftrace_hash_move ( struct ftrace_hash * src )
{
int size = src - > count ;
/*
* If the new source is empty , just return the empty_hash .
*/
if ( ftrace_hash_empty ( src ) )
return EMPTY_HASH ;
return dup_hash ( src , size ) ;
}
2017-01-20 11:44:45 +09:00
static int
ftrace_hash_move ( struct ftrace_ops * ops , int enable ,
struct ftrace_hash * * dst , struct ftrace_hash * src )
{
struct ftrace_hash * new_hash ;
int ret ;
/* Reject setting notrace hash on IPMODIFY ftrace_ops */
if ( ops - > flags & FTRACE_OPS_FL_IPMODIFY & & ! enable )
return - EINVAL ;
new_hash = __ftrace_hash_move ( src ) ;
if ( ! new_hash )
return - ENOMEM ;
2014-11-21 05:25:16 -05:00
/* Make sure this can be applied if it is IPMODIFY ftrace_ops */
if ( enable ) {
/* IPMODIFY should be updated only when filter_hash updating */
ret = ftrace_hash_ipmodify_update ( ops , new_hash ) ;
if ( ret < 0 ) {
free_ftrace_hash ( new_hash ) ;
return ret ;
}
}
2014-06-17 11:04:42 +00:00
/*
* Remove the current set , update the hash and add
* them back .
*/
2014-08-18 13:21:08 -04:00
ftrace_hash_rec_disable_modify ( ops , enable ) ;
2014-06-17 11:04:42 +00:00
2011-05-05 18:03:47 -04:00
rcu_assign_pointer ( * dst , new_hash ) ;
2014-08-18 13:21:08 -04:00
ftrace_hash_rec_enable_modify ( ops , enable ) ;
2011-07-13 15:03:44 -04:00
2014-06-17 11:04:42 +00:00
return 0 ;
2011-05-02 17:34:47 -04:00
}
2014-07-24 12:25:47 -04:00
static bool hash_contains_ip ( unsigned long ip ,
struct ftrace_ops_hash * hash )
{
/*
* The function record is a match if it exists in the filter
2020-10-02 22:31:26 +08:00
* hash and not in the notrace hash . Note , an empty hash is
2014-07-24 12:25:47 -04:00
* considered a match for the filter hash , but an empty
* notrace hash is considered not in the notrace hash .
*/
return ( ftrace_hash_empty ( hash - > filter_hash ) | |
2017-02-01 15:37:07 -05:00
__ftrace_lookup_ip ( hash - > filter_hash , ip ) ) & &
2014-07-24 12:25:47 -04:00
( ftrace_hash_empty ( hash - > notrace_hash ) | |
2017-02-01 15:37:07 -05:00
! __ftrace_lookup_ip ( hash - > notrace_hash , ip ) ) ;
2014-07-24 12:25:47 -04:00
}
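The matching rule in hash_contains_ip() can be illustrated with plain arrays. This is a hedged user-space sketch: struct sketch_set and its helpers are hypothetical stand-ins for struct ftrace_hash, ftrace_hash_empty() and __ftrace_lookup_ip(), and only the boolean rule itself is taken from the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat set of addresses standing in for struct ftrace_hash. */
struct sketch_set {
	const unsigned long *ips;
	size_t count;
};

/* Linear lookup; the kernel uses hashed buckets instead. */
static bool sketch_set_has(const struct sketch_set *set, unsigned long ip)
{
	for (size_t i = 0; i < set->count; i++)
		if (set->ips[i] == ip)
			return true;
	return false;
}

/*
 * An empty filter set matches every address; an empty notrace set
 * excludes none. Otherwise the address must be in the filter set
 * and must not be in the notrace set.
 */
static bool sketch_contains_ip(unsigned long ip,
			       const struct sketch_set *filter,
			       const struct sketch_set *notrace)
{
	return (filter->count == 0 || sketch_set_has(filter, ip)) &&
	       (notrace->count == 0 || !sketch_set_has(notrace, ip));
}
```

With two empty sets every address matches, which corresponds to an ops that traces all functions.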
2011-05-04 09:27:52 -04:00
/*
* Test the hashes for this ops to see if we want to call
* the ops - > func or not .
*
* It ' s a match if the ip is in the ops - > filter_hash or
* the filter_hash does not exist or is empty ,
* AND
* the ip is not in the ops - > notrace_hash .
2011-05-05 21:14:55 -04:00
*
* This needs to be called with preemption disabled as
2018-11-06 18:44:52 -08:00
* the hashes are freed with call_rcu ( ) .
2011-05-04 09:27:52 -04:00
*/
2018-11-15 12:32:38 -05:00
int
2013-07-23 22:06:15 -04:00
ftrace_ops_test ( struct ftrace_ops * ops , unsigned long ip , void * regs )
2011-05-04 09:27:52 -04:00
{
2014-07-24 12:25:47 -04:00
struct ftrace_ops_hash hash ;
2011-05-04 09:27:52 -04:00
int ret ;
2013-07-23 22:06:15 -04:00
# ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
/*
* There ' s a small race when adding ops that the ftrace handler
* that wants regs , may be called without them . We can not
* allow that handler to be called if regs is NULL .
*/
if ( regs = = NULL & & ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS ) )
return 0 ;
# endif
2017-06-07 16:12:51 +08:00
rcu_assign_pointer ( hash . filter_hash , ops - > func_hash - > filter_hash ) ;
rcu_assign_pointer ( hash . notrace_hash , ops - > func_hash - > notrace_hash ) ;
2011-05-04 09:27:52 -04:00
2014-07-24 12:25:47 -04:00
if ( hash_contains_ip ( ip , & hash ) )
2011-05-04 09:27:52 -04:00
ret = 1 ;
else
ret = 0 ;
return ret ;
}
2009-03-23 17:12:36 -04:00
/*
* This is a double for . Do not use ' break ' to break out of the loop ,
* you must use a goto .
*/
# define do_for_each_ftrace_rec(pg, rec) \
for ( pg = ftrace_pages_start ; pg ; pg = pg - > next ) { \
int _____i ; \
for ( _____i = 0 ; _____i < pg - > index ; _____i + + ) { \
rec = & pg - > records [ _____i ] ;
# define while_for_each_ftrace_rec() \
} \
}
2011-12-16 19:27:42 -05:00
static int ftrace_cmp_recs ( const void * a , const void * b )
{
2012-04-25 13:48:13 -04:00
const struct dyn_ftrace * key = a ;
const struct dyn_ftrace * rec = b ;
2011-12-16 19:27:42 -05:00
2012-04-25 13:48:13 -04:00
if ( key - > flags < rec - > ip )
2011-12-16 19:27:42 -05:00
return - 1 ;
2012-04-25 13:48:13 -04:00
if ( key - > ip > = rec - > ip + MCOUNT_INSN_SIZE )
return 1 ;
2011-12-16 19:27:42 -05:00
return 0 ;
}
2019-11-08 12:26:46 -05:00
static struct dyn_ftrace * lookup_rec ( unsigned long start , unsigned long end )
{
struct ftrace_page * pg ;
struct dyn_ftrace * rec = NULL ;
struct dyn_ftrace key ;
key . ip = start ;
key . flags = end ; /* overload flags, as it is unsigned long */
for ( pg = ftrace_pages_start ; pg ; pg = pg - > next ) {
2023-03-09 16:02:30 +08:00
if ( pg - > index = = 0 | |
end < pg - > records [ 0 ] . ip | |
2019-11-08 12:26:46 -05:00
start > = ( pg - > records [ pg - > index - 1 ] . ip + MCOUNT_INSN_SIZE ) )
continue ;
rec = bsearch ( & key , pg - > records , pg - > index ,
sizeof ( struct dyn_ftrace ) ,
ftrace_cmp_recs ) ;
2020-03-06 18:43:17 +01:00
if ( rec )
break ;
2019-11-08 12:26:46 -05:00
}
return rec ;
}
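The range lookup done by ftrace_cmp_recs() and lookup_rec() can be reproduced in user space with the C library's bsearch(). The sketch below makes several assumptions: struct sketch_rec is a hypothetical two-field stand-in for struct dyn_ftrace, SKETCH_INSN_SIZE is an illustrative value standing in for MCOUNT_INSN_SIZE, and the records live in a single sorted array rather than a list of pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed patch-site length for this sketch. */
#define SKETCH_INSN_SIZE 5

struct sketch_rec {
	unsigned long ip;    /* start address of the patch site */
	unsigned long flags; /* in a search key: inclusive end of the range */
};

/* Same ordering rule as ftrace_cmp_recs(): the key is a range, not a point. */
static int sketch_cmp(const void *a, const void *b)
{
	const struct sketch_rec *key = a;
	const struct sketch_rec *rec = b;

	if (key->flags < rec->ip)
		return -1;	/* range ends before this record */
	if (key->ip >= rec->ip + SKETCH_INSN_SIZE)
		return 1;	/* range starts after this record */
	return 0;		/* range overlaps this record */
}

/* Return the ip of a record overlapping [start, end], or 0 if none does. */
static unsigned long sketch_location_range(const struct sketch_rec *recs,
					   size_t n,
					   unsigned long start,
					   unsigned long end)
{
	struct sketch_rec key = { .ip = start, .flags = end };
	const struct sketch_rec *rec;

	rec = bsearch(&key, recs, n, sizeof(*recs), sketch_cmp);
	return rec ? rec->ip : 0;
}
```

Overloading the flags field of the key keeps the comparator signature compatible with bsearch() while still passing both ends of the range. If a range overlaps more than one record, bsearch() may return any of them.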
2016-03-24 22:04:01 +11:00
/**
* ftrace_location_range - return the first address of a traced location
* if it touches the given ip range
* @ start : start of range to search .
* @ end : end of range to search ( inclusive ) . @ end points to the last byte
* to check .
*
2024-02-22 21:48:33 -08:00
* Returns : rec - > ip if the related ftrace location is at least partly within
2016-03-24 22:04:01 +11:00
* the given address range . That is , the first address of the instruction
* that is either a NOP or call to the function tracer . It checks the ftrace
* internal tables to determine if the address belongs or not .
*/
unsigned long ftrace_location_range ( unsigned long start , unsigned long end )
2011-08-16 09:53:39 -04:00
{
struct dyn_ftrace * rec ;
2011-12-16 19:27:42 -05:00
2019-11-08 12:26:46 -05:00
rec = lookup_rec ( start , end ) ;
if ( rec )
return rec - > ip ;
2011-08-16 09:53:39 -04:00
return 0 ;
}
2012-04-25 13:48:13 -04:00
/**
2022-03-08 16:30:29 +01:00
* ftrace_location - return the ftrace location
2012-04-25 13:48:13 -04:00
* @ ip : the instruction pointer to check
*
2024-02-22 21:48:33 -08:00
* Returns :
* * If @ ip matches the ftrace location , return @ ip .
* * If @ ip matches sym + 0 , return sym ' s ftrace location .
* * Otherwise , return 0.
2012-04-25 13:48:13 -04:00
*/
2012-04-25 14:39:54 -04:00
unsigned long ftrace_location ( unsigned long ip )
2012-04-25 13:48:13 -04:00
{
2022-03-08 16:30:29 +01:00
struct dyn_ftrace * rec ;
unsigned long offset ;
unsigned long size ;
rec = lookup_rec ( ip , ip ) ;
if ( ! rec ) {
if ( ! kallsyms_lookup_size_offset ( ip , & size , & offset ) )
goto out ;
/* map sym+0 to __fentry__ */
if ( ! offset )
rec = lookup_rec ( ip , ip + size - 1 ) ;
}
if ( rec )
return rec - > ip ;
out :
return 0 ;
2012-04-25 13:48:13 -04:00
}
/**
* ftrace_text_reserved - return true if range contains an ftrace location
* @ start : start of range to search
* @ end : end of range to search ( inclusive ) . @ end points to the last byte to check .
*
2024-02-22 21:48:33 -08:00
* Returns : 1 if the range from @ start to @ end contains an ftrace location .
2012-04-25 13:48:13 -04:00
* That is , the instruction that is either a NOP or call to
* the function tracer . It checks the ftrace internal tables to
* determine if the address belongs or not .
*/
2013-01-09 18:09:20 -05:00
int ftrace_text_reserved ( const void * start , const void * end )
2012-04-25 13:48:13 -04:00
{
2012-04-25 14:39:54 -04:00
unsigned long ret ;
ret = ftrace_location_range ( ( unsigned long ) start ,
( unsigned long ) end ) ;
return ( int ) ! ! ret ;
2012-04-25 13:48:13 -04:00
}
ftrace: Allow no regs if no more callbacks require it
When registering a function callback for the function tracer, the ops
can specify if it wants to save full regs (like an interrupt would)
for each function that it traces, or if it does not care about regs
and just wants to have the fastest return possible.
Once an ops has registered a function, if other ops register that
function they all will receive the regs too. That's because it does
the work once, it does it for everyone.
Now if the ops wanting regs unregisters the function so that there's
only ops left that do not care about regs, those ops will still
continue getting regs and going through the work for it on that
function. This is because the disabling of the rec counter only
sees the ops registered, and does not see the ops that are still
attached, and does not know if the current ops that are still attached
want regs or not. To play it safe, it just keeps regs being processed
until no function is registered anymore.
Instead of doing that, check the ops that are still registered for that
function and if none want regs for it anymore, then disable the
processing of regs.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-30 22:35:48 -04:00
/* Test if ops registered to this rec needs regs */
static bool test_rec_ops_needs_regs ( struct dyn_ftrace * rec )
{
struct ftrace_ops * ops ;
bool keep_regs = false ;
for ( ops = ftrace_ops_list ;
ops ! = & ftrace_list_end ; ops = ops - > next ) {
/* pass rec in as regs to have non-NULL val */
if ( ftrace_ops_test ( ops , rec - > ip , rec ) ) {
if ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS ) {
keep_regs = true ;
break ;
}
}
}
return keep_regs ;
}
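The list walk in test_rec_ops_needs_regs() can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: struct sketch_ops is hypothetical, each ops traces a single address (traced_ip) in place of the filter and notrace hashes checked by ftrace_ops_test(), and wants_regs stands in for the FTRACE_OPS_FL_SAVE_REGS flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct ftrace_ops. */
struct sketch_ops {
	struct sketch_ops *next;
	unsigned long traced_ip;	/* assumption: one traced address per ops */
	bool wants_regs;		/* stands in for FTRACE_OPS_FL_SAVE_REGS */
};

/* Sentinel terminating the list, like ftrace_list_end. */
static struct sketch_ops sketch_list_end;

/* True if any ops still attached to ip asks for full register state. */
static bool sketch_needs_regs(struct sketch_ops *head, unsigned long ip)
{
	struct sketch_ops *ops;

	for (ops = head; ops != &sketch_list_end; ops = ops->next) {
		if (ops->traced_ip == ip && ops->wants_regs)
			return true;
	}
	return false;
}
```

Once the last ops wanting registers for an address is gone, the walk returns false and register saving for that address can be dropped.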
2019-05-04 19:39:39 +08:00
static struct ftrace_ops *
ftrace_find_tramp_ops_any ( struct dyn_ftrace * rec ) ;
static struct ftrace_ops *
2020-11-26 23:38:38 +05:30
ftrace_find_tramp_ops_any_other ( struct dyn_ftrace * rec , struct ftrace_ops * op_exclude ) ;
static struct ftrace_ops *
2019-05-04 19:39:39 +08:00
ftrace_find_tramp_ops_next ( struct dyn_ftrace * rec , struct ftrace_ops * ops ) ;
ftrace: Still disable enabled records marked as disabled
Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.
To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.
Now the issue happens when function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.
The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There's sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and this
detected that there were records marked as ENABLED with ref counter that
should not have been.
Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while it's being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.
Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.
There's two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.
Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-05 00:38:09 -04:00
static bool skip_record ( struct dyn_ftrace * rec )
{
/*
* At boot up , weak functions are set to disable . Function tracing
* can be enabled before they are , and they still need to be disabled now .
* If the record is disabled , still continue if it is marked as already
* enabled ( this is needed to keep the accounting working ) .
*/
return rec - > flags & FTRACE_FL_DISABLED & &
! ( rec - > flags & FTRACE_FL_ENABLED ) ;
}
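The truth table behind skip_record() is small enough to check by hand. In the sketch below the two bit values are illustrative only and are not the kernel's FTRACE_FL_ENABLED and FTRACE_FL_DISABLED definitions; only the boolean rule is taken from the helper above.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions are assumptions made for this sketch. */
#define SKETCH_FL_ENABLED	(1UL << 0)
#define SKETCH_FL_DISABLED	(1UL << 1)

/* Skip only records that are disabled and were never enabled. */
static bool sketch_skip_record(unsigned long flags)
{
	return (flags & SKETCH_FL_DISABLED) && !(flags & SKETCH_FL_ENABLED);
}
```

A record carrying both bits is not skipped, so its reference count and ENABLED flag can still be cleaned up.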
2016-03-16 15:34:32 +01:00
static bool __ftrace_hash_rec_update ( struct ftrace_ops * ops ,
2011-05-03 13:25:24 -04:00
int filter_hash ,
bool inc )
{
struct ftrace_hash * hash ;
struct ftrace_hash * other_hash ;
struct ftrace_page * pg ;
struct dyn_ftrace * rec ;
2016-03-16 15:34:32 +01:00
bool update = false ;
2011-05-03 13:25:24 -04:00
int count = 0 ;
2017-06-26 11:47:31 -04:00
int all = false ;
2011-05-03 13:25:24 -04:00
/* Only update if the ops has been registered */
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
2016-03-16 15:34:32 +01:00
return false ;
2011-05-03 13:25:24 -04:00
/*
* In the filter_hash case :
* If the count is zero , we update all records .
* Otherwise we just update the items in the hash .
*
* In the notrace_hash case :
* We enable the update in the hash .
* As disabling notrace means enabling the tracing ,
* and enabling notrace means disabling , the inc variable
* gets inverted .
*/
if ( filter_hash ) {
2014-08-15 17:23:02 -04:00
hash = ops - > func_hash - > filter_hash ;
other_hash = ops - > func_hash - > notrace_hash ;
2011-12-19 19:07:36 -05:00
if ( ftrace_hash_empty ( hash ) )
2017-06-26 11:47:31 -04:00
all = true ;
2011-05-03 13:25:24 -04:00
} else {
inc = ! inc ;
2014-08-15 17:23:02 -04:00
hash = ops - > func_hash - > notrace_hash ;
other_hash = ops - > func_hash - > filter_hash ;
2011-05-03 13:25:24 -04:00
/*
* If the notrace hash has no items ,
* then there ' s nothing to do .
*/
2011-12-19 19:07:36 -05:00
if ( ftrace_hash_empty ( hash ) )
2016-03-16 15:34:32 +01:00
return false ;
2011-05-03 13:25:24 -04:00
}
do_for_each_ftrace_rec ( pg , rec ) {
int in_other_hash = 0 ;
int in_hash = 0 ;
int match = 0 ;
2022-10-05 00:38:09 -04:00
if ( skip_record ( rec ) )
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates to convert function calls to mcount is
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:
CPU 0 CPU 1
----- -----
start function tracing
set text to read-write
load_module
add functions to ftrace
set module text read-only
update all functions to callbacks
modify module functions too
< Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 15:40:01 -05:00
continue ;
2011-05-03 13:25:24 -04:00
if ( all ) {
/*
* Only the filter_hash affects all records .
* Update if the record is not in the notrace hash .
*/
2011-05-04 09:27:52 -04:00
if ( ! other_hash | | ! ftrace_lookup_ip ( other_hash , rec - > ip ) )
2011-05-03 13:25:24 -04:00
match = 1 ;
} else {
2011-12-19 19:07:36 -05:00
in_hash = ! ! ftrace_lookup_ip ( hash , rec - > ip ) ;
in_other_hash = ! ! ftrace_lookup_ip ( other_hash , rec - > ip ) ;
2011-05-03 13:25:24 -04:00
/*
2014-05-07 15:06:14 -04:00
* If filter_hash is set , we want to match all functions
* that are in the hash but not in the other hash .
2011-05-03 13:25:24 -04:00
*
2014-05-07 15:06:14 -04:00
* If filter_hash is not set , then we are decrementing .
* That means we match anything that is in the hash
* and also in the other_hash . That is , we need to turn
* off functions in the other hash because they are disabled
* by this hash .
2011-05-03 13:25:24 -04:00
*/
if ( filter_hash & & in_hash & & ! in_other_hash )
match = 1 ;
else if ( ! filter_hash & & in_hash & &
2011-12-19 19:07:36 -05:00
( in_other_hash | | ftrace_hash_empty ( other_hash ) ) )
2011-05-03 13:25:24 -04:00
match = 1 ;
}
if ( ! match )
continue ;
if ( inc ) {
rec - > flags + + ;
2014-05-07 13:46:45 -04:00
if ( FTRACE_WARN_ON ( ftrace_rec_count ( rec ) = = FTRACE_REF_MAX ) )
2016-03-16 15:34:32 +01:00
return false ;
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
2019-11-08 13:07:06 -05:00
if ( ops - > flags & FTRACE_OPS_FL_DIRECT )
rec - > flags | = FTRACE_FL_DIRECT ;
2014-05-06 21:56:17 -04:00
/*
* If there ' s only a single callback registered to a
* function , and the ops has a trampoline registered
* for it , then we can call it directly .
*/
2014-07-24 12:25:47 -04:00
if ( ftrace_rec_count ( rec ) = = 1 & & ops - > trampoline )
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
rec - > flags | = FTRACE_FL_TRAMP ;
2014-07-24 12:25:47 -04:00
else
2014-05-06 21:56:17 -04:00
/*
* If we are adding another function callback
* to this function , and the previous had a
2014-08-20 23:57:04 -04:00
* custom trampoline in use , then we need to go
* back to the default trampoline .
2014-05-06 21:56:17 -04:00
*/
2014-07-24 12:25:47 -04:00
rec - > flags & = ~ FTRACE_FL_TRAMP ;
2014-05-06 21:56:17 -04:00
2012-04-30 16:20:23 -04:00
/*
* If any ops wants regs saved for this function
* then all ops will get saved regs .
*/
if ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS )
rec - > flags | = FTRACE_FL_REGS ;
2011-05-03 13:25:24 -04:00
} else {
2014-05-07 13:46:45 -04:00
if ( FTRACE_WARN_ON ( ftrace_rec_count ( rec ) = = 0 ) )
2016-03-16 15:34:32 +01:00
return false ;
2011-05-03 13:25:24 -04:00
rec - > flags - - ;
2014-05-06 21:56:17 -04:00
2019-11-08 13:07:06 -05:00
/*
* Only the internal direct_ops should have the
* DIRECT flag set . Thus , if it is removing a
* function , then that function should no longer
* be direct .
*/
if ( ops - > flags & FTRACE_OPS_FL_DIRECT )
rec - > flags & = ~ FTRACE_FL_DIRECT ;
ftrace: Allow no regs if no more callbacks require it
When registering a function callback for the function tracer, the ops
can specify if it wants to save full regs (like an interrupt would)
for each function that it traces, or if it does not care about regs
and just wants to have the fastest return possible.
Once an ops has registered a function, if other ops register that
function they will all receive the regs too. That's because once the
work is done, it is done for everyone.
Now if the ops wanting regs unregisters the function so that there's
only ops left that do not care about regs, those ops will still
continue getting regs and going through the work for it on that
function. This is because the disabling of the rec counter only
sees the ops registered, and does not see the ops that are still
attached, and does not know if the current ops that are still attached
want regs or not. To play it safe, it just keeps regs being processed
until no function is registered anymore.
Instead of doing that, check the ops that are still registered for that
function and if none want regs for it anymore, then disable the
processing of regs.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-30 22:35:48 -04:00
/*
* If the rec had REGS enabled and the ops that is
* being removed had REGS set , then see if there is
* still any ops for this record that wants regs .
* If not , we can stop recording them .
*/
2014-05-07 13:46:45 -04:00
if ( ftrace_rec_count ( rec ) > 0 & &
2014-04-30 22:35:48 -04:00
rec - > flags & FTRACE_FL_REGS & &
ops - > flags & FTRACE_OPS_FL_SAVE_REGS ) {
if ( ! test_rec_ops_needs_regs ( rec ) )
rec - > flags & = ~ FTRACE_FL_REGS ;
}
2014-05-06 21:56:17 -04:00
2014-07-24 12:25:47 -04:00
/*
2019-05-04 19:39:39 +08:00
* The TRAMP needs to be set only if rec count
* is decremented to one , and the ops that is
* left has a trampoline . As TRAMP can only be
* enabled if there is only a single ops attached
* to it .
2014-07-24 12:25:47 -04:00
*/
2019-05-04 19:39:39 +08:00
if ( ftrace_rec_count ( rec ) = = 1 & &
2020-11-26 23:38:38 +05:30
ftrace_find_tramp_ops_any_other ( rec , ops ) )
2019-05-04 19:39:39 +08:00
rec - > flags | = FTRACE_FL_TRAMP ;
else
rec - > flags & = ~ FTRACE_FL_TRAMP ;
2014-07-24 12:25:47 -04:00
2014-05-06 21:56:17 -04:00
/*
* flags will be cleared in ftrace_check_record ( )
* if rec count is zero .
*/
2011-05-03 13:25:24 -04:00
}
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
/*
* If the rec has a single associated ops , and ops - > func can be
* called directly , allow the call site to call via the ops .
*/
if ( IS_ENABLED ( CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS ) & &
ftrace_rec_count ( rec ) = = 1 & &
ftrace_ops_get_func ( ops ) = = ops - > func )
rec - > flags | = FTRACE_FL_CALL_OPS ;
else
rec - > flags & = ~ FTRACE_FL_CALL_OPS ;
2011-05-03 13:25:24 -04:00
count + + ;
2016-03-16 15:34:32 +01:00
/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */
2019-05-20 09:26:24 -04:00
update | = ftrace_test_record ( rec , true ) ! = FTRACE_UPDATE_IGNORE ;
2016-03-16 15:34:32 +01:00
2011-05-03 13:25:24 -04:00
/* Shortcut, if we handled all records, we are done. */
if ( ! all & & count = = hash - > count )
2016-03-16 15:34:32 +01:00
return update ;
2011-05-03 13:25:24 -04:00
} while_for_each_ftrace_rec ( ) ;
2016-03-16 15:34:32 +01:00
return update ;
2011-05-03 13:25:24 -04:00
}
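The per-record bookkeeping above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `toy_ops`, `toy_rec`, the `TOY_FL_*` bits and the fixed-size `attached` array are hypothetical names invented here, not kernel structures, and the real code keeps the count inside `rec->flags` and leaves the final cleanup at count zero to ftrace_check_record().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's actual values. */
#define TOY_FL_REGS   (1u << 0)
#define TOY_FL_TRAMP  (1u << 1)
#define TOY_MAX_OPS   8

struct toy_ops {
	bool wants_regs;      /* models FTRACE_OPS_FL_SAVE_REGS */
	bool has_trampoline;  /* models a non-zero ops->trampoline */
};

struct toy_rec {
	size_t count;         /* number of ops attached to this call site */
	unsigned int flags;
	const struct toy_ops *attached[TOY_MAX_OPS];
};

/* Attach one ops to the record and update the REGS and TRAMP flags. */
static bool toy_rec_attach(struct toy_rec *rec, const struct toy_ops *ops)
{
	if (rec->count >= TOY_MAX_OPS)
		return false;
	rec->attached[rec->count++] = ops;

	/* A private trampoline is only usable by a sole callback. */
	if (rec->count == 1 && ops->has_trampoline)
		rec->flags |= TOY_FL_TRAMP;
	else
		rec->flags &= ~TOY_FL_TRAMP;

	/* If any ops wants regs, every ops on this site gets them. */
	if (ops->wants_regs)
		rec->flags |= TOY_FL_REGS;
	return true;
}

/* Detach one ops and recompute the flags from whatever is left. */
static bool toy_rec_detach(struct toy_rec *rec, const struct toy_ops *ops)
{
	size_t i, j;
	bool any_regs = false;

	for (i = 0; i < rec->count; i++)
		if (rec->attached[i] == ops)
			break;
	if (i == rec->count)
		return false; /* nothing to remove */

	/* Remove the entry by shifting the tail down. */
	for (j = i + 1; j < rec->count; j++)
		rec->attached[j - 1] = rec->attached[j];
	rec->count--;

	/* Rescan the remaining ops to see whether regs are still needed. */
	for (i = 0; i < rec->count; i++)
		any_regs |= rec->attached[i]->wants_regs;
	if (!any_regs)
		rec->flags &= ~TOY_FL_REGS;

	/* TRAMP may return once a single trampoline-owning ops remains. */
	if (rec->count == 1 && rec->attached[0]->has_trampoline)
		rec->flags |= TOY_FL_TRAMP;
	else
		rec->flags &= ~TOY_FL_TRAMP;
	return true;
}
```

The model rescans on every detach for simplicity, whereas the code above only rescans when the removed ops had SAVE_REGS set and the record still has users.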
2016-03-16 15:34:32 +01:00
static bool ftrace_hash_rec_disable ( struct ftrace_ops * ops ,
2011-05-03 13:25:24 -04:00
int filter_hash )
{
2016-03-16 15:34:32 +01:00
return __ftrace_hash_rec_update ( ops , filter_hash , 0 ) ;
2011-05-03 13:25:24 -04:00
}
2016-03-16 15:34:32 +01:00
static bool ftrace_hash_rec_enable ( struct ftrace_ops * ops ,
2011-05-03 13:25:24 -04:00
int filter_hash )
{
2016-03-16 15:34:32 +01:00
return __ftrace_hash_rec_update ( ops , filter_hash , 1 ) ;
2011-05-03 13:25:24 -04:00
}
2014-08-18 13:21:08 -04:00
static void ftrace_hash_rec_update_modify ( struct ftrace_ops * ops ,
int filter_hash , int inc )
{
struct ftrace_ops * op ;
__ftrace_hash_rec_update ( ops , filter_hash , inc ) ;
if ( ops - > func_hash ! = & global_ops . local_hash )
return ;
/*
* If the ops shares the global_ops hash , then we need to update
* all ops that are enabled and use this hash .
*/
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
/* Already done */
if ( op = = ops )
continue ;
if ( op - > func_hash = = & global_ops . local_hash )
__ftrace_hash_rec_update ( op , filter_hash , inc ) ;
} while_for_each_ftrace_op ( op ) ;
}
static void ftrace_hash_rec_disable_modify ( struct ftrace_ops * ops ,
int filter_hash )
{
ftrace_hash_rec_update_modify ( ops , filter_hash , 0 ) ;
}
static void ftrace_hash_rec_enable_modify ( struct ftrace_ops * ops ,
int filter_hash )
{
ftrace_hash_rec_update_modify ( ops , filter_hash , 1 ) ;
}
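The propagation in ftrace_hash_rec_update_modify() can be pictured with a minimal user-space sketch. All names here (`demo_hash`, `demo_ops`, `demo_update_modify`, the `updated` counter and the NULL-terminated list) are assumptions made for illustration; the kernel walks ftrace_ops_list with its own iterator macros and compares against `&global_ops.local_hash`.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hash {
	int unused; /* placeholder; only the address matters in this sketch */
};

struct demo_ops {
	struct demo_hash *func_hash; /* hash this ops filters with */
	int updated;                 /* times this ops was processed */
	struct demo_ops *next;       /* singly linked list of registered ops */
};

/* Stand-in for __ftrace_hash_rec_update(): just count the visit. */
static void demo_rec_update(struct demo_ops *ops)
{
	ops->updated++;
}

/*
 * Update 'ops', then every other registered ops that shares the same
 * global hash. Returns how many ops were processed in total.
 */
static int demo_update_modify(struct demo_ops *ops, struct demo_ops *list,
			      struct demo_hash *shared)
{
	struct demo_ops *op;
	int processed = 1;

	demo_rec_update(ops);

	/* An ops with a private hash affects nobody else. */
	if (ops->func_hash != shared)
		return processed;

	for (op = list; op != NULL; op = op->next) {
		if (op == ops)
			continue; /* already done */
		if (op->func_hash == shared) {
			demo_rec_update(op);
			processed++;
		}
	}
	return processed;
}
```

Comparing hash pointers rather than hash contents mirrors the check above: two ops count as sharing only when they point at the very same hash object.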
2014-11-21 05:25:16 -05:00
/*
* Try to update IPMODIFY flag on each ftrace_rec . Return 0 if it is OK
* or no update is needed , - EBUSY if it detects a conflict of the flag
* on a ftrace_rec , and - EINVAL if the new_hash tries to trace all recs .
* Note that old_hash and new_hash have the following meanings :
* - If the hash is NULL , it hits all recs ( if IPMODIFY is set , this is rejected )
* - If the hash is EMPTY_HASH , it hits nothing
* - Anything else hits the recs which match the hash entries .
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
*
* DIRECT ops does not have IPMODIFY flag , but we still need to check it
* against functions with FTRACE_FL_IPMODIFY . If there is any overlap , call
* ops_func ( SHARE_IPMODIFY_SELF ) to make sure current ops can share with
* IPMODIFY . If ops_func ( SHARE_IPMODIFY_SELF ) returns non - zero , propagate
* the return value to the caller and eventually to the owner of the DIRECT
* ops .
2014-11-21 05:25:16 -05:00
*/
static int __ftrace_hash_update_ipmodify ( struct ftrace_ops * ops ,
struct ftrace_hash * old_hash ,
struct ftrace_hash * new_hash )
{
struct ftrace_page * pg ;
struct dyn_ftrace * rec , * end = NULL ;
int in_old , in_new ;
2022-07-19 17:21:24 -07:00
bool is_ipmodify , is_direct ;
2014-11-21 05:25:16 -05:00
/* Only update if the ops has been registered */
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return 0 ;
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
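The handshake described in the commit message above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct ops_model`, `enum ops_cmd_model`, the `can_share` and `sharing` fields, and the choice of `-EOPNOTSUPP` are all hypothetical stand-ins, not the kernel's `struct ftrace_ops` or `enum ftrace_ops_cmd`. What it keeps from the source is the contract: no callback means `-EBUSY`, a callback returning 0 means the DIRECT trampoline agrees to share, and any other value is passed back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names and layout are hypothetical. */
enum ops_cmd_model {
	CMD_ENABLE_SHARE_IPMODIFY_SELF,
	CMD_ENABLE_SHARE_IPMODIFY_PEER,
};

struct ops_model {
	bool can_share;	/* hypothetical: trampoline can honour a modified IP */
	bool sharing;	/* set once the owner has switched to sharing mode */
	int (*ops_func)(struct ops_model *ops, enum ops_cmd_model cmd);
};

/* Callback a DIRECT owner might provide: 0 means "sharing is fine". */
static int direct_ops_func(struct ops_model *ops, enum ops_cmd_model cmd)
{
	switch (cmd) {
	case CMD_ENABLE_SHARE_IPMODIFY_SELF:
	case CMD_ENABLE_SHARE_IPMODIFY_PEER:
		if (!ops->can_share)
			return -EOPNOTSUPP;
		ops->sharing = true;
		return 0;
	}
	return -EINVAL;
}

/* Core side: a DIRECT ops meets a record that already carries IPMODIFY. */
static int ask_direct_to_share(struct ops_model *ops)
{
	if (!ops->ops_func)
		return -EBUSY;	/* owner never opted in to sharing */
	return ops->ops_func(ops, CMD_ENABLE_SHARE_IPMODIFY_SELF);
}
```

Calling `ask_direct_to_share()` on an ops without a callback yields `-EBUSY`; with `direct_ops_func` installed the result depends on whether the owner can adjust its trampoline.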
2022-07-19 17:21:24 -07:00
	is_ipmodify = ops->flags & FTRACE_OPS_FL_IPMODIFY;
	is_direct = ops->flags & FTRACE_OPS_FL_DIRECT;
	/* neither IPMODIFY nor DIRECT, skip */
	if (!is_ipmodify && !is_direct)
		return 0;
	if (WARN_ON_ONCE(is_ipmodify && is_direct))
2014-11-21 05:25:16 -05:00
		return 0;
	/*
2022-07-19 17:21:24 -07:00
	 * Since the IPMODIFY and DIRECT are very address sensitive
	 * actions, we do not allow ftrace_ops to set all functions to new
	 * hash.
2014-11-21 05:25:16 -05:00
	 */
	if (!new_hash || !old_hash)
		return -EINVAL;
	/* Update rec->flags */
	do_for_each_ftrace_rec(pg, rec) {
2016-11-14 16:31:49 -05:00
		if (rec->flags & FTRACE_FL_DISABLED)
			continue;
2014-11-21 05:25:16 -05:00
		/* We need to update only differences of filter_hash */
		in_old = !!ftrace_lookup_ip(old_hash, rec->ip);
		in_new = !!ftrace_lookup_ip(new_hash, rec->ip);
		if (in_old == in_new)
			continue;
		if (in_new) {
2022-07-19 17:21:24 -07:00
			if (rec->flags & FTRACE_FL_IPMODIFY) {
				int ret;
				/* Cannot have two ipmodify on same rec */
				if (is_ipmodify)
					goto rollback;
				FTRACE_WARN_ON(rec->flags & FTRACE_FL_DIRECT);
				/*
				 * Another ops with IPMODIFY is already
				 * attached. We are now attaching a direct
				 * ops. Run SHARE_IPMODIFY_SELF, to check
				 * whether sharing is supported.
				 */
				if (!ops->ops_func)
					return -EBUSY;
				ret = ops->ops_func(ops, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF);
				if (ret)
					return ret;
			} else if (is_ipmodify) {
				rec->flags |= FTRACE_FL_IPMODIFY;
			}
		} else if (is_ipmodify) {
2014-11-21 05:25:16 -05:00
			rec->flags &= ~FTRACE_FL_IPMODIFY;
2022-07-19 17:21:24 -07:00
		}
2014-11-21 05:25:16 -05:00
	} while_for_each_ftrace_rec();
	return 0;
rollback:
	end = rec;
	/* Roll back what we did above */
	do_for_each_ftrace_rec(pg, rec) {
2016-11-14 16:31:49 -05:00
		if (rec->flags & FTRACE_FL_DISABLED)
			continue;
2014-11-21 05:25:16 -05:00
		if (rec == end)
			goto err_out;
		in_old = !!ftrace_lookup_ip(old_hash, rec->ip);
		in_new = !!ftrace_lookup_ip(new_hash, rec->ip);
		if (in_old == in_new)
			continue;
		if (in_new)
			rec->flags &= ~FTRACE_FL_IPMODIFY;
		else
			rec->flags |= FTRACE_FL_IPMODIFY;
	} while_for_each_ftrace_rec();
err_out:
	return -EBUSY;
}
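The update-then-rollback shape of the function above can be tried out in isolation. The following is a simplified userspace sketch, not the kernel code: `struct rec_model`, `REC_IPMODIFY` and `update_ipmodify_model()` are hypothetical names, the two hashes are reduced to per-record booleans, and the DIRECT sharing path is left out. It shows only the core idea: apply flag changes for records whose membership differs between the old and new hash, and on a conflict walk the same records again up to the failing one, inverting each change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define REC_IPMODIFY 0x1u	/* hypothetical bit standing in for FTRACE_FL_IPMODIFY */

struct rec_model {
	unsigned int flags;
	bool in_old;	/* record is in the old filter hash */
	bool in_new;	/* record is in the new filter hash */
};

/* Claim IPMODIFY on records entering the hash, release it on records leaving. */
static int update_ipmodify_model(struct rec_model *recs, size_t n)
{
	size_t i, end;

	for (i = 0; i < n; i++) {
		if (recs[i].in_old == recs[i].in_new)
			continue;		/* only differences matter */
		if (recs[i].in_new) {
			if (recs[i].flags & REC_IPMODIFY)
				goto rollback;	/* already owned by another ops */
			recs[i].flags |= REC_IPMODIFY;
		} else {
			recs[i].flags &= ~REC_IPMODIFY;
		}
	}
	return 0;

rollback:
	end = i;
	/* Invert every change made before the record that failed. */
	for (i = 0; i < end; i++) {
		if (recs[i].in_old == recs[i].in_new)
			continue;
		if (recs[i].in_new)
			recs[i].flags &= ~REC_IPMODIFY;
		else
			recs[i].flags |= REC_IPMODIFY;
	}
	return -EBUSY;
}
```

Because the second pass repeats the same membership test as the first, no separate undo log is needed; the price is a second walk over the records.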
static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops)
{
	struct ftrace_hash *hash = ops->func_hash->filter_hash;
	if (ftrace_hash_empty(hash))
		hash = NULL;
	return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash);
}
/* Disabling always succeeds */
static void ftrace_hash_ipmodify_disable(struct ftrace_ops *ops)
{
	struct ftrace_hash *hash = ops->func_hash->filter_hash;
	if (ftrace_hash_empty(hash))
		hash = NULL;
	__ftrace_hash_update_ipmodify(ops, hash, EMPTY_HASH);
}
static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
				       struct ftrace_hash *new_hash)
{
	struct ftrace_hash *old_hash = ops->func_hash->filter_hash;
	if (ftrace_hash_empty(old_hash))
		old_hash = NULL;
	if (ftrace_hash_empty(new_hash))
		new_hash = NULL;
	return __ftrace_hash_update_ipmodify(ops, old_hash, new_hash);
}
2015-11-25 14:13:11 -05:00
static void print_ip_ins(const char *fmt, const unsigned char *p)
2008-11-14 16:21:19 -08:00
{
2021-06-07 21:39:08 -04:00
	char ins[MCOUNT_INSN_SIZE];
2008-11-14 16:21:19 -08:00
2021-06-07 21:39:08 -04:00
	if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
		printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
		return;
	}
2008-11-14 16:21:19 -08:00
	printk(KERN_CONT "%s", fmt);
2022-10-11 12:03:52 +00:00
	pr_cont("%*phC", MCOUNT_INSN_SIZE, ins);
2008-11-14 16:21:19 -08:00
}
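For readers unfamiliar with the `%*phC` printk specifier used above: it prints a small buffer as hex bytes separated by colons. A rough userspace equivalent is sketched below. `format_insn_bytes()` is a hypothetical helper written for illustration, and the five-byte buffer in the usage note is an assumed example rather than a value taken from this file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format n bytes as two-digit lowercase hex, separated by ':'. */
static void format_insn_bytes(char *out, size_t out_len,
			      const unsigned char *p, size_t n)
{
	size_t i, used = 0;

	if (out_len)
		out[0] = '\0';
	/* Each byte needs at most 3 characters plus the trailing NUL. */
	for (i = 0; i < n && used + 3 < out_len; i++) {
		used += (size_t)snprintf(out + used, out_len - used, "%s%02x",
					 i ? ":" : "", p[i]);
	}
}
```

Given the bytes `e8 00 00 00 00`, the helper produces the string `e8:00:00:00:00`, which is the shape of the "actual:" and "expected:" fields that ftrace_bug() prints.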
2015-11-25 12:50:47 -05:00
enum ftrace_bug_type ftrace_bug_type;
2015-11-25 14:13:11 -05:00
const void *ftrace_expected;
2015-11-25 12:50:47 -05:00
static void print_bug_type(void)
{
	switch (ftrace_bug_type) {
	case FTRACE_BUG_UNKNOWN:
		break;
	case FTRACE_BUG_INIT:
		pr_info("Initializing ftrace call sites\n");
		break;
	case FTRACE_BUG_NOP:
		pr_info("Setting ftrace call site to NOP\n");
		break;
	case FTRACE_BUG_CALL:
		pr_info("Setting ftrace call site to call ftrace function\n");
		break;
	case FTRACE_BUG_UPDATE:
		pr_info("Updating ftrace call site to call a different ftrace function\n");
		break;
	}
}
2011-08-16 09:53:39 -04:00
/**
 * ftrace_bug - report and shutdown function tracer
 * @failed: The failed type (EFAULT, EINVAL, EPERM)
2014-10-24 17:56:04 -04:00
 * @rec: The record that failed
2011-08-16 09:53:39 -04:00
 *
 * The arch code that enables or disables the function tracing
 * can call ftrace_bug() when it has detected a problem in
 * modifying the code. @failed should be one of either:
 * EFAULT - if the problem happens on reading the @ip address
 * EINVAL - if what is read at @ip is not what was expected
2019-03-24 00:05:23 +05:30
 * EPERM - if the problem happens on writing to the @ip address
2011-08-16 09:53:39 -04:00
 */
2014-10-24 17:56:04 -04:00
void ftrace_bug ( int failed , struct dyn_ftrace * rec )
2008-11-14 16:21:19 -08:00
{
2014-10-24 17:56:04 -04:00
unsigned long ip = rec ? rec - > ip : 0 ;
2020-05-15 10:08:28 +00:00
pr_info ( " ------------[ ftrace bug ]------------ \n " ) ;
2008-11-14 16:21:19 -08:00
switch ( failed ) {
case - EFAULT :
pr_info ( " ftrace faulted on modifying " ) ;
2020-06-08 21:29:56 -07:00
print_ip_sym ( KERN_INFO , ip ) ;
2008-11-14 16:21:19 -08:00
break ;
case - EINVAL :
pr_info ( " ftrace failed to modify " ) ;
2020-06-08 21:29:56 -07:00
print_ip_sym ( KERN_INFO , ip ) ;
2015-11-25 14:13:11 -05:00
print_ip_ins ( " actual: " , ( unsigned char * ) ip ) ;
2014-10-24 17:56:04 -04:00
pr_cont ( " \n " ) ;
2015-11-25 14:13:11 -05:00
if ( ftrace_expected ) {
print_ip_ins ( " expected: " , ftrace_expected ) ;
pr_cont ( " \n " ) ;
}
2008-11-14 16:21:19 -08:00
break ;
case - EPERM :
pr_info ( " ftrace faulted on writing " ) ;
2020-06-08 21:29:56 -07:00
print_ip_sym ( KERN_INFO , ip ) ;
2008-11-14 16:21:19 -08:00
break ;
default :
pr_info ( " ftrace faulted on unknown error " ) ;
2020-06-08 21:29:56 -07:00
print_ip_sym ( KERN_INFO , ip ) ;
2008-11-14 16:21:19 -08:00
}
2015-11-25 12:50:47 -05:00
print_bug_type ( ) ;
2014-10-24 17:56:04 -04:00
if ( rec ) {
struct ftrace_ops * ops = NULL ;
pr_info ( " ftrace record flags: %lx \n " , rec - > flags ) ;
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe because:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
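The CALL_OPS idea from the commit message above can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `struct callsite_model`, `struct ops_model`, `shared_trampoline()` and `count_hit()` are hypothetical, and a real implementation places the pointer literal at a fixed offset from the patched instruction rather than in a C struct. The sketch only shows why one shared trampoline is enough: it loads the per-callsite ops pointer once and calls through it, so retargeting a call site is a single pointer store.

```c
#include <assert.h>
#include <stddef.h>

struct ops_model;
typedef void (*ops_fn)(unsigned long ip, struct ops_model *ops);

struct ops_model {
	ops_fn func;
	unsigned long hits;
};

/* Hypothetical call site: the ops pointer lives next to the call. */
struct callsite_model {
	struct ops_model *ops;	/* literal updated by the core */
	unsigned long ip;	/* address of the patched instruction */
};

static void count_hit(unsigned long ip, struct ops_model *ops)
{
	(void)ip;
	ops->hits++;
}

/* Shared trampoline: one load of the ops pointer, then ops->func. */
static void shared_trampoline(const struct callsite_model *site)
{
	struct ops_model *ops = site->ops;

	ops->func(site->ip, ops);
}
```

Since `func` is always read from the same `ops` that was loaded, the callback cannot be invoked with a mismatched ops pointer, which is the property the commit message relies on.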
2023-01-23 13:45:56 +00:00
pr_cont ( " (%ld)%s%s " , ftrace_rec_count ( rec ) ,
rec - > flags & FTRACE_FL_REGS ? " R " : " " ,
rec - > flags & FTRACE_FL_CALL_OPS ? " O " : " " ) ;
2014-10-24 17:56:04 -04:00
if ( rec - > flags & FTRACE_FL_TRAMP_EN ) {
ops = ftrace_find_tramp_ops_any ( rec ) ;
2015-11-25 15:12:38 -05:00
if ( ops ) {
do {
pr_cont ( " \t tramp: %pS (%pS) " ,
( void * ) ops - > trampoline ,
( void * ) ops - > func ) ;
ops = ftrace_find_tramp_ops_next ( rec , ops ) ;
} while ( ops ) ;
} else
2014-10-24 17:56:04 -04:00
pr_cont ( " \t tramp: ERROR! " ) ;
}
ip = ftrace_get_addr_curr ( rec ) ;
2015-11-25 15:12:38 -05:00
pr_cont ( " \n expected tramp: %lx \n " , ip ) ;
2014-10-24 17:56:04 -04:00
}
2020-05-15 10:08:28 +00:00
FTRACE_WARN_ON_ONCE ( 1 ) ;
2008-11-14 16:21:19 -08:00
}
2019-05-20 09:26:24 -04:00
static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
2008-05-12 21:20:43 +02:00
{
2009-07-15 12:32:15 +08:00
	unsigned long flag = 0UL;
2008-11-16 06:02:06 +01:00
2015-11-25 12:50:47 -05:00
	ftrace_bug_type = FTRACE_BUG_UNKNOWN;
ftrace: Still disable enabled records marked as disabled
Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.
To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.
Now the issue happens when function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.
The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There are sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and this
detected that there were records marked as ENABLED with ref counter that
should not have been.
Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while it's being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.
Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.
There are two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.
Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
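The predicate that the commit message describes for skip_record() is small enough to sketch directly. The version below is a userspace model: `struct rec_model` and the two bit positions are hypothetical, and only the rule itself comes from the text above, namely "skip if DISABLED is set but ENABLED is not".

```c
#include <assert.h>
#include <stdbool.h>

#define REC_ENABLED	(1UL << 0)	/* hypothetical bit positions */
#define REC_DISABLED	(1UL << 1)

struct rec_model {
	unsigned long flags;
};

/* True only for records that are DISABLED and were never ENABLED. */
static bool skip_record_model(const struct rec_model *rec)
{
	return (rec->flags & REC_DISABLED) && !(rec->flags & REC_ENABLED);
}
```

A record carrying both flags is deliberately not skipped, so that the later passes can drop its ref count and turn its call site back into a nop.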
2022-10-05 00:38:09 -04:00
	if (skip_record(rec))
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates to convert function calls to mcount are
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:
   CPU 0                                CPU 1
   -----                                -----
   start function tracing
   set text to read-write
                                        load_module
                                        add functions to ftrace
                                        set module text read-only
   update all functions to callbacks
   modify module functions too
   < Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 15:40:01 -05:00
		return FTRACE_UPDATE_IGNORE;
2008-11-15 16:31:41 -05:00
	/*
2011-12-05 18:22:48 +01:00
	 * If we are updating calls:
2008-11-15 16:31:41 -05:00
	 *
2011-05-03 13:25:24 -04:00
	 * If the record has a ref count, then we need to enable it
	 * because someone is using it.
2008-11-15 16:31:41 -05:00
	 *
2011-05-03 13:25:24 -04:00
	 * Otherwise we make sure it is disabled.
	 *
2011-12-05 18:22:48 +01:00
	 * If we are disabling calls, then disable all records that
2011-05-03 13:25:24 -04:00
	 * are enabled.
2008-11-15 16:31:41 -05:00
	 */
2014-05-07 13:46:45 -04:00
	if (enable && ftrace_rec_count(rec))
2011-05-03 13:25:24 -04:00
		flag = FTRACE_FL_ENABLED;
2008-11-15 16:31:41 -05:00
2012-04-30 16:20:23 -04:00
	/*
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
	 * If enabling and the REGS flag does not match the REGS_EN, or
	 * the TRAMP flag doesn't match the TRAMP_EN, then do not ignore
	 * this record. Set flags to fail the compare against ENABLED.
2019-11-08 13:07:06 -05:00
	 * Same for direct calls.
2012-04-30 16:20:23 -04:00
	 */
2014-05-06 21:56:17 -04:00
	if (flag) {
2019-11-08 13:07:06 -05:00
		if (!(rec->flags & FTRACE_FL_REGS) !=
2014-05-06 21:56:17 -04:00
		    !(rec->flags & FTRACE_FL_REGS_EN))
			flag |= FTRACE_FL_REGS;
2019-11-08 13:07:06 -05:00
		if (!(rec->flags & FTRACE_FL_TRAMP) !=
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
! ( rec - > flags & FTRACE_FL_TRAMP_EN ) )
flag | = FTRACE_FL_TRAMP ;
2019-11-08 13:07:06 -05:00
/*
* Direct calls are special , as count matters .
* We must test the record for direct , if the
* DIRECT and DIRECT_EN do not match , but only
* if the count is 1. That ' s because , if the
* count is something other than one , we do not
* want the direct enabled ( it will be done via the
* direct helper ) . But if DIRECT_EN is set , and
* the count is not one , we need to clear it .
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
*
2019-11-08 13:07:06 -05:00
*/
if ( ftrace_rec_count ( rec ) = = 1 ) {
if ( ! ( rec - > flags & FTRACE_FL_DIRECT ) ! =
! ( rec - > flags & FTRACE_FL_DIRECT_EN ) )
flag | = FTRACE_FL_DIRECT ;
} else if ( rec - > flags & FTRACE_FL_DIRECT_EN ) {
flag | = FTRACE_FL_DIRECT ;
}
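The DYNAMIC_FTRACE_WITH_CALL_OPS commit message describes a per-call-site ftrace_ops pointer that a shared trampoline loads and then calls through. The following is a minimal userspace sketch of that idea only, not the kernel or arm64 implementation: `struct demo_ops`, `struct demo_callsite`, `demo_shared_trampoline()` and `demo_set_ops()` are hypothetical names, and a C11 atomic pointer stands in for the atomically updated pointer literal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops: only the callback matters. */
struct demo_ops {
	void (*func)(unsigned long ip, struct demo_ops *ops);
	unsigned long hits;
};

/*
 * Hypothetical call site. In the scheme described above the ops pointer
 * lives at a fixed offset from the patched instruction; here it is simply
 * a struct member next to the site's address.
 */
struct demo_callsite {
	_Atomic(struct demo_ops *) ops;
	unsigned long ip;
};

/* Example callback: count how often this ops was invoked. */
void demo_count_hit(unsigned long ip, struct demo_ops *ops)
{
	(void)ip;
	ops->hits++;
}

/*
 * Shared trampoline: load the pointer once, then call through that same
 * pointer, so func and ops always belong together even if the call site
 * is retargeted concurrently.
 */
void demo_shared_trampoline(struct demo_callsite *site)
{
	struct demo_ops *ops;

	assert(site != NULL);
	ops = atomic_load(&site->ops);
	if (ops)
		ops->func(site->ip, ops);
}

/* Retarget a call site with a single atomic store. */
void demo_set_ops(struct demo_callsite *site, struct demo_ops *ops)
{
	atomic_store(&site->ops, ops);
}
```

Because the trampoline never reads `func` and `ops` from two different places, no list walk is needed for a site with a single user, which is the overhead the commit message sets out to avoid.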
2023-01-23 13:45:56 +00:00
/*
* Ops calls are special , as count matters .
* As with direct calls , they must only be enabled when count
* is one , otherwise they ' ll be handled via the list ops .
*/
if ( ftrace_rec_count ( rec ) = = 1 ) {
if ( ! ( rec - > flags & FTRACE_FL_CALL_OPS ) ! =
! ( rec - > flags & FTRACE_FL_CALL_OPS_EN ) )
flag | = FTRACE_FL_CALL_OPS ;
} else if ( rec - > flags & FTRACE_FL_CALL_OPS_EN ) {
flag | = FTRACE_FL_CALL_OPS ;
}
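The DIRECT and CALL_OPS checks above share one pattern: with exactly one user, the record needs an update when the requested flag and its _EN counterpart disagree; with any other count, it needs an update only if the _EN flag is still set. A small sketch of that decision follows, assuming made-up bit values (`DEMO_FL_CALL_OPS`, `DEMO_FL_CALL_OPS_EN`) and a hypothetical helper name; the real flag layout lives in the kernel headers.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real FTRACE_FL_* values differ. */
#define DEMO_FL_CALL_OPS     (1UL << 0)  /* an ops requested direct ops calls */
#define DEMO_FL_CALL_OPS_EN  (1UL << 1)  /* the call site currently uses them */

/*
 * Returns true if the record's CALL_OPS state must be changed.
 * "!a != !b" normalizes both masked values to 0 or 1 before comparing,
 * so it is true exactly when one flag is set and the other is not.
 */
bool demo_call_ops_needs_update(unsigned long flags, unsigned int count)
{
	if (count == 1)
		return !(flags & DEMO_FL_CALL_OPS) !=
		       !(flags & DEMO_FL_CALL_OPS_EN);

	/* Zero or several users: a leftover _EN flag has to be cleared. */
	return (flags & DEMO_FL_CALL_OPS_EN) != 0;
}
```

Comparing the masked values directly would not work, since the two flags occupy different bits; the double negation is what makes the comparison meaningful.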
2014-05-06 21:56:17 -04:00
}
2012-04-30 16:20:23 -04:00
2009-07-15 12:32:15 +08:00
/* If the state of this record hasn't changed, then do nothing */
if ( ( rec - > flags & FTRACE_FL_ENABLED ) = = flag )
2011-08-16 09:53:39 -04:00
return FTRACE_UPDATE_IGNORE ;
2008-11-15 16:31:41 -05:00
2009-07-15 12:32:15 +08:00
if ( flag ) {
2012-04-30 16:20:23 -04:00
/* Save off if rec is being enabled (for return value) */
flag ^ = rec - > flags & FTRACE_FL_ENABLED ;
if ( update ) {
2023-01-24 09:56:53 -05:00
rec - > flags | = FTRACE_FL_ENABLED | FTRACE_FL_TOUCHED ;
2012-04-30 16:20:23 -04:00
if ( flag & FTRACE_FL_REGS ) {
if ( rec - > flags & FTRACE_FL_REGS )
rec - > flags | = FTRACE_FL_REGS_EN ;
else
rec - > flags & = ~ FTRACE_FL_REGS_EN ;
}
2014-05-06 21:56:17 -04:00
if ( flag & FTRACE_FL_TRAMP ) {
if ( rec - > flags & FTRACE_FL_TRAMP )
rec - > flags | = FTRACE_FL_TRAMP_EN ;
else
rec - > flags & = ~ FTRACE_FL_TRAMP_EN ;
}
2020-10-28 17:42:17 -04:00
2023-05-02 21:32:33 -04:00
/* Keep track of anything that modifies the function */
if ( rec - > flags & ( FTRACE_FL_DIRECT | FTRACE_FL_IPMODIFY ) )
rec - > flags | = FTRACE_FL_MODIFIED ;
2019-11-08 13:07:06 -05:00
if ( flag & FTRACE_FL_DIRECT ) {
/*
* If there ' s only one user ( direct_ops helper )
* then we can call the direct function
* directly ( no ftrace trampoline ) .
*/
if ( ftrace_rec_count ( rec ) = = 1 ) {
if ( rec - > flags & FTRACE_FL_DIRECT )
rec - > flags | = FTRACE_FL_DIRECT_EN ;
else
rec - > flags & = ~ FTRACE_FL_DIRECT_EN ;
} else {
/*
* Can only call directly if there ' s
* only one callback to the function .
*/
rec - > flags & = ~ FTRACE_FL_DIRECT_EN ;
}
}
2023-01-23 13:45:56 +00:00
if ( flag & FTRACE_FL_CALL_OPS ) {
if ( ftrace_rec_count ( rec ) = = 1 ) {
if ( rec - > flags & FTRACE_FL_CALL_OPS )
rec - > flags | = FTRACE_FL_CALL_OPS_EN ;
else
rec - > flags & = ~ FTRACE_FL_CALL_OPS_EN ;
} else {
/*
* Can only call directly if there ' s
* only one set of associated ops .
*/
rec - > flags & = ~ FTRACE_FL_CALL_OPS_EN ;
}
}
2012-04-30 16:20:23 -04:00
}
/*
* If this record is being updated from a nop , then
* return UPDATE_MAKE_CALL .
* Otherwise ,
* return UPDATE_MODIFY_CALL to tell the caller to convert
2014-05-07 16:09:49 -04:00
* from the save regs , to a non - save regs function or
2014-05-06 21:56:17 -04:00
* vice versa , or from a trampoline call .
2012-04-30 16:20:23 -04:00
*/
2015-11-25 12:50:47 -05:00
if ( flag & FTRACE_FL_ENABLED ) {
ftrace_bug_type = FTRACE_BUG_CALL ;
2012-04-30 16:20:23 -04:00
return FTRACE_UPDATE_MAKE_CALL ;
2015-11-25 12:50:47 -05:00
}
2014-05-07 16:09:49 -04:00
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_UPDATE ;
2014-05-07 16:09:49 -04:00
return FTRACE_UPDATE_MODIFY_CALL ;
2011-08-16 09:53:39 -04:00
}
2012-04-30 16:20:23 -04:00
if ( update ) {
/* If there's no more users, clear all flags */
2014-05-07 13:46:45 -04:00
if ( ! ftrace_rec_count ( rec ) )
2023-01-24 09:56:53 -05:00
rec - > flags & = FTRACE_NOCLEAR_FLAGS ;
2012-04-30 16:20:23 -04:00
else
ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl
When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.
ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).
When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.
For example, the following will cause ftrace to crash:
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.
Cc: stable@vger.kernel.org # 3.17+
Reported-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-04 23:10:28 -05:00
/*
* Just disable the record , but keep the ops TRAMP
* and REGS states . The _EN flags must be disabled though .
*/
rec - > flags & = ~ ( FTRACE_FL_ENABLED | FTRACE_FL_TRAMP_EN |
2023-01-23 13:45:56 +00:00
FTRACE_FL_REGS_EN | FTRACE_FL_DIRECT_EN |
FTRACE_FL_CALL_OPS_EN ) ;
2012-04-30 16:20:23 -04:00
}
2011-08-16 09:53:39 -04:00
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_NOP ;
2011-08-16 09:53:39 -04:00
return FTRACE_UPDATE_MAKE_NOP ;
}
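The disable path at the end of the function above distinguishes two cases: a record with no users left keeps only the flags that must never be cleared, while a record that still has users keeps its request flags (REGS, TRAMP) but drops ENABLED and every _EN flag, since the call site becomes a nop. A rough sketch under assumed bit values follows; `demo_disable_record()` is a hypothetical helper, and treating TOUCHED as the only no-clear flag is an assumption made for the example.

```c
/* Hypothetical bit positions; the real FTRACE_FL_* values differ. */
#define DEMO_FL_ENABLED   (1UL << 0)
#define DEMO_FL_REGS      (1UL << 1)
#define DEMO_FL_REGS_EN   (1UL << 2)
#define DEMO_FL_TRAMP     (1UL << 3)
#define DEMO_FL_TRAMP_EN  (1UL << 4)
#define DEMO_FL_TOUCHED   (1UL << 5)

/* Assumption for this sketch: only TOUCHED survives a full clear. */
#define DEMO_NOCLEAR_FLAGS  DEMO_FL_TOUCHED

/* State that describes what the call site currently points at. */
#define DEMO_EN_MASK \
	(DEMO_FL_ENABLED | DEMO_FL_REGS_EN | DEMO_FL_TRAMP_EN)

/* Returns the record flags after the call site has been turned into a nop. */
unsigned long demo_disable_record(unsigned long flags, unsigned int users)
{
	if (users == 0)
		return flags & DEMO_NOCLEAR_FLAGS;

	/* Keep what the ops asked for, drop what was actually applied. */
	return flags & ~DEMO_EN_MASK;
}
```

Leaving a stale TRAMP_EN behind is the accounting error described in the "Clear REGS_EN and TRAMP_EN flags" commit message: a later update would look for a trampoline that the nop no longer points to.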
/**
2021-10-29 09:52:23 -04:00
* ftrace_update_record - set a record that now is tracing or not
2011-08-16 09:53:39 -04:00
* @ rec : the record to update
2019-05-20 09:26:24 -04:00
* @ enable : set to true if the record is tracing , false to force disable
2011-08-16 09:53:39 -04:00
*
* The records that represent all functions that can be traced need
* to be updated when tracing has been enabled .
*/
2019-05-20 09:26:24 -04:00
int ftrace_update_record ( struct dyn_ftrace * rec , bool enable )
2011-08-16 09:53:39 -04:00
{
2019-05-20 09:26:24 -04:00
return ftrace_check_record ( rec , enable , true ) ;
2011-08-16 09:53:39 -04:00
}
/**
2021-10-29 09:52:23 -04:00
* ftrace_test_record - check if the record has been enabled or not
2011-08-16 09:53:39 -04:00
* @ rec : the record to test
2019-05-20 09:26:24 -04:00
* @ enable : set to true to check if enabled , false if it is disabled
2011-08-16 09:53:39 -04:00
*
* The arch code may need to test if a record is already set to
* tracing to determine how to modify the function code that it
* represents .
*/
2019-05-20 09:26:24 -04:00
int ftrace_test_record ( struct dyn_ftrace * rec , bool enable )
2011-08-16 09:53:39 -04:00
{
2019-05-20 09:26:24 -04:00
return ftrace_check_record ( rec , enable , false ) ;
2011-08-16 09:53:39 -04:00
}
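ftrace_update_record() and ftrace_test_record() are thin wrappers around one checker, differing only in whether the record is modified. The sketch below illustrates that "dry run versus commit" split with hypothetical types (`struct demo_rec`, `enum demo_update`); it is deliberately simplified, omits the MODIFY_CALL case, and assumes that a record is wanted only when it has at least one user.

```c
#include <stdbool.h>

/* Simplified result codes; the real set also has a "modify call" case. */
enum demo_update { DEMO_IGNORE, DEMO_MAKE_CALL, DEMO_MAKE_NOP };

/* Hypothetical record: a single enabled bit plus a user count. */
struct demo_rec {
	bool enabled;
	unsigned int users;
};

/*
 * Shared checker. Reports what would have to be done to the call site
 * and, only if update is true, records the new state.
 */
int demo_check_record(struct demo_rec *rec, bool enable, bool update)
{
	bool want = enable && rec->users > 0;

	if (want == rec->enabled)
		return DEMO_IGNORE;
	if (update)
		rec->enabled = want;
	return want ? DEMO_MAKE_CALL : DEMO_MAKE_NOP;
}

/* Dry run: ask what is needed without touching the record. */
int demo_test_record(struct demo_rec *rec, bool enable)
{
	return demo_check_record(rec, enable, false);
}

/* Commit: same answer, and the record is updated to match. */
int demo_update_record(struct demo_rec *rec, bool enable)
{
	return demo_check_record(rec, enable, true);
}
```

Keeping both entry points on one code path means the answer given by the test variant cannot drift from what the update variant actually does.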
2014-07-24 16:00:31 -04:00
static struct ftrace_ops *
ftrace_find_tramp_ops_any ( struct dyn_ftrace * rec )
{
struct ftrace_ops * op ;
2014-07-24 12:25:47 -04:00
unsigned long ip = rec - > ip ;
2014-07-24 16:00:31 -04:00
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( ! op - > trampoline )
continue ;
2014-07-24 12:25:47 -04:00
if ( hash_contains_ip ( ip , op - > func_hash ) )
2014-07-24 16:00:31 -04:00
return op ;
} while_for_each_ftrace_op ( op ) ;
return NULL ;
}
2020-11-26 23:38:38 +05:30
static struct ftrace_ops *
ftrace_find_tramp_ops_any_other ( struct dyn_ftrace * rec , struct ftrace_ops * op_exclude )
{
struct ftrace_ops * op ;
unsigned long ip = rec - > ip ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( op = = op_exclude | | ! op - > trampoline )
continue ;
if ( hash_contains_ip ( ip , op - > func_hash ) )
return op ;
} while_for_each_ftrace_op ( op ) ;
return NULL ;
}
2015-11-25 15:12:38 -05:00
static struct ftrace_ops *
ftrace_find_tramp_ops_next ( struct dyn_ftrace * rec ,
struct ftrace_ops * op )
{
unsigned long ip = rec - > ip ;
while_for_each_ftrace_op ( op ) {
if ( ! op - > trampoline )
continue ;
if ( hash_contains_ip ( ip , op - > func_hash ) )
return op ;
2020-05-29 22:12:14 +08:00
}
2015-11-25 15:12:38 -05:00
return NULL ;
}
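The three trampoline lookups above share one pattern: walk the registered ops, skip any that has no private trampoline (or that the caller asked to exclude), and return the first whose filter covers the ip. A minimal user-space sketch of that pattern follows. Every `demo_*` name is hypothetical, the linked list stands in for `ftrace_ops_list`, and the linear scan stands in for `hash_contains_ip()`, which in the kernel also deals with empty filters and notrace hashes (omitted here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops. */
struct demo_ops {
	const unsigned long *ips;	/* functions this ops traces */
	size_t nr_ips;
	unsigned long trampoline;	/* 0 means "no private trampoline" */
	struct demo_ops *next;
};

/* Stand-in for hash_contains_ip(): a linear scan, not a hash lookup. */
static int demo_contains_ip(const struct demo_ops *op, unsigned long ip)
{
	for (size_t i = 0; i < op->nr_ips; i++)
		if (op->ips[i] == ip)
			return 1;
	return 0;
}

/*
 * Return the first ops that owns a trampoline and traces 'ip'.
 * Passing exclude == NULL models the "_any" variant; a non-NULL
 * exclude models the "_any_other" variant.
 */
static struct demo_ops *
demo_find_tramp_ops(struct demo_ops *head, unsigned long ip,
		    const struct demo_ops *exclude)
{
	for (struct demo_ops *op = head; op; op = op->next) {
		if (op == exclude || !op->trampoline)
			continue;
		if (demo_contains_ip(op, ip))
			return op;
	}
	return NULL;
}

static void demo_find_tramp_ops_example(void)
{
	static const unsigned long ips_a[] = { 0x1000, 0x2000 };
	static const unsigned long ips_b[] = { 0x2000 };
	struct demo_ops b = { ips_b, 1, 0xb000, NULL };
	struct demo_ops a = { ips_a, 2, 0xa000, &b };

	assert(demo_find_tramp_ops(&a, 0x2000, NULL) == &a);	/* first match */
	assert(demo_find_tramp_ops(&a, 0x2000, &a) == &b);	/* 'a' excluded */
	assert(demo_find_tramp_ops(&a, 0x3000, NULL) == NULL);	/* nobody traces it */
}
```

The "_next" variant differs only in where the walk starts: it resumes after a given ops instead of at the list head.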
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different from the function tracers: it
is processed after either ftrace_caller or ftrace_regs_caller, and
because there is only one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the registers are restored.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
static struct ftrace_ops *
ftrace_find_tramp_ops_curr ( struct dyn_ftrace * rec )
{
struct ftrace_ops * op ;
2014-07-24 12:25:47 -04:00
unsigned long ip = rec - > ip ;
2014-05-06 21:56:17 -04:00
2014-07-24 12:25:47 -04:00
/*
* Need to check removed ops first .
* If they are being removed , and this rec has a tramp ,
* and this rec is in the ops list , then it would be the
* one with the tramp .
*/
if ( removed_ops ) {
if ( hash_contains_ip ( ip , & removed_ops - > old_hash ) )
2014-05-06 21:56:17 -04:00
return removed_ops ;
}
2014-07-24 12:25:47 -04:00
/*
* Need to find the current trampoline for a rec .
* Now , a trampoline is only attached to a rec if there
* was a single ' ops ' attached to it . But this can be called
* when we are adding another op to the rec or removing the
* current one . Thus , if the op is being added , we can
* ignore it because it hasn ' t attached itself to the rec
2014-10-24 14:48:35 -04:00
* yet .
*
* If an ops is being modified ( hooking to different functions )
* then we don ' t care about the new functions that are being
* added , just the old ones ( that are probably being removed ) .
*
* If we are adding an ops to a function that already is using
* a trampoline , it needs to be removed ( trampolines are only
* for single ops connected ) , then an ops that is not being
* modified also needs to be checked .
2014-07-24 12:25:47 -04:00
*/
2014-05-06 21:56:17 -04:00
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
2014-07-24 12:25:47 -04:00
if ( ! op - > trampoline )
continue ;
/*
* If the ops is being added , it hasn ' t gotten to
* the point to be removed from this tree yet .
*/
if ( op - > flags & FTRACE_OPS_FL_ADDING )
2014-05-06 21:56:17 -04:00
continue ;
2014-10-24 14:48:35 -04:00
2014-07-24 12:25:47 -04:00
/*
2014-10-24 14:48:35 -04:00
* If the ops is being modified and is in the old
* hash , then it is probably being removed from this
* function .
2014-07-24 12:25:47 -04:00
*/
if ( ( op - > flags & FTRACE_OPS_FL_MODIFYING ) & &
hash_contains_ip ( ip , & op - > old_hash ) )
2014-05-06 21:56:17 -04:00
return op ;
2014-10-24 14:48:35 -04:00
/*
* If the ops is not being added or modified , and it ' s
* in its normal filter hash , then this must be the one
* we want !
*/
if ( ! ( op - > flags & FTRACE_OPS_FL_MODIFYING ) & &
hash_contains_ip ( ip , op - > func_hash ) )
return op ;
2014-05-06 21:56:17 -04:00
} while_for_each_ftrace_op ( op ) ;
return NULL ;
}
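The order of checks in ftrace_find_tramp_ops_curr() is easy to lose among the comments, so here is a small user-space model of it. It is only a sketch: the `demo_*` types and flag names are hypothetical, plain arrays replace the filter hashes, and the removed ops is passed as a parameter rather than read from a global.

```c
#include <assert.h>
#include <stddef.h>

enum {
	DEMO_FL_ADDING    = 1 << 0,	/* models FTRACE_OPS_FL_ADDING */
	DEMO_FL_MODIFYING = 1 << 1,	/* models FTRACE_OPS_FL_MODIFYING */
};

struct demo_filter {
	const unsigned long *ips;
	size_t nr;
};

struct demo_ops {
	unsigned int flags;
	unsigned long trampoline;	/* 0 means "no private trampoline" */
	struct demo_filter cur;		/* stands in for op->func_hash */
	struct demo_filter old;		/* stands in for op->old_hash */
	struct demo_ops *next;
};

static int demo_has_ip(const struct demo_filter *f, unsigned long ip)
{
	for (size_t i = 0; i < f->nr; i++)
		if (f->ips[i] == ip)
			return 1;
	return 0;
}

static struct demo_ops *
demo_find_tramp_curr(struct demo_ops *head, struct demo_ops *removed,
		     unsigned long ip)
{
	/* 1. An ops being removed is judged by its old filter. */
	if (removed && demo_has_ip(&removed->old, ip))
		return removed;

	for (struct demo_ops *op = head; op; op = op->next) {
		if (!op->trampoline)
			continue;
		/* 2. An ops still being added cannot own the rec yet. */
		if (op->flags & DEMO_FL_ADDING)
			continue;
		/* 3. An ops being modified counts only for its old filter. */
		if (op->flags & DEMO_FL_MODIFYING) {
			if (demo_has_ip(&op->old, ip))
				return op;
			continue;
		}
		/* 4. A settled ops is judged by its current filter. */
		if (demo_has_ip(&op->cur, ip))
			return op;
	}
	return NULL;
}

static void demo_find_tramp_curr_example(void)
{
	static const unsigned long old_ips[] = { 0x10 };
	static const unsigned long new_ips[] = { 0x20 };
	struct demo_ops settled = { 0, 0xc000, { new_ips, 1 }, { NULL, 0 }, NULL };
	struct demo_ops moving = { DEMO_FL_MODIFYING, 0xb000,
				   { new_ips, 1 }, { old_ips, 1 }, &settled };

	/* 0x10 is only in the old filter of the ops being modified. */
	assert(demo_find_tramp_curr(&moving, NULL, 0x10) == &moving);
	/* 0x20 is new for 'moving', so the settled ops is returned. */
	assert(demo_find_tramp_curr(&moving, NULL, 0x20) == &settled);
}
```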
static struct ftrace_ops *
ftrace_find_tramp_ops_new ( struct dyn_ftrace * rec )
{
struct ftrace_ops * op ;
2014-07-24 12:25:47 -04:00
unsigned long ip = rec - > ip ;
2014-05-06 21:56:17 -04:00
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
/* Any ops whose hash covers this ip qualifies; no trampoline check here */
2014-07-24 12:25:47 -04:00
if ( hash_contains_ip ( ip , op - > func_hash ) )
2014-05-06 21:56:17 -04:00
return op ;
} while_for_each_ftrace_op ( op ) ;
return NULL ;
}
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe because:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
struct ftrace_ops *
ftrace_find_unique_ops ( struct dyn_ftrace * rec )
{
struct ftrace_ops * op , * found = NULL ;
unsigned long ip = rec - > ip ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( hash_contains_ip ( ip , op - > func_hash ) ) {
if ( found )
return NULL ;
found = op ;
}
} while_for_each_ftrace_op ( op ) ;
return found ;
}
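ftrace_find_unique_ops() returns an ops only when exactly one registered ops traces the record, and NULL otherwise. The sketch below models that rule in user space and shows one way a caller might use the result: pick the unique ops for a call site's ops pointer, or fall back to a shared list ops. The fallback helper, `demo_list_ops`, and all other `demo_*` names are assumptions made for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops. */
struct demo_ops {
	const unsigned long *ips;	/* functions this ops traces */
	size_t nr_ips;
	struct demo_ops *next;
};

static int demo_contains_ip(const struct demo_ops *op, unsigned long ip)
{
	for (size_t i = 0; i < op->nr_ips; i++)
		if (op->ips[i] == ip)
			return 1;
	return 0;
}

/* Return the only ops tracing 'ip', or NULL if there are zero or several. */
static struct demo_ops *demo_find_unique(struct demo_ops *head, unsigned long ip)
{
	struct demo_ops *found = NULL;

	for (struct demo_ops *op = head; op; op = op->next) {
		if (demo_contains_ip(op, ip)) {
			if (found)
				return NULL;	/* second match: not unique */
			found = op;
		}
	}
	return found;
}

/* Hypothetical fallback that would iterate over every registered ops. */
static struct demo_ops demo_list_ops;

/* Choose what a per-call-site ops pointer could hold (assumed usage). */
static struct demo_ops *demo_ops_for_call_site(struct demo_ops *head,
					       unsigned long ip)
{
	struct demo_ops *op = demo_find_unique(head, ip);

	return op ? op : &demo_list_ops;
}

static void demo_unique_example(void)
{
	static const unsigned long ips_a[] = { 0x100, 0x200 };
	static const unsigned long ips_b[] = { 0x200 };
	struct demo_ops b = { ips_b, 1, NULL };
	struct demo_ops a = { ips_a, 2, &b };

	assert(demo_find_unique(&a, 0x100) == &a);	/* only 'a' traces 0x100 */
	assert(demo_find_unique(&a, 0x200) == NULL);	/* shared by 'a' and 'b' */
	assert(demo_ops_for_call_site(&a, 0x200) == &demo_list_ops);
}
```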
2019-11-08 13:07:06 -05:00
# ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/* Protected by rcu_tasks for reading, and direct_mutex for writing */
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() that
occurs when the number of new entries added is large enough to cause two
allocations in the loop:
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}
Where ftrace_add_rec_direct() has:
if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;
if (size < 32)
size = 32;
new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;
*free_hash = direct_functions;
direct_functions = new_hash;
}
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_function entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update the
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
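The copy-then-publish approach in steps 3 to 6 can be modeled outside the kernel. The following is a userspace sketch only, not the kernel code: `struct table`, `table_copy_add()`, `publish()` and `lookup()` are hypothetical names, a flat array stands in for the ftrace hash, and a C11 atomic exchange stands in for the RCU pointer update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the direct_functions hash: a flat array of IPs. */
struct table {
	size_t count;
	unsigned long ips[];
};

/* The published table; readers load it with acquire semantics. */
static _Atomic(struct table *) live_table;

/*
 * Build a new table holding every entry of @old (copied, not moved)
 * plus @n new entries from @add. @old is left untouched, so a reader
 * that still holds it sees a complete table.
 * Returns NULL on allocation failure.
 */
static struct table *table_copy_add(const struct table *old,
				    const unsigned long *add, size_t n)
{
	size_t old_count = old ? old->count : 0;
	struct table *new_table;

	new_table = malloc(sizeof(*new_table) +
			   (old_count + n) * sizeof(unsigned long));
	if (!new_table)
		return NULL;

	if (old_count)
		memcpy(new_table->ips, old->ips,
		       old_count * sizeof(unsigned long));
	if (n)
		memcpy(new_table->ips + old_count, add,
		       n * sizeof(unsigned long));
	new_table->count = old_count + n;
	return new_table;
}

/*
 * Publish @new_table only once it is fully built, and return the old
 * table. The caller frees the old table when no reader can still be
 * using it (in the kernel this would be after an RCU grace period).
 */
static struct table *publish(struct table *new_table)
{
	return atomic_exchange_explicit(&live_table, new_table,
					memory_order_acq_rel);
}

/* Reader side: search whatever table is currently published. */
static int lookup(unsigned long ip)
{
	struct table *t = atomic_load_explicit(&live_table,
					       memory_order_acquire);

	for (size_t i = 0; t && i < t->count; i++)
		if (t->ips[i] == ip)
			return 1;
	return 0;
}
```

Because the old table is never modified, there is no window in which a lookup sees a half-populated table, and there is exactly one old pointer to free per update.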
2023-12-29 11:51:34 -05:00
static struct ftrace_hash __rcu * direct_functions = EMPTY_HASH ;
2019-11-08 13:07:06 -05:00
static DEFINE_MUTEX ( direct_mutex ) ;
2019-11-08 13:12:57 -05:00
int ftrace_direct_func_count ;
2019-11-08 13:07:06 -05:00
/*
* Search the direct_functions hash to see if the given instruction pointer
* has a direct caller attached to it .
*/
2019-12-08 16:01:12 -08:00
unsigned long ftrace_find_rec_direct ( unsigned long ip )
2019-11-08 13:07:06 -05:00
{
	struct ftrace_func_entry *entry;
	entry = __ftrace_lookup_ip(direct_functions, ip);
	if (!entry)
		return 0;
	return entry->direct;
}
static void call_direct_funcs ( unsigned long ip , unsigned long pip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * ops , struct ftrace_regs * fregs )
2019-11-08 13:07:06 -05:00
{
2023-03-21 15:04:22 +01:00
unsigned long addr = READ_ONCE ( ops - > direct_call ) ;
2019-11-08 13:07:06 -05:00
if ( ! addr )
return ;
2022-11-03 17:05:17 +00:00
arch_ftrace_set_direct_caller ( fregs , addr ) ;
2019-11-08 13:07:06 -05:00
}
# endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
2014-05-06 21:34:14 -04:00
/**
* ftrace_get_addr_new - Get the call address to set to
* @ rec : The ftrace record descriptor
*
* If the record has the FTRACE_FL_REGS set , that means that it
* wants to convert to a callback that saves all regs . If FTRACE_FL_REGS
2020-08-06 20:32:59 -07:00
* is not set , then it wants to convert to the normal callback .
2014-05-06 21:34:14 -04:00
*
2024-02-22 21:48:33 -08:00
* Returns : the address of the trampoline to set to
2014-05-06 21:34:14 -04:00
*/
unsigned long ftrace_get_addr_new ( struct dyn_ftrace * rec )
{
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
struct ftrace_ops * ops ;
2019-11-08 13:07:06 -05:00
unsigned long addr ;
	if ((rec->flags & FTRACE_FL_DIRECT) &&
	    (ftrace_rec_count(rec) == 1)) {
2019-12-08 16:01:12 -08:00
addr = ftrace_find_rec_direct ( rec - > ip ) ;
2019-11-08 13:07:06 -05:00
if ( addr )
return addr ;
WARN_ON_ONCE ( 1 ) ;
}
2014-05-06 21:56:17 -04:00
/* Trampolines take precedence over regs */
	if (rec->flags & FTRACE_FL_TRAMP) {
		ops = ftrace_find_tramp_ops_new(rec);
		if (FTRACE_WARN_ON(!ops || !ops->trampoline)) {
2014-08-20 23:57:04 -04:00
pr_warn ( " Bad trampoline accounting at: %p (%pS) (%lx) \n " ,
( void * ) rec - > ip , ( void * ) rec - > ip , rec - > flags ) ;
2014-05-06 21:56:17 -04:00
/* Ftrace is shutting down, return anything */
return ( unsigned long ) FTRACE_ADDR ;
}
return ops - > trampoline ;
}
2014-05-06 21:34:14 -04:00
if ( rec - > flags & FTRACE_FL_REGS )
return ( unsigned long ) FTRACE_REGS_ADDR ;
else
return ( unsigned long ) FTRACE_ADDR ;
}
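The precedence that ftrace_get_addr_new() applies (direct call, then trampoline, then regs, then the default caller) can be shown with a small standalone model. This is a sketch under stated assumptions: `struct addr_rec`, `pick_new_addr()`, the `FL_*` bits and the `ADDR_*` values are all hypothetical and only mirror the order of the checks above.

```c
#include <assert.h>

/* Hypothetical flag bits; the real FTRACE_FL_* values differ. */
enum {
	FL_REGS   = 1 << 0,
	FL_TRAMP  = 1 << 1,
	FL_DIRECT = 1 << 2,
};

/* Hypothetical addresses standing in for the real call targets. */
enum {
	ADDR_DEFAULT = 0x100,	/* models FTRACE_ADDR */
	ADDR_REGS    = 0x200,	/* models FTRACE_REGS_ADDR */
};

struct addr_rec {
	unsigned long flags;
	int ref_count;			/* number of ops attached to the record */
	unsigned long direct_addr;	/* 0 if no direct caller is registered */
	unsigned long tramp_addr;	/* 0 if no ops trampoline is available */
};

static unsigned long pick_new_addr(const struct addr_rec *rec)
{
	/* A direct call is used only when it is the sole user of the record. */
	if ((rec->flags & FL_DIRECT) && rec->ref_count == 1 && rec->direct_addr)
		return rec->direct_addr;

	/* Trampolines take precedence over regs. */
	if (rec->flags & FL_TRAMP) {
		if (!rec->tramp_addr)
			return ADDR_DEFAULT;	/* bad accounting: fall back */
		return rec->tramp_addr;
	}

	return (rec->flags & FL_REGS) ? ADDR_REGS : ADDR_DEFAULT;
}
```

A record with both the direct and trampoline bits set therefore resolves to the direct address only while its count is one; with a second user attached it falls through to the trampoline.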
/**
* ftrace_get_addr_curr - Get the call address that is already there
* @ rec : The ftrace record descriptor
*
* The FTRACE_FL_REGS_EN is set when the record already points to
* a function that saves all the regs . Basically the ' _EN ' version
* represents the current state of the function .
*
2024-02-22 21:48:33 -08:00
* Returns : the address of the trampoline that is currently being called
2014-05-06 21:34:14 -04:00
*/
unsigned long ftrace_get_addr_curr ( struct dyn_ftrace * rec )
{
2014-05-06 21:56:17 -04:00
struct ftrace_ops * ops ;
2019-11-08 13:07:06 -05:00
unsigned long addr ;
/* Direct calls take precedence over trampolines */
if ( rec - > flags & FTRACE_FL_DIRECT_EN ) {
2019-12-08 16:01:12 -08:00
addr = ftrace_find_rec_direct ( rec - > ip ) ;
2019-11-08 13:07:06 -05:00
if ( addr )
return addr ;
WARN_ON_ONCE ( 1 ) ;
}
2014-05-06 21:56:17 -04:00
/* Trampolines take precedence over regs */
	if (rec->flags & FTRACE_FL_TRAMP_EN) {
		ops = ftrace_find_tramp_ops_curr(rec);
		if (FTRACE_WARN_ON(!ops)) {
2016-03-22 14:28:09 -07:00
pr_warn ( " Bad trampoline accounting at: %p (%pS) \n " ,
( void * ) rec - > ip , ( void * ) rec - > ip ) ;
2014-05-06 21:56:17 -04:00
/* Ftrace is shutting down, return anything */
return ( unsigned long ) FTRACE_ADDR ;
}
return ops - > trampoline ;
}
2014-05-06 21:34:14 -04:00
if ( rec - > flags & FTRACE_FL_REGS_EN )
return ( unsigned long ) FTRACE_REGS_ADDR ;
else
return ( unsigned long ) FTRACE_ADDR ;
}
2011-08-16 09:53:39 -04:00
static int
2019-05-20 09:26:24 -04:00
__ftrace_replace_code ( struct dyn_ftrace * rec , bool enable )
2011-08-16 09:53:39 -04:00
{
2012-04-30 16:20:23 -04:00
unsigned long ftrace_old_addr ;
2011-08-16 09:53:39 -04:00
unsigned long ftrace_addr ;
int ret ;
2014-05-08 07:01:21 -04:00
ftrace_addr = ftrace_get_addr_new ( rec ) ;
2011-08-16 09:53:39 -04:00
2014-05-08 07:01:21 -04:00
/* This needs to be done before we call ftrace_update_record */
ftrace_old_addr = ftrace_get_addr_curr ( rec ) ;
ret = ftrace_update_record ( rec , enable ) ;
2012-04-30 16:20:23 -04:00
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_UNKNOWN ;
2011-08-16 09:53:39 -04:00
switch ( ret ) {
case FTRACE_UPDATE_IGNORE :
return 0 ;
case FTRACE_UPDATE_MAKE_CALL :
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_CALL ;
2009-07-15 12:32:15 +08:00
return ftrace_make_call ( rec , ftrace_addr ) ;
2011-08-16 09:53:39 -04:00
case FTRACE_UPDATE_MAKE_NOP :
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_NOP ;
2014-08-17 20:59:10 -04:00
return ftrace_make_nop ( NULL , rec , ftrace_old_addr ) ;
2012-04-30 16:20:23 -04:00
case FTRACE_UPDATE_MODIFY_CALL :
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_UPDATE ;
2012-04-30 16:20:23 -04:00
return ftrace_modify_call ( rec , ftrace_old_addr , ftrace_addr ) ;
2008-05-12 21:20:43 +02:00
}
2019-03-24 00:05:23 +05:30
return - 1 ; /* unknown ftrace bug */
2008-05-12 21:20:43 +02:00
}
2018-12-05 12:48:53 -05:00
void __weak ftrace_replace_code ( int mod_flags )
2008-05-12 21:20:43 +02:00
{
struct dyn_ftrace * rec ;
struct ftrace_page * pg ;
2019-05-20 09:26:24 -04:00
bool enable = mod_flags & FTRACE_MODIFY_ENABLE_FL ;
2018-12-05 12:48:53 -05:00
int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL ;
2009-02-17 11:20:26 -05:00
int failed ;
2008-05-12 21:20:43 +02:00
2011-04-21 23:16:46 -04:00
if ( unlikely ( ftrace_disabled ) )
return ;
2009-02-13 12:43:56 -05:00
do_for_each_ftrace_rec ( pg , rec ) {
2016-11-14 16:31:49 -05:00
ftrace: Still disable enabled records marked as disabled
Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.
To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.
Now the issue happens when function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.
The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There are sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and this
detected that there were records marked as ENABLED with ref counter that
should not have been.
Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while it's being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.
Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.
There are two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.
Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-05 00:38:09 -04:00
if ( skip_record ( rec ) )
2016-11-14 16:31:49 -05:00
continue ;
2012-04-27 09:13:18 -04:00
failed = __ftrace_replace_code ( rec , enable ) ;
2009-03-13 17:16:34 +08:00
if ( failed ) {
2014-10-24 17:56:04 -04:00
ftrace_bug ( failed , rec ) ;
2009-10-07 16:57:56 -04:00
/* Stop processing */
return ;
2008-05-12 21:20:43 +02:00
}
2018-12-05 12:48:53 -05:00
if ( schedulable )
cond_resched ( ) ;
2009-02-13 12:43:56 -05:00
} while_for_each_ftrace_rec ( ) ;
2008-05-12 21:20:43 +02:00
}
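The skip_record() test used in the loop above is described in the commit message as "DISABLED set but not ENABLED". A minimal standalone model of that predicate follows; `struct flag_rec`, `skip_record_model()` and the two `FL_*` bit values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real FTRACE_FL_* values differ. */
#define FL_ENABLED	(1UL << 0)
#define FL_DISABLED	(1UL << 1)

struct flag_rec {
	unsigned long flags;
};

/*
 * Skip a record only if it is DISABLED and was never ENABLED. A record
 * that is both DISABLED and ENABLED must still be processed so that its
 * call site can be set back to a nop and the accounting stays consistent.
 */
static bool skip_record_model(const struct flag_rec *rec)
{
	return (rec->flags & FL_DISABLED) && !(rec->flags & FL_ENABLED);
}
```

Only one of the four flag combinations is skipped, which is what lets a weak function that was enabled early in boot still be disabled later.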
2011-08-16 09:53:39 -04:00
struct ftrace_rec_iter {
struct ftrace_page * pg ;
int index ;
} ;
/**
2021-10-29 09:52:23 -04:00
* ftrace_rec_iter_start - start up iterating over traced functions
2011-08-16 09:53:39 -04:00
*
2024-02-22 21:48:33 -08:00
* Returns : an iterator handle that is used to iterate over all
2011-08-16 09:53:39 -04:00
* the records that represent address locations where functions
* are traced .
*
* May return NULL if no records are available .
*/
struct ftrace_rec_iter * ftrace_rec_iter_start ( void )
{
/*
* We only use a single iterator .
* Protected by the ftrace_lock mutex .
*/
static struct ftrace_rec_iter ftrace_rec_iter ;
struct ftrace_rec_iter * iter = & ftrace_rec_iter ;
	iter->pg = ftrace_pages_start;
	iter->index = 0;
	/* Could have empty pages */
	while (iter->pg && !iter->pg->index)
		iter->pg = iter->pg->next;
	if (!iter->pg)
		return NULL;
	return iter;
}
/**
2021-10-29 09:52:23 -04:00
* ftrace_rec_iter_next - get the next record to process .
2011-08-16 09:53:39 -04:00
* @ iter : The handle to the iterator .
*
2024-02-22 21:48:33 -08:00
* Returns : the next iterator after the given iterator @ iter .
2011-08-16 09:53:39 -04:00
*/
struct ftrace_rec_iter * ftrace_rec_iter_next ( struct ftrace_rec_iter * iter )
{
	iter->index++;
	if (iter->index >= iter->pg->index) {
		iter->pg = iter->pg->next;
		iter->index = 0;
		/* Could have empty pages */
		while (iter->pg && !iter->pg->index)
			iter->pg = iter->pg->next;
	}
	if (!iter->pg)
		return NULL;
	return iter;
}
/**
2021-10-29 09:52:23 -04:00
* ftrace_rec_iter_record - get the record at the iterator location
2011-08-16 09:53:39 -04:00
* @ iter : The current iterator location
*
2024-02-22 21:48:33 -08:00
* Returns : the record that the current @ iter is at .
2011-08-16 09:53:39 -04:00
*/
struct dyn_ftrace * ftrace_rec_iter_record ( struct ftrace_rec_iter * iter )
{
	return &iter->pg->records[iter->index];
}
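The three iterator helpers above are meant to be used together in a start/next/record loop. The sketch below models that usage in plain C. It is an illustration under assumptions: `struct page_model`, `struct iter_model` and the `iter_*()` functions are hypothetical, records are plain integers, and the iterator storage is supplied by the caller rather than being the single static iterator the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>

struct page_model {
	struct page_model *next;
	int index;		/* number of records used in this page */
	int records[4];		/* stand-in for struct dyn_ftrace */
};

struct iter_model {
	struct page_model *pg;
	int index;
};

/* Advance past pages that hold no records. */
static struct page_model *skip_empty(struct page_model *pg)
{
	while (pg && !pg->index)
		pg = pg->next;
	return pg;
}

static struct iter_model *iter_start(struct iter_model *iter,
				     struct page_model *first)
{
	iter->pg = skip_empty(first);
	iter->index = 0;
	return iter->pg ? iter : NULL;
}

static struct iter_model *iter_next(struct iter_model *iter)
{
	iter->index++;
	if (iter->index >= iter->pg->index) {
		iter->pg = skip_empty(iter->pg->next);
		iter->index = 0;
	}
	return iter->pg ? iter : NULL;
}

static int *iter_record(struct iter_model *iter)
{
	return &iter->pg->records[iter->index];
}

/* Typical caller loop: visit every record across all pages. */
static int sum_records(struct page_model *first)
{
	struct iter_model storage;
	struct iter_model *iter;
	int sum = 0;

	for (iter = iter_start(&storage, first); iter; iter = iter_next(iter))
		sum += *iter_record(iter);
	return sum;
}
```

The loop terminates when either helper returns NULL, so callers never need to inspect page boundaries or empty pages themselves.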
2008-05-25 00:10:04 +05:30
static int
2019-10-16 17:51:10 +01:00
ftrace_nop_initialize ( struct module * mod , struct dyn_ftrace * rec )
2008-05-12 21:20:43 +02:00
{
2008-10-23 09:32:59 -04:00
int ret ;
2008-05-12 21:20:43 +02:00
2011-04-21 23:16:46 -04:00
if ( unlikely ( ftrace_disabled ) )
return 0 ;
2019-10-16 17:51:10 +01:00
ret = ftrace_init_nop ( mod , rec ) ;
2008-10-23 09:32:59 -04:00
if ( ret ) {
2015-11-25 12:50:47 -05:00
ftrace_bug_type = FTRACE_BUG_INIT ;
2014-10-24 17:56:04 -04:00
ftrace_bug ( ret , rec ) ;
2008-05-25 00:10:04 +05:30
return 0 ;
2008-05-12 21:20:48 +02:00
}
2008-05-25 00:10:04 +05:30
return 1 ;
2008-05-12 21:20:43 +02:00
}
2009-02-17 13:35:06 -05:00
/*
* archs can override this function if they must do something
* before the modifying code is performed .
*/
2022-05-18 10:36:40 +08:00
void __weak ftrace_arch_code_modify_prepare ( void )
2009-02-17 13:35:06 -05:00
{
}
/*
* archs can override this function if they must do something
* after the modifying code is performed .
*/
2022-05-18 10:36:40 +08:00
void __weak ftrace_arch_code_modify_post_process ( void )
2009-02-17 13:35:06 -05:00
{
}
2022-11-22 18:09:05 -05:00
static int update_ftrace_func ( ftrace_func_t func )
{
static ftrace_func_t save_func ;
/* Avoid updating if it hasn't changed */
	if (func == save_func)
return 0 ;
save_func = func ;
return ftrace_update_ftrace_func ( func ) ;
}
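update_ftrace_func() avoids repeating an expensive update by remembering the last function it installed. A small standalone model of that pattern is shown here; `trace_fn`, `install()`, `update_func()`, `fn_a()` and `fn_b()` are hypothetical names, and `install()` merely counts calls instead of patching code.

```c
#include <assert.h>

typedef void (*trace_fn)(void);

static int install_count;	/* how many times the slow path ran */

/* Hypothetical slow operation, standing in for patching the call site. */
static int install(trace_fn fn)
{
	(void)fn;
	install_count++;
	return 0;
}

/* Only call install() when the requested function actually changes. */
static int update_func(trace_fn fn)
{
	static trace_fn saved;

	if (fn == saved)
		return 0;
	saved = fn;
	return install(fn);
}

/* Two distinct callbacks to switch between. */
static void fn_a(void) { }
static void fn_b(void) { }
```

As in the function above, the saved value is updated before the install step runs, so a failed install is not retried by a second call with the same argument.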
2012-04-26 14:59:43 -04:00
void ftrace_modify_all_code ( int command )
2008-05-12 21:20:42 +02:00
{
2013-08-31 01:04:07 -04:00
int update = command & FTRACE_UPDATE_TRACE_FUNC ;
2018-12-05 12:48:53 -05:00
int mod_flags = 0 ;
2014-02-24 17:12:21 +01:00
int err = 0 ;
2013-08-31 01:04:07 -04:00
2018-12-05 12:48:53 -05:00
if ( command & FTRACE_MAY_SLEEP )
mod_flags = FTRACE_MODIFY_MAY_SLEEP_FL ;
2013-08-31 01:04:07 -04:00
/*
* If the ftrace_caller calls a ftrace_ops func directly ,
* we need to make sure that it only traces functions it
* expects to trace . When doing the switch of functions ,
* we need to update to the ftrace_ops_list_func first
* before the transition between old and new calls are set ,
* as the ftrace_ops_list_func will check the ops hashes
* to make sure the ops are having the right functions
* traced .
*/
2014-02-24 17:12:21 +01:00
if ( update ) {
2022-11-22 18:09:05 -05:00
err = update_ftrace_func ( ftrace_ops_list_func ) ;
2014-02-24 17:12:21 +01:00
if ( FTRACE_WARN_ON ( err ) )
return ;
}
2013-08-31 01:04:07 -04:00
2012-04-26 14:59:43 -04:00
if ( command & FTRACE_UPDATE_CALLS )
2018-12-05 12:48:53 -05:00
ftrace_replace_code ( mod_flags | FTRACE_MODIFY_ENABLE_FL ) ;
2012-04-26 14:59:43 -04:00
else if ( command & FTRACE_DISABLE_CALLS )
2018-12-05 12:48:53 -05:00
ftrace_replace_code ( mod_flags ) ;
2008-05-12 21:20:43 +02:00
2013-11-08 14:17:30 -05:00
	if (update && ftrace_trace_function != ftrace_ops_list_func) {
		function_trace_op = set_function_trace_op;
		smp_wmb();
		/* If irqs are disabled, we are in stop machine */
		if (!irqs_disabled())
			smp_call_function(ftrace_sync_ipi, NULL, 1);
2022-11-22 18:09:05 -05:00
err = update_ftrace_func ( ftrace_trace_function ) ;
2014-02-24 17:12:21 +01:00
if ( FTRACE_WARN_ON ( err ) )
return ;
2013-11-08 14:17:30 -05:00
}
2008-05-12 21:20:43 +02:00
2012-04-26 14:59:43 -04:00
if ( command & FTRACE_START_FUNC_RET )
2014-02-24 17:12:21 +01:00
err = ftrace_enable_ftrace_graph_caller ( ) ;
2012-04-26 14:59:43 -04:00
else if ( command & FTRACE_STOP_FUNC_RET )
2014-02-24 17:12:21 +01:00
err = ftrace_disable_ftrace_graph_caller ( ) ;
FTRACE_WARN_ON ( err ) ;
2012-04-26 14:59:43 -04:00
}
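The ordering enforced by the function above (switch to the list function first, patch the call sites, then switch to the final trace function) can be sketched as a small planner. The flag values, the `enum step` names, and `plan_steps` are all hypothetical and exist only for illustration; the kernel's real `FTRACE_*` values may differ.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
enum {
	CMD_UPDATE_CALLS      = 1 << 0,
	CMD_DISABLE_CALLS     = 1 << 1,
	CMD_UPDATE_TRACE_FUNC = 1 << 2,
};

enum step {
	STEP_LIST_FUNC,     /* install the hash-checking list function */
	STEP_ENABLE_CALLS,  /* patch call sites on */
	STEP_DISABLE_CALLS, /* patch call sites off */
	STEP_FINAL_FUNC,    /* install the final trace function */
};

/*
 * Record, in order, the steps a command would trigger.
 * At most three steps are produced; returns how many were written.
 */
static int plan_steps(int command, int final_is_list_func, enum step out[4])
{
	int n = 0;
	int update = command & CMD_UPDATE_TRACE_FUNC;

	assert(out);

	/* The list function goes in first so no callback sees stray calls. */
	if (update)
		out[n++] = STEP_LIST_FUNC;

	if (command & CMD_UPDATE_CALLS)
		out[n++] = STEP_ENABLE_CALLS;
	else if (command & CMD_DISABLE_CALLS)
		out[n++] = STEP_DISABLE_CALLS;

	/* Only swap again if the final function is not the list function. */
	if (update && !final_is_list_func)
		out[n++] = STEP_FINAL_FUNC;

	return n;
}
```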
static int __ftrace_modify_code ( void * data )
{
int * command = data ;
ftrace_modify_all_code ( * command ) ;
2008-11-26 00:16:24 -05:00
2008-05-12 21:20:43 +02:00
return 0 ;
2008-05-12 21:20:42 +02:00
}
2011-08-16 09:53:39 -04:00
/**
2021-10-29 09:52:23 -04:00
* ftrace_run_stop_machine - go back to the stop machine method
2011-08-16 09:53:39 -04:00
* @ command : The command to tell ftrace what to do
*
* If an arch needs to fall back to the stop machine method , then
* it can call this function .
*/
void ftrace_run_stop_machine ( int command )
{
stop_machine ( __ftrace_modify_code , & command , NULL ) ;
}
/**
2021-10-29 09:52:23 -04:00
* arch_ftrace_update_code - modify the code to trace or not trace
2011-08-16 09:53:39 -04:00
* @ command : The command that needs to be done
*
* Archs can override this function if it does not need to
* run stop_machine ( ) to modify code .
*/
void __weak arch_ftrace_update_code ( int command )
{
ftrace_run_stop_machine ( command ) ;
}
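The `__weak` default above relies on weak linkage: a strong definition of the same symbol elsewhere replaces it at link time. A minimal user-space sketch of the pattern follows, assuming a GCC- or Clang-compatible toolchain that supports `__attribute__((weak))`. The names `arch_update_code`, `run_stop_machine`, and `stop_machine_runs` are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Counts how often the generic fallback ran (illustration only). */
static int stop_machine_runs;

/* Stand-in for the generic, always-safe but expensive method. */
static void run_stop_machine(int command)
{
	(void)command;
	stop_machine_runs++;
}

/*
 * Weak default: used only when no other object file provides a strong
 * definition of arch_update_code(). An architecture that can patch code
 * without stopping the machine would supply its own version.
 */
__attribute__((weak)) void arch_update_code(int command)
{
	assert(command >= 0);
	run_stop_machine(command);
}
```

With no strong override linked in, every call to `arch_update_code()` lands in the fallback.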
2008-05-12 21:20:51 +02:00
static void ftrace_run_update_code ( int command )
2008-05-12 21:20:42 +02:00
{
2022-05-18 10:36:40 +08:00
ftrace_arch_code_modify_prepare ( ) ;
2009-02-17 13:35:06 -05:00
2011-08-16 09:53:39 -04:00
/*
* By default we use stop_machine ( ) to modify the code .
* But archs can do whatever they want as long as it
* is safe . The stop_machine ( ) is the safest , but also
* produces the most overhead .
*/
arch_ftrace_update_code ( command ) ;
2022-05-18 10:36:40 +08:00
ftrace_arch_code_modify_post_process ( ) ;
2008-05-12 21:20:42 +02:00
}
2014-10-24 14:56:01 -04:00
static void ftrace_run_modify_code ( struct ftrace_ops * ops , int command ,
2015-01-13 14:03:38 -05:00
struct ftrace_ops_hash * old_hash )
2014-08-05 17:19:38 -04:00
{
ops - > flags | = FTRACE_OPS_FL_MODIFYING ;
2015-01-13 14:03:38 -05:00
ops - > old_hash . filter_hash = old_hash - > filter_hash ;
ops - > old_hash . notrace_hash = old_hash - > notrace_hash ;
2014-08-05 17:19:38 -04:00
ftrace_run_update_code ( command ) ;
2014-10-24 14:56:01 -04:00
ops - > old_hash . filter_hash = NULL ;
2015-01-13 14:03:38 -05:00
ops - > old_hash . notrace_hash = NULL ;
2014-08-05 17:19:38 -04:00
ops - > flags & = ~ FTRACE_OPS_FL_MODIFYING ;
}
2008-05-12 21:20:43 +02:00
static ftrace_func_t saved_ftrace_func ;
2008-11-05 16:05:44 -05:00
static int ftrace_start_up ;
2008-11-26 00:16:23 -05:00
2014-07-03 15:48:16 -04:00
void __weak arch_ftrace_trampoline_free ( struct ftrace_ops * ops )
{
}
2020-05-12 15:19:13 +03:00
/* List of trace_ops that have allocated trampolines */
static LIST_HEAD ( ftrace_ops_trampoline_list ) ;
static void ftrace_add_trampoline_to_kallsyms ( struct ftrace_ops * ops )
{
lockdep_assert_held ( & ftrace_lock ) ;
list_add_rcu ( & ops - > list , & ftrace_ops_trampoline_list ) ;
}
static void ftrace_remove_trampoline_from_kallsyms ( struct ftrace_ops * ops )
{
lockdep_assert_held ( & ftrace_lock ) ;
list_del_rcu ( & ops - > list ) ;
2020-09-01 12:16:17 +03:00
synchronize_rcu ( ) ;
2020-05-12 15:19:13 +03:00
}
/*
* " __builtin__ftrace " is used as a module name in / proc / kallsyms for symbols
* for pages allocated for ftrace purposes , even though " __builtin__ftrace " is
* not a module .
*/
# define FTRACE_TRAMPOLINE_MOD "__builtin__ftrace"
# define FTRACE_TRAMPOLINE_SYM "ftrace_trampoline"
static void ftrace_trampoline_free ( struct ftrace_ops * ops )
{
if ( ops & & ( ops - > flags & FTRACE_OPS_FL_ALLOC_TRAMP ) & &
2020-05-12 15:19:14 +03:00
ops - > trampoline ) {
2020-05-12 15:19:15 +03:00
/*
* Record the text poke event before the ksymbol unregister
* event .
*/
perf_event_text_poke ( ( void * ) ops - > trampoline ,
( void * ) ops - > trampoline ,
ops - > trampoline_size , NULL , 0 ) ;
2020-05-12 15:19:14 +03:00
perf_event_ksymbol ( PERF_RECORD_KSYMBOL_TYPE_OOL ,
ops - > trampoline , ops - > trampoline_size ,
true , FTRACE_TRAMPOLINE_SYM ) ;
/* Remove from kallsyms after the perf events */
2020-05-12 15:19:13 +03:00
ftrace_remove_trampoline_from_kallsyms ( ops ) ;
2020-05-12 15:19:14 +03:00
}
2020-05-12 15:19:13 +03:00
arch_ftrace_trampoline_free ( ops ) ;
}
2008-11-26 00:16:23 -05:00
static void ftrace_startup_enable ( int command )
{
if ( saved_ftrace_func ! = ftrace_trace_function ) {
saved_ftrace_func = ftrace_trace_function ;
command | = FTRACE_UPDATE_TRACE_FUNC ;
}
if ( ! command | | ! ftrace_enabled )
return ;
ftrace_run_update_code ( command ) ;
}
2008-05-12 21:20:43 +02:00
2014-08-05 17:19:38 -04:00
static void ftrace_startup_all ( int command )
{
update_all_ops = true ;
ftrace_startup_enable ( command ) ;
update_all_ops = false ;
}
2018-11-15 12:32:38 -05:00
int ftrace_startup ( struct ftrace_ops * ops , int command )
2008-05-12 21:20:42 +02:00
{
2013-11-25 20:59:46 -05:00
int ret ;
2011-05-04 09:27:52 -04:00
2008-05-12 21:20:48 +02:00
if ( unlikely ( ftrace_disabled ) )
2011-05-23 15:24:25 -04:00
return - ENODEV ;
2008-05-12 21:20:48 +02:00
2013-11-25 20:59:46 -05:00
ret = __register_ftrace_function ( ops ) ;
if ( ret )
return ret ;
2008-11-05 16:05:44 -05:00
ftrace_start_up + + ;
2008-05-12 21:20:43 +02:00
2014-08-05 17:19:38 -04:00
/*
* Note that ftrace probes use this to start up
* and modify functions it will probe . But we still
* set the ADDING flag for modification , as probes
* do not have trampolines . If they add them in the
* future , then the probes will need to distinguish
* between adding and updating probes .
*/
ops - > flags | = FTRACE_OPS_FL_ENABLED | FTRACE_OPS_FL_ADDING ;
2014-05-06 21:57:49 -04:00
2014-11-21 05:25:16 -05:00
ret = ftrace_hash_ipmodify_enable ( ops ) ;
if ( ret < 0 ) {
/* Rollback registration process */
__unregister_ftrace_function ( ops ) ;
ftrace_start_up - - ;
ops - > flags & = ~ FTRACE_OPS_FL_ENABLED ;
2020-08-31 14:26:31 +02:00
if ( ops - > flags & FTRACE_OPS_FL_DYNAMIC )
ftrace_trampoline_free ( ops ) ;
2014-11-21 05:25:16 -05:00
return ret ;
}
2016-03-16 15:34:33 +01:00
if ( ftrace_hash_rec_enable ( ops , 1 ) )
command | = FTRACE_UPDATE_CALLS ;
2011-05-03 13:25:24 -04:00
2008-11-26 00:16:23 -05:00
ftrace_startup_enable ( command ) ;
2011-05-23 15:24:25 -04:00
ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:
register_ftrace_function
ftrace_startup
__register_ftrace_function
...
add_ftrace_ops(&ftrace_ops_list, ops)
...
...
ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
...
return 0 // ops is in the ftrace_ops_list.
When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
ftrace_shutdown
if (unlikely(ftrace_disabled))
return -ENODEV; // return here, __unregister_ftrace_function is not executed,
// as a result, ops is still in the ftrace_ops_list
__unregister_ftrace_function
...
If ops is dynamically allocated, it will be freed later; in that case,
is_ftrace_trampoline accesses a NULL pointer:
is_ftrace_trampoline
ftrace_ops_trampoline
do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
Syzkaller reports as follows:
[ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
[ 1203.508039] #PF: supervisor read access in kernel mode
[ 1203.508798] #PF: error_code(0x0000) - not-present page
[ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
[ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
[ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8
[ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
[ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
[ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
[ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
[ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
[ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
[ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
[ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
[ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
[ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
[ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Therefore, when ftrace_startup_enable fails, we need to roll back the
registration process and remove ops from ftrace_ops_list.
Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-18 11:26:59 +08:00
/*
* If ftrace is in an undefined state , we just remove ops from the list
* to prevent the NULL pointer dereference , instead of totally rolling it
* back and freeing the trampoline , because those actions could cause
* further damage .
*/
if ( unlikely ( ftrace_disabled ) ) {
__unregister_ftrace_function ( ops ) ;
return - ENODEV ;
}
2014-08-05 17:19:38 -04:00
ops - > flags & = ~ FTRACE_OPS_FL_ADDING ;
2011-05-23 15:24:25 -04:00
return 0 ;
2008-05-12 21:20:42 +02:00
}
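The two failure paths in the function above differ: an ipmodify failure rolls registration back fully, while a failure that leaves ftrace disabled only removes the ops from the list. The toy model below sketches that difference. Every name in it (`toy_ops`, `toy_startup`, `toy_start_up`, `toy_disabled`) is hypothetical, the boolean fields stand in for list membership and flag bits, and `-EBUSY` is an assumed error code for the ipmodify failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_ops {
	bool registered; /* models membership in the ops list */
	bool enabled;    /* models the ENABLED flag */
	bool adding;     /* models the ADDING flag */
};

static int toy_start_up;  /* models the start-up reference count */
static bool toy_disabled; /* models the global "ftrace is dead" state */

static int toy_startup(struct toy_ops *ops, bool ipmodify_fails,
		       bool patching_fails)
{
	assert(ops);

	if (toy_disabled)
		return -ENODEV;

	ops->registered = true;
	toy_start_up++;
	ops->enabled = true;
	ops->adding = true;

	if (ipmodify_fails) {
		/* Full rollback: unregister, drop the count, clear ENABLED. */
		ops->registered = false;
		toy_start_up--;
		ops->enabled = false;
		return -EBUSY; /* assumed error code */
	}

	/* A failed code modification marks the whole facility as dead. */
	if (patching_fails)
		toy_disabled = true;

	if (toy_disabled) {
		/* Minimal cleanup only: take ops off the list, touch nothing else. */
		ops->registered = false;
		return -ENODEV;
	}

	ops->adding = false;
	return 0;
}
```

The point of the minimal cleanup path is that an ops left on the list after a dead-state failure could later be freed by its owner while the list still points at it.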
2018-11-15 12:32:38 -05:00
int ftrace_shutdown ( struct ftrace_ops * ops , int command )
2008-05-12 21:20:42 +02:00
{
2013-11-25 20:59:46 -05:00
int ret ;
2011-05-04 09:27:52 -04:00
2008-05-12 21:20:48 +02:00
if ( unlikely ( ftrace_disabled ) )
2013-11-25 20:59:46 -05:00
return - ENODEV ;
ret = __unregister_ftrace_function ( ops ) ;
if ( ret )
return ret ;
2008-05-12 21:20:48 +02:00
2008-11-05 16:05:44 -05:00
ftrace_start_up - - ;
2009-06-20 06:52:21 +02:00
/*
* Just warn in case of imbalance , no need to kill ftrace , it ' s not
* critical but the ftrace_call callers may never be nopped again after
* further ftrace uses .
*/
WARN_ON_ONCE ( ftrace_start_up < 0 ) ;
2014-11-21 05:25:16 -05:00
/* Disabling ipmodify never fails */
ftrace_hash_ipmodify_disable ( ops ) ;
2011-05-03 13:25:24 -04:00
2016-03-16 15:34:33 +01:00
if ( ftrace_hash_rec_disable ( ops , 1 ) )
command | = FTRACE_UPDATE_CALLS ;
2011-05-04 09:27:52 -04:00
2016-03-16 15:34:33 +01:00
ops - > flags & = ~ FTRACE_OPS_FL_ENABLED ;
2008-05-12 21:20:42 +02:00
2008-05-12 21:20:43 +02:00
if ( saved_ftrace_func ! = ftrace_trace_function ) {
saved_ftrace_func = ftrace_trace_function ;
command | = FTRACE_UPDATE_TRACE_FUNC ;
}
2008-05-12 21:20:42 +02:00
2022-11-03 11:10:10 +08:00
if ( ! command | | ! ftrace_enabled )
goto out ;
2008-05-12 21:20:43 +02:00
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller;
the jump needs to happen after the registers are restored.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
/*
* If the ops uses a trampoline , then it needs to be
* tested first on update .
*/
2014-08-05 17:19:38 -04:00
ops - > flags | = FTRACE_OPS_FL_REMOVING ;
2014-05-06 21:56:17 -04:00
	removed_ops = ops;
2014-07-24 12:25:47 -04:00
	/* The trampoline logic checks the old hashes */
	ops->old_hash.filter_hash = ops->func_hash->filter_hash;
	ops->old_hash.notrace_hash = ops->func_hash->notrace_hash;
2008-05-12 21:20:43 +02:00
	ftrace_run_update_code(command);
2014-01-13 12:56:21 -05:00
2014-09-12 14:21:13 -04:00
	/*
	 * If there's no more ops registered with ftrace, run a
	 * sanity check to make sure all rec flags are cleared.
	 */
2017-06-07 16:12:51 +08:00
	if (rcu_dereference_protected(ftrace_ops_list,
			lockdep_is_held(&ftrace_lock)) == &ftrace_list_end) {
2014-09-12 14:21:13 -04:00
		struct ftrace_page *pg;
		struct dyn_ftrace *rec;
		do_for_each_ftrace_rec(pg, rec) {
2023-01-24 09:56:53 -05:00
			if (FTRACE_WARN_ON_ONCE(rec->flags & ~FTRACE_NOCLEAR_FLAGS))
2014-09-12 14:21:13 -04:00
				pr_warn("  %pS flags:%lx\n",
					(void *)rec->ip, rec->flags);
		} while_for_each_ftrace_rec();
	}
2014-07-24 12:25:47 -04:00
	ops->old_hash.filter_hash = NULL;
	ops->old_hash.notrace_hash = NULL;
	removed_ops = NULL;
2014-08-05 17:19:38 -04:00
	ops->flags &= ~FTRACE_OPS_FL_REMOVING;
2014-05-06 21:56:17 -04:00
2022-11-03 11:10:10 +08:00
out:
2014-01-13 12:56:21 -05:00
	/*
	 * Dynamic ops may be freed, we must make sure that all
	 * callers are done before leaving this function.
	 */
2017-10-11 09:45:32 +02:00
	if (ops->flags & FTRACE_OPS_FL_DYNAMIC) {
2017-04-06 10:28:12 -04:00
		/*
		 * We need to do a hard force of sched synchronization.
		 * This is because we use preempt_disable() to do RCU, but
		 * the function tracers can be called where RCU is not watching
		 * (like before user_exit()). We can not rely on the RCU
		 * infrastructure to do the synchronization, thus we must do it
		 * ourselves.
		 */
2020-04-03 12:10:28 -07:00
		synchronize_rcu_tasks_rude();
2014-01-13 12:56:21 -05:00
2017-04-06 10:28:12 -04:00
		/*
2020-10-02 22:31:26 +08:00
		 * When the kernel is preemptive, tasks can be preempted
2017-04-06 10:28:12 -04:00
		 * while on a ftrace trampoline. Just scheduling a task on
		 * a CPU is not good enough to flush them. Calling
2021-03-23 18:49:35 +01:00
		 * synchronize_rcu_tasks() will wait for those tasks to
2017-04-06 10:28:12 -04:00
		 * execute and either schedule voluntarily or enter user space.
		 */
2019-07-26 23:19:40 +02:00
		if (IS_ENABLED(CONFIG_PREEMPTION))
2017-04-06 10:28:12 -04:00
			synchronize_rcu_tasks();
2020-05-12 15:19:13 +03:00
		ftrace_trampoline_free(ops);
2014-01-13 12:56:21 -05:00
	}
2013-11-25 20:59:46 -05:00
	return 0;
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works is that on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs, so all functions that might be called via NMI must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
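The call-to-jmp replacement described above can be sketched on a plain byte buffer. This is a minimal user-space illustration, not the kernel's patching code: the function names sketch_call_to_jmp() and sketch_jmp_to_call() are hypothetical, the buffer stands in for kernel text, and nothing here models stop_machine or NMI safety. It assumes the x86 encodings "call rel32" (opcode 0xe8, 5 bytes) and "jmp rel8" (opcode 0xeb, 2 bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CALL_SIZE	5	/* x86 "call rel32": 0xe8 + 4-byte offset */
#define SKETCH_CALL_OPCODE	0xe8
#define SKETCH_JMP8_OPCODE	0xeb	/* x86 "jmp rel8": 0xeb + 1-byte offset */
#define SKETCH_JMP8_SIZE	2

/*
 * Replace a 5-byte call with a 2-byte short jump that skips the
 * 3 leftover bytes of the old call. Returns 0 on success, -1 if the
 * site does not currently hold a call.
 */
static int sketch_call_to_jmp(uint8_t *site)
{
	if (site[0] != SKETCH_CALL_OPCODE)
		return -1;

	site[0] = SKETCH_JMP8_OPCODE;
	site[1] = SKETCH_CALL_SIZE - SKETCH_JMP8_SIZE;	/* jump 3 forward */
	return 0;
}

/*
 * Put the original call back. The caller must supply the saved call
 * bytes, because the first two bytes were overwritten by the jump.
 * Returns 0 on success, -1 if the site does not hold the expected jump.
 */
static int sketch_jmp_to_call(uint8_t *site, const uint8_t *saved_call)
{
	assert(saved_call[0] == SKETCH_CALL_OPCODE);

	if (site[0] != SKETCH_JMP8_OPCODE ||
	    site[1] != SKETCH_CALL_SIZE - SKETCH_JMP8_SIZE)
		return -1;

	memcpy(site, saved_call, SKETCH_CALL_SIZE);
	return 0;
}
```

Both helpers check the current bytes before writing, so patching a site that is already in the target state is reported as an error instead of silently corrupting the buffer.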
2008-05-12 21:20:42 +02:00
}
2016-12-21 20:32:01 +01:00
static u64 ftrace_update_time;
2008-05-12 21:20:42 +02:00
unsigned long ftrace_update_tot_cnt;
2019-10-01 14:38:07 -04:00
unsigned long ftrace_number_of_pages;
unsigned long ftrace_number_of_groups;
2008-05-12 21:20:42 +02:00
ftrace: Check module functions being traced on reload
There's been a nasty bug that would show up and not give much info.
The bug displayed the following warning:
WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
[<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
[<ffffffff811401cc>] ? kfree+0x2c/0x110
[<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
[<ffffffff81149f1e>] __fput+0xae/0x220
[<ffffffff8114a09e>] ____fput+0xe/0x10
[<ffffffff8105fa22>] task_work_run+0x72/0x90
[<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
[<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff815c0f88>] int_signal+0x12/0x17
---[ end trace 793179526ee09b2c ]---
It was finally narrowed down to unloading a module that was being traced.
It was actually more than that. When functions are being traced, there's
a table of all functions that have a ref count of the number of active
tracers attached to that function. When a function trace callback is
registered to a function, the function's record ref count is incremented.
When it is unregistered, the function's record ref count is decremented.
If an inconsistency is detected (ref count goes below zero) the above
warning is shown and the function tracing is permanently disabled until
reboot.
The ftrace callback ops holds a hash of functions that it filters on
(and/or filters off). If the hash is empty, the default means to filter
all functions (for the filter_hash) or to disable no functions (for the
notrace_hash).
When a module is unloaded, it frees the function records that represent
the module functions. These records exist on their own pages, that is
function records for one module will not exist on the same page as
function records for other modules or even the core kernel.
Now when a module unloads, the records that represent its functions are
freed. When the module is loaded again, the records are recreated with
a default ref count of zero (unless there's a callback that traces all
functions, then they will also be traced, and the ref count will be
incremented).
The problem is that if an ftrace callback hash includes functions of the
module being unloaded, those hash entries will not be removed. If the
module is reloaded in the same location, the hash entries still point
to the functions of the module but the module's ref counts do not reflect
that.
With the help of Steve and Joern, we found a reproducer:
Using uinput module and uinput_release function.
cd /sys/kernel/debug/tracing
modprobe uinput
echo uinput_release > set_ftrace_filter
echo function > current_tracer
rmmod uinput
modprobe uinput
# check /proc/modules to see if loaded in same addr, otherwise try again
echo nop > current_tracer
[BOOM]
The above loads the uinput module, which creates a table of functions that
can be traced within the module.
We add uinput_release to the filter_hash to trace just that function.
Enable function tracing, which increments the ref count of the record
associated to uinput_release.
Remove uinput, which frees the records including the one that represents
uinput_release.
Load the uinput module again (and make sure it's at the same address).
This recreates the function records all with a ref count of zero,
including uinput_release.
Disable function tracing, which will decrement the ref count for uinput_release
which is now zero because of the module removal and reload, and we have
a mismatch (below zero ref count).
The solution is to check all currently tracing ftrace callbacks to see if any
are tracing any of the module's functions when a module is loaded (it already does
that with callbacks that trace all functions). If a callback happens to have
a module function being traced, it increments that record's ref count and starts
tracing that function.
There may be a strange side effect with this, where tracing module functions
on unload and then reloading a new module may have that new module's functions
being traced. This may be something that confuses the user, but it's not
a big deal. Another approach is to disable all callback hashes on module unload,
but this leaves some ftrace callbacks that may not be registered, but can
still have hashes tracing the module's function where ftrace doesn't know about
it. That situation can cause the same bug. This solution solves that case too.
Another benefit of this solution is that it is possible to trace a module's
function on unload and load.
Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
Reported-by: Jörn Engel <joern@logfs.org>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Steve Hodgson <steve@purestorage.com>
Tested-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
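The ref count mismatch and its fix can be modelled with a toy simulation. Everything below is a hypothetical sketch: struct rec_sketch, struct ops_sketch and the sketch_*() helpers are invented for illustration and use a plain array instead of the kernel's hashes. The sketch also ignores the "empty hash means all functions" rule and only covers an ops with an explicit filter entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a function record. */
struct rec_sketch {
	unsigned long ip;	/* address of the traced function */
	long ref;		/* number of active tracers attached */
};

/* Hypothetical stand-in for a callback ops with an explicit filter. */
struct ops_sketch {
	const unsigned long *filter;	/* addresses this ops filters on */
	size_t nr_filter;
	bool enabled;
};

static bool ops_filters_ip(const struct ops_sketch *ops, unsigned long ip)
{
	for (size_t i = 0; i < ops->nr_filter; i++)
		if (ops->filter[i] == ip)
			return true;
	return false;
}

/*
 * Module (re)load: the record is recreated with a ref count of zero.
 * With apply_fix set, an enabled ops whose filter still names the
 * address gets its reference counted again.
 */
static void sketch_module_load(struct rec_sketch *rec, unsigned long ip,
			       const struct ops_sketch *ops, bool apply_fix)
{
	rec->ip = ip;
	rec->ref = 0;
	if (apply_fix && ops->enabled && ops_filters_ip(ops, ip))
		rec->ref++;
}

/*
 * Disable the ops and drop its reference. Returns false when the
 * count would go below zero, which is the case the warning reports.
 */
static bool sketch_ops_disable(struct rec_sketch *rec, struct ops_sketch *ops)
{
	assert(ops->enabled);
	ops->enabled = false;

	if (!ops_filters_ip(ops, rec->ip))
		return true;
	if (rec->ref == 0)
		return false;	/* mismatch: nothing to decrement */
	rec->ref--;
	return true;
}
```

Reloading without the fix leaves the record at zero while the ops still believes it holds a reference, so the later disable fails; reloading with the fix restores the reference and the disable succeeds.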
2013-07-30 00:04:32 -04:00
static inline int ops_traces_mod(struct ftrace_ops *ops)
ftrace: Fix regression where ftrace breaks when modules are loaded
Enabling the function tracer to trace all functions, then loading a module, and
then disabling function tracing will cause ftrace to fail.
This can also happen by enabling function tracing on the command line:
ftrace=function
If modules are loaded during boot up and you then disable function tracing
with 'echo nop > current_tracer', you will trigger a bug in ftrace that
makes it shut itself down.
The reason is, the new ftrace code keeps ref counts of all ftrace_ops that
are registered for tracing. When one or more ftrace_ops are registered,
all the records that represent the functions that the ftrace_ops will
trace have a ref count incremented. If this ref count is not zero,
when the code modification runs, that function will be enabled for tracing.
If the ref count is zero, that function will be disabled from tracing.
To make sure the accounting was working, FTRACE_WARN_ON()s were added
to updating of the ref counts.
If the ref count hits its max (> 2^30 ftrace_ops added), or if
the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
disables all modification of code.
Since it is common for ftrace_ops to trace all functions in the kernel,
instead of creating > 20,000 hash items for the ftrace_ops, the hash
count is just set to zero, and it represents that the ftrace_ops is
to trace all functions. This is where the issues arise.
If you enable function tracing to trace all functions, and then add
a module, the module's function records do not get the ref count updated.
When the function tracer is disabled, all function records' ref counts
are decremented. Since the module's records never had their ref counts
incremented, they go below zero and the FTRACE_WARN_ON() is triggered.
The solution to this is rather simple. When modules are loaded, and
their functions are added to the ftrace pool, look to see if any
ftrace_ops are registered that trace all functions. And for those,
update the ref count for the module function records.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
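The fix described above amounts to giving each new module record one reference per registered ops that traces all functions. The following is a hypothetical sketch of that accounting: struct hash_sketch, struct ops_sketch and both helpers are invented names. It treats an ops as "trace all" when both its filter and notrace hashes are empty, mirroring the ops_traces_mod() check that appears in this file; the commit text itself only mentions the hash count being zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hash: only the entry count matters for this sketch. */
struct hash_sketch {
	size_t count;
};

/* Hypothetical stand-in for a registered ftrace_ops. */
struct ops_sketch {
	struct hash_sketch filter_hash;
	struct hash_sketch notrace_hash;
};

static bool hash_empty(const struct hash_sketch *hash)
{
	return hash->count == 0;
}

/* An ops with no filter and no notrace entries traces every function. */
static bool ops_traces_all(const struct ops_sketch *ops)
{
	return hash_empty(&ops->filter_hash) &&
	       hash_empty(&ops->notrace_hash);
}

/*
 * Starting ref count for a freshly loaded module record: one
 * reference for each registered ops that traces all functions.
 */
static unsigned long sketch_initial_ref(const struct ops_sketch *ops,
					size_t nr_ops)
{
	unsigned long ref = 0;

	assert(ops != NULL || nr_ops == 0);

	for (size_t i = 0; i < nr_ops; i++)
		if (ops_traces_all(&ops[i]))
			ref++;
	return ref;
}
```

Starting the count this way means a later disable of a trace-all ops decrements a count that was actually incremented, so it cannot go below zero for that reason.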
2011-07-14 23:02:27 -04:00
{
2013-07-30 00:04:32 -04:00
	/*
	 * Filter_hash being empty will default to trace module.
	 * But notrace hash requires a test of individual module functions.
	 */
2014-08-15 17:23:02 -04:00
	return ftrace_hash_empty(ops->func_hash->filter_hash) &&
		ftrace_hash_empty(ops->func_hash->notrace_hash);
2013-07-30 00:04:32 -04:00
}
2014-02-24 19:59:56 +01:00
static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
2008-05-12 21:20:42 +02:00
{
2021-07-28 23:25:45 +02:00
	bool init_nop = ftrace_need_init_nop();
2011-12-16 16:30:31 -05:00
	struct ftrace_page *pg;
2009-03-13 17:51:27 +08:00
	struct dyn_ftrace *p;
2016-12-21 20:32:01 +01:00
	u64 start, stop;
2014-02-24 19:59:56 +01:00
	unsigned long update_cnt = 0;
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates that convert function calls to mcount are
done before the module text is set to read-only. But after they are done, the
module text is visible to the function tracer. Thus we have the following
race:
   CPU 0                                CPU 1
   -----                                -----
   start function tracing
   set text to read-write
                                        load_module
                                        add functions to ftrace
                                        set module text read-only
   update all functions to callbacks
   modify module functions too
   < Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this change does not yet delay the enabling. It simply changes
ftrace_module_init() to both set the DISABLED flag on the records and then
immediately call the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
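The DISABLED-flag scheme described above can be sketched as three small steps: mark new module records disabled, skip disabled records in the global update pass, and clear the flag once the module text is ready. The names below (SKETCH_FL_DISABLED, SKETCH_FL_ENABLED, struct rec_sketch and the sketch_*() helpers) are hypothetical and the flag values are arbitrary; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's flags have their own values. */
#define SKETCH_FL_DISABLED	(1UL << 0)	/* record must not be patched yet */
#define SKETCH_FL_ENABLED	(1UL << 1)	/* record has been patched to trace */

/* Hypothetical stand-in for a function record. */
struct rec_sketch {
	unsigned long flags;
};

/* Module init: every new record starts out disabled. */
static void sketch_module_init(struct rec_sketch *recs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		recs[i].flags = SKETCH_FL_DISABLED;
}

/*
 * Global update pass, as run when a trace-all callback is enabled.
 * Disabled records are skipped. Returns how many records were patched.
 */
static size_t sketch_update_all(struct rec_sketch *recs, size_t n)
{
	size_t patched = 0;

	for (size_t i = 0; i < n; i++) {
		if (recs[i].flags & SKETCH_FL_DISABLED)
			continue;
		recs[i].flags |= SKETCH_FL_ENABLED;
		patched++;
	}
	return patched;
}

/* Module enable: the module text is set up, so clear the flag. */
static void sketch_module_enable(struct rec_sketch *recs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(recs[i].flags & SKETCH_FL_DISABLED);
		recs[i].flags &= ~SKETCH_FL_DISABLED;
	}
}
```

An update pass that runs between init and enable touches none of the module's records, which is the window the race in the diagram falls into; after the enable step the same pass patches all of them.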
2016-01-07 15:40:01 -05:00
	unsigned long rec_flags = 0;
2011-12-16 16:30:31 -05:00
	int i;
ftrace: Fix regression where ftrace breaks when modules are loaded
Enabling the function tracer to trace all functions, then loading a module and
then disabling function tracing will cause ftrace to fail.
This can also happen by enabling function tracing on the command line:
ftrace=function
If modules are loaded during boot up and you then disable function tracing
with 'echo nop > current_tracer', you will trigger a bug in ftrace that
makes it shut itself down.
The reason is, the new ftrace code keeps ref counts of all ftrace_ops that
are registered for tracing. When one or more ftrace_ops are registered,
all the records that represent the functions that the ftrace_ops will
trace have a ref count incremented. If this ref count is not zero,
when the code modification runs, that function will be enabled for tracing.
If the ref count is zero, that function will be disabled from tracing.
To make sure the accounting was working, FTRACE_WARN_ON()s were added
to updating of the ref counts.
If the ref count hits its max (> 2^30 ftrace_ops added), or if
the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
disables all modification of code.
Since it is common for ftrace_ops to trace all functions in the kernel,
instead of creating > 20,000 hash items for the ftrace_ops, the hash
count is just set to zero, and it represents that the ftrace_ops is
to trace all functions. This is where the issues arise.
If you enable function tracing to trace all functions, and then add
a module, the module's function records do not get their ref counts updated.
When the function tracer is disabled, the ref counts of all function records
are decremented. Since the module's records never had their ref counts
incremented, they go below zero and the FTRACE_WARN_ON() is triggered.
The solution to this is rather simple. When modules are loaded, and
their functions are added to the ftrace pool, look to see if any
ftrace_ops are registered that trace all functions. And for those,
update the ref count for the module function records.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
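The ref count accounting in this commit message can be modeled with a small user-space sketch. Everything here is an assumption made for illustration: `struct sketch_rec`, `sketch_disabled`, and the three functions are hypothetical, and setting `sketch_disabled` merely stands in for what FTRACE_WARN_ON() does.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REF_MAX (1L << 30)	/* mirrors the 2^30 limit in the text */

struct sketch_rec {
	long ref;
};

/* Stand-in for ftrace disabling all code modification. */
static int sketch_disabled;

static void sketch_ref_inc(struct sketch_rec *rec)
{
	assert(rec != NULL);
	if (rec->ref >= SKETCH_REF_MAX) {
		sketch_disabled = 1;	/* count would exceed its maximum */
		return;
	}
	rec->ref++;
}

static void sketch_ref_dec(struct sketch_rec *rec)
{
	assert(rec != NULL);
	if (rec->ref <= 0) {
		sketch_disabled = 1;	/* count would go below zero */
		return;
	}
	rec->ref--;
}

/*
 * The fix: a record added by a module takes one reference for each
 * registered ops that traces all functions.
 */
static void sketch_add_module_rec(struct sketch_rec *rec, int trace_all_ops)
{
	int i;

	rec->ref = 0;
	for (i = 0; i < trace_all_ops; i++)
		sketch_ref_inc(rec);
}
```

A record that skips `sketch_add_module_rec()` keeps a count of zero, so the first `sketch_ref_dec()` on it trips the guard, which is the failure the message describes.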
2011-07-14 23:02:27 -04:00
2016-01-07 15:40:01 -05:00
	start = ftrace_now(raw_smp_processor_id());
2011-07-14 23:02:27 -04:00
	/*
2016-01-07 15:40:01 -05:00
	 * When a module is loaded, this function is called to convert
	 * the calls to mcount in its text to nops, and also to create
	 * an entry in the ftrace data. Now, if ftrace is activated
	 * after this call, but before the module sets its text to
	 * read-only, the modification of enabling ftrace can fail if
	 * the read-only is done while ftrace is converting the calls.
	 * To prevent this, the module's records are set as disabled
	 * and will be enabled after the call to set the module's text
	 * to read-only.
2011-07-14 23:02:27 -04:00
	 */
2016-01-07 15:40:01 -05:00
	if (mod)
		rec_flags |= FTRACE_FL_DISABLED;
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
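The call-to-jmp replacement described in this commit message can be illustrated on a plain byte buffer. This is a sketch under assumptions, not the kernel's patching code: it uses the standard x86 encodings (0xE8 for `call rel32`, 0xEB for `jmp rel8`), the function names are hypothetical, and keeping a saved copy of the original call is an assumption made so the site can be restored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CALL_SIZE 5	/* call rel32: opcode 0xE8 plus a 4-byte offset */

/*
 * Overwrite a 5-byte call with "jmp +3" (0xEB 0x03). The jmp is 2 bytes,
 * so it skips the 3 bytes left over from the call.
 * Returns 0 on success, -1 if the site does not hold a call.
 */
static int sketch_disable_site(uint8_t *site)
{
	assert(site != NULL);
	if (site[0] != 0xE8)
		return -1;
	site[0] = 0xEB;
	site[1] = 0x03;
	return 0;
}

/*
 * Put the original call back from a saved copy.
 * Returns 0 on success, -1 if the site does not hold the jmp.
 */
static int sketch_enable_site(uint8_t *site, const uint8_t *saved_call)
{
	assert(site != NULL && saved_call != NULL);
	if (site[0] != 0xEB || site[1] != 0x03)
		return -1;
	memcpy(site, saved_call, SKETCH_CALL_SIZE);
	return 0;
}
```

Both helpers check the current bytes before writing, so a site in an unexpected state is reported rather than overwritten.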
2008-05-12 21:20:42 +02:00
2014-02-24 19:59:56 +01:00
	for (pg = new_pgs; pg; pg = pg->next) {
2008-05-12 21:20:42 +02:00
2011-12-16 16:30:31 -05:00
		for (i = 0; i < pg->index; i++) {
ftrace: Check module functions being traced on reload
There's been a nasty bug that would show up and not give much info.
The bug displayed the following warning:
WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
[<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
[<ffffffff811401cc>] ? kfree+0x2c/0x110
[<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
[<ffffffff81149f1e>] __fput+0xae/0x220
[<ffffffff8114a09e>] ____fput+0xe/0x10
[<ffffffff8105fa22>] task_work_run+0x72/0x90
[<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
[<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff815c0f88>] int_signal+0x12/0x17
---[ end trace 793179526ee09b2c ]---
It was finally narrowed down to unloading a module that was being traced.
It was actually more than that. When functions are being traced, there's
a table of all functions that have a ref count of the number of active
tracers attached to that function. When a function trace callback is
registered to a function, the function's record ref count is incremented.
When it is unregistered, the function's record ref count is decremented.
If an inconsistency is detected (ref count goes below zero) the above
warning is shown and the function tracing is permanently disabled until
reboot.
The ftrace callback ops holds a hash of functions that it filters on
(and/or filters off). If a hash is empty, the default is to trace
all functions (for the filter_hash) or to exclude no functions (for the
notrace_hash).
When a module is unloaded, it frees the function records that represent
the module functions. These records exist on their own pages, that is
function records for one module will not exist on the same page as
function records for other modules or even the core kernel.
Now when a module unloads, the records that represent its functions are
freed. When the module is loaded again, the records are recreated with
a default ref count of zero (unless there's a callback that traces all
functions, then they will also be traced, and the ref count will be
incremented).
The problem is that if an ftrace callback hash includes functions of the
module being unloaded, those hash entries will not be removed. If the
module is reloaded in the same location, the hash entries still point
to the functions of the module but the module's ref counts do not reflect
that.
With the help of Steve and Joern, we found a reproducer:
Using uinput module and uinput_release function.
cd /sys/kernel/debug/tracing
modprobe uinput
echo uinput_release > set_ftrace_filter
echo function > current_tracer
rmmod uinput
modprobe uinput
# check /proc/modules to see if loaded in same addr, otherwise try again
echo nop > current_tracer
[BOOM]
The above loads the uinput module, which creates a table of functions that
can be traced within the module.
We add uinput_release to the filter_hash to trace just that function.
Enable function tracing, which increments the ref count of the record
associated with uinput_release.
Remove uinput, which frees the records including the one that represents
uinput_release.
Load the uinput module again (and make sure it's at the same address).
This recreates the function records all with a ref count of zero,
including uinput_release.
Disable function tracing, which will decrement the ref count for uinput_release
which is now zero because of the module removal and reload, and we have
a mismatch (below zero ref count).
The solution is to check, when a module is loaded, all currently tracing ftrace
callbacks to see if any are tracing any of the module's functions (this is already
done for callbacks that trace all functions). If a callback happens to have
a module function being traced, that record's ref count is incremented and tracing
of that function starts.
There may be a strange side effect with this, where tracing module functions
on unload and then reloading a new module may have that new module's functions
being traced. This may be something that confuses the user, but it's not
a big deal. Another approach is to disable all callback hashes on module unload,
but this leaves some ftrace callbacks that may not be registered, but can
still have hashes tracing the module's function where ftrace doesn't know about
it. That situation can cause the same bug. This solution solves that case too.
Another benefit of this solution, is it is possible to trace a module's
function on unload and load.
Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
Reported-by: Jörn Engel <joern@logfs.org>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Steve Hodgson <steve@purestorage.com>
Tested-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
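The reload-time check this commit message describes can be sketched as follows. The names are hypothetical and the model is simplified on purpose: a plain array of addresses with a linear search stands in for the callback's filter hash, and only a single callback is considered.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_rec {
	unsigned long ip;	/* address of the traceable function */
	int ref;		/* number of callbacks attached */
};

/* Linear search standing in for a hash lookup. */
static int sketch_filter_has(const unsigned long *filter, size_t len,
			     unsigned long ip)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (filter[i] == ip)
			return 1;
	return 0;
}

/*
 * On module load, give each fresh record a reference if the active
 * callback either has an empty filter (trace everything) or still lists
 * the record's address from before the module was unloaded.
 * Returns how many records picked up a reference.
 */
static size_t sketch_account_new_recs(struct sketch_rec *recs, size_t nrecs,
				      const unsigned long *filter,
				      size_t filter_len)
{
	size_t hits = 0;
	size_t i;

	assert(recs != NULL || nrecs == 0);
	for (i = 0; i < nrecs; i++) {
		recs[i].ref = 0;
		if (filter_len == 0 ||
		    sketch_filter_has(filter, filter_len, recs[i].ip)) {
			recs[i].ref++;
			hits++;
		}
	}
	return hits;
}
```

With the stale filter entry accounted for at load time, the later decrement on unregistering the callback finds a count of one rather than zero.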
2013-07-30 00:04:32 -04:00
2011-12-16 16:30:31 -05:00
			/* If something went wrong, bail without enabling anything */
			if (unlikely(ftrace_disabled))
				return -1;
2008-06-21 23:50:29 +05:30
2011-12-16 16:30:31 -05:00
			p = &pg->records[i];
2016-01-07 15:40:01 -05:00
			p->flags = rec_flags;
2008-06-21 23:50:29 +05:30
2011-12-16 16:30:31 -05:00
			/*
			 * Do the initial record conversion from mcount jump
			 * to the NOP instructions.
			 */
2021-07-28 23:25:45 +02:00
			if (init_nop && !ftrace_nop_initialize(mod, p))
2011-12-16 16:30:31 -05:00
				break;
2009-10-13 16:33:53 -04:00
2014-02-24 19:59:56 +01:00
			update_cnt++;
2009-10-13 16:33:53 -04:00
		}
2008-05-12 21:20:42 +02:00
	}
2008-05-12 21:20:46 +02:00
	stop = ftrace_now(raw_smp_processor_id());
2008-05-12 21:20:42 +02:00
	ftrace_update_time = stop - start;
2014-02-24 19:59:56 +01:00
	ftrace_update_tot_cnt += update_cnt;
2008-05-12 21:20:42 +02:00
2008-05-12 21:20:42 +02:00
	return 0;
}
2011-12-16 16:23:44 -05:00
static int ftrace_allocate_records(struct ftrace_page *pg, int count)
2008-05-12 21:20:43 +02:00
{
2011-12-16 16:23:44 -05:00
	int order;
2020-10-05 20:37:41 -04:00
	int pages;
2008-05-12 21:20:43 +02:00
	int cnt;
2011-12-16 16:23:44 -05:00
	if (WARN_ON(!count))
		return -EINVAL;
2021-04-01 16:40:32 -04:00
	/* We want to fill as much as possible, with no empty pages */
2020-08-31 11:11:02 +08:00
	pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
2021-04-01 16:40:32 -04:00
	order = fls(pages) - 1;
2008-05-12 21:20:43 +02:00
2011-12-16 16:23:44 -05:00
 again:
	pg->records = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
2008-05-12 21:20:43 +02:00
2011-12-16 16:23:44 -05:00
	if (!pg->records) {
		/* if we can't allocate this size, try something smaller */
		if (!order)
			return -ENOMEM;
2022-11-09 09:44:33 +00:00
		order--;
2011-12-16 16:23:44 -05:00
		goto again;
	}
2008-05-12 21:20:43 +02:00
2019-10-01 14:38:07 -04:00
	ftrace_number_of_pages += 1 << order;
	ftrace_number_of_groups++;
2011-12-16 16:23:44 -05:00
	cnt = (PAGE_SIZE << order) / ENTRY_SIZE;
2021-04-01 16:14:17 -04:00
	pg->order = order;
2008-05-12 21:20:43 +02:00
2011-12-16 16:23:44 -05:00
	if (cnt > count)
		cnt = count;
	return cnt;
}
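The order arithmetic in ftrace_allocate_records() can be worked through in user space. The sketch below assumes a 4096-byte page and a 16-byte entry purely for illustration (the kernel derives both from its own definitions), and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration only. */
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_ENTRY_SIZE	16
#define SKETCH_ENTRIES_PER_PAGE	(SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)

/* 1-based position of the highest set bit; 0 when x is 0. */
static int sketch_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Largest power-of-two page order not exceeding the pages count needs. */
static int sketch_order_for(int count)
{
	int pages;

	assert(count > 0);
	pages = (count + SKETCH_ENTRIES_PER_PAGE - 1) / SKETCH_ENTRIES_PER_PAGE;
	return sketch_fls((unsigned int)pages) - 1;
}

/* Entries served by one allocation of that order, capped at count. */
static int sketch_entries_for(int count, int order)
{
	int cnt = (SKETCH_PAGE_SIZE << order) / SKETCH_ENTRY_SIZE;

	return cnt > count ? count : cnt;
}
```

Under these assumed sizes, 1500 entries need 6 pages, which rounds down to order 2 (4 pages, 1024 entries); the remaining 476 entries are left for a later allocation rather than rounding up to 8 pages and wasting space.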
2023-07-12 14:04:52 +08:00
static void ftrace_free_pages(struct ftrace_page *pages)
{
	struct ftrace_page *pg = pages;
	while (pg) {
		if (pg->records) {
			free_pages((unsigned long)pg->records, pg->order);
			ftrace_number_of_pages -= 1 << pg->order;
		}
		pages = pg->next;
		kfree(pg);
		pg = pages;
		ftrace_number_of_groups--;
	}
}
2011-12-16 16:23:44 -05:00
static struct ftrace_page *
ftrace_allocate_pages(unsigned long num_to_init)
{
	struct ftrace_page *start_pg;
	struct ftrace_page *pg;
	int cnt;
	if (!num_to_init)
2019-03-24 00:05:23 +05:30
		return NULL;
2011-12-16 16:23:44 -05:00
	start_pg = pg = kzalloc(sizeof(*pg), GFP_KERNEL);
	if (!pg)
		return NULL;
	/*
	 * Try to allocate as much as possible in one contiguous
	 * location that fills in all of the space. We want to
	 * waste as little space as possible.
	 */
	for (;;) {
		cnt = ftrace_allocate_records(pg, num_to_init);
		if (cnt < 0)
			goto free_pages;
		num_to_init -= cnt;
		if (!num_to_init)
2008-05-12 21:20:43 +02:00
			break;
2011-12-16 16:23:44 -05:00
		pg->next = kzalloc(sizeof(*pg), GFP_KERNEL);
		if (!pg->next)
			goto free_pages;
2008-05-12 21:20:43 +02:00
		pg = pg->next;
	}
2011-12-16 16:23:44 -05:00
	return start_pg;
 free_pages:
2023-07-12 14:04:52 +08:00
	ftrace_free_pages(start_pg);
2011-12-16 16:23:44 -05:00
	pr_info("ftrace: FAILED to allocate memory for functions\n");
	return NULL;
}
2008-05-12 21:20:43 +02:00
# define FTRACE_BUFF_MAX (KSYM_SYMBOL_LEN+4) /* room for wildcards */
struct ftrace_iterator {
2010-09-10 11:47:43 -04:00
loff_t pos ;
2010-09-09 10:00:28 -04:00
loff_t func_pos ;
2017-06-23 16:05:11 -04:00
loff_t mod_pos ;
2010-09-09 10:00:28 -04:00
struct ftrace_page * pg ;
struct dyn_ftrace * func ;
struct ftrace_func_probe * probe ;
2017-04-04 21:31:28 -04:00
struct ftrace_func_entry * probe_entry ;
2010-09-09 10:00:28 -04:00
struct trace_parser parser ;
2011-04-29 20:59:51 -04:00
struct ftrace_hash * hash ;
2011-05-02 17:34:47 -04:00
struct ftrace_ops * ops ;
2017-06-23 16:05:11 -04:00
struct trace_array * tr ;
struct list_head * mod_list ;
2017-04-04 21:31:28 -04:00
int pidx ;
2010-09-09 10:00:28 -04:00
int idx ;
unsigned flags ;
2008-05-12 21:20:43 +02:00
} ;
2009-02-16 15:28:00 -05:00
static void *
2017-04-04 21:31:28 -04:00
t_probe_next ( struct seq_file * m , loff_t * pos )
2009-02-16 15:28:00 -05:00
{
struct ftrace_iterator * iter = m - > private ;
2017-04-20 11:31:35 -04:00
struct trace_array * tr = iter - > ops - > private ;
2017-04-05 13:12:55 -04:00
struct list_head * func_probes ;
2017-04-04 21:31:28 -04:00
struct ftrace_hash * hash ;
struct list_head * next ;
2010-09-09 10:00:28 -04:00
struct hlist_node * hnd = NULL ;
2009-02-16 15:28:00 -05:00
struct hlist_head * hhd ;
2017-04-04 21:31:28 -04:00
int size ;
2009-02-16 15:28:00 -05:00
( * pos ) + + ;
2010-09-10 11:47:43 -04:00
iter - > pos = * pos ;
2009-02-16 15:28:00 -05:00
2017-04-05 13:12:55 -04:00
if ( ! tr )
2009-02-16 15:28:00 -05:00
return NULL ;
2017-04-05 13:12:55 -04:00
func_probes = & tr - > func_probes ;
if ( list_empty ( func_probes ) )
2009-02-16 15:28:00 -05:00
return NULL ;
2017-04-04 21:31:28 -04:00
if ( ! iter - > probe ) {
2017-04-05 13:12:55 -04:00
next = func_probes - > next ;
2017-04-18 14:50:39 -04:00
iter - > probe = list_entry ( next , struct ftrace_func_probe , list ) ;
2017-04-04 21:31:28 -04:00
}
if ( iter - > probe_entry )
hnd = & iter - > probe_entry - > hlist ;
hash = iter - > probe - > ops . func_hash - > filter_hash ;
ftrace: Fix NULL pointer dereference in t_probe_next()
LTP testsuite on powerpc results in the below crash:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70
The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.
The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;
This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.
Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.
Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
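The guard described in this commit message can be shown in isolation. This is a simplified sketch under stated assumptions: struct sketch_hash, struct sketch_probe and sketch_bucket_count() are hypothetical stand-ins for the kernel structures, and locking is left out entirely. It only illustrates checking the hash pointer before computing the bucket count from it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_hash {
	unsigned int size_bits;
};

struct sketch_probe {
	/* NULL while the probe is still being registered. */
	struct sketch_hash *filter_hash;
};

/*
 * Return the number of buckets to walk, or 0 when the probe has no
 * usable hash yet. The check must come before the dereference.
 */
static unsigned long sketch_bucket_count(const struct sketch_probe *probe)
{
	const struct sketch_hash *hash = probe->filter_hash;

	if (!hash)
		return 0;
	return 1UL << hash->size_bits;
}
```

A reader that sees a zero bucket count can treat the probe as not yet populated and stop iterating, which mirrors the early return in t_probe_next().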
2019-07-04 20:04:41 +05:30
2019-08-30 16:30:01 -04:00
/*
* A probe being registered may temporarily have an empty hash
* and it ' s at the end of the func_probes list .
*/
if ( ! hash | | hash = = EMPTY_HASH )
2019-07-04 20:04:41 +05:30
return NULL ;
2017-04-04 21:31:28 -04:00
size = 1 < < hash - > size_bits ;
retry :
if ( iter - > pidx > = size ) {
2017-04-05 13:12:55 -04:00
if ( iter - > probe - > list . next = = func_probes )
2017-04-04 21:31:28 -04:00
return NULL ;
next = iter - > probe - > list . next ;
2017-04-18 14:50:39 -04:00
iter - > probe = list_entry ( next , struct ftrace_func_probe , list ) ;
2017-04-04 21:31:28 -04:00
hash = iter - > probe - > ops . func_hash - > filter_hash ;
size = 1 < < hash - > size_bits ;
iter - > pidx = 0 ;
}
hhd = & hash - > buckets [ iter - > pidx ] ;
2009-02-16 15:28:00 -05:00
if ( hlist_empty ( hhd ) ) {
2017-04-04 21:31:28 -04:00
iter - > pidx + + ;
2009-02-16 15:28:00 -05:00
hnd = NULL ;
goto retry ;
}
if ( ! hnd )
hnd = hhd - > first ;
else {
hnd = hnd - > next ;
if ( ! hnd ) {
2017-04-04 21:31:28 -04:00
iter - > pidx + + ;
2009-02-16 15:28:00 -05:00
goto retry ;
}
}
2010-09-09 10:00:28 -04:00
if ( WARN_ON_ONCE ( ! hnd ) )
return NULL ;
2017-04-04 21:31:28 -04:00
iter - > probe_entry = hlist_entry ( hnd , struct ftrace_func_entry , hlist ) ;
2010-09-09 10:00:28 -04:00
return iter ;
2009-02-16 15:28:00 -05:00
}
2017-04-04 21:31:28 -04:00
static void * t_probe_start ( struct seq_file * m , loff_t * pos )
2009-02-16 15:28:00 -05:00
{
struct ftrace_iterator * iter = m - > private ;
void * p = NULL ;
2009-06-24 09:54:54 +08:00
loff_t l ;
2017-04-04 21:31:28 -04:00
if ( ! ( iter - > flags & FTRACE_ITER_DO_PROBES ) )
2011-12-19 15:21:16 -05:00
return NULL ;
2017-06-23 16:05:11 -04:00
if ( iter - > mod_pos > * pos )
2010-09-09 08:43:22 -04:00
return NULL ;
2009-02-16 15:28:00 -05:00
2017-04-04 21:31:28 -04:00
iter - > probe = NULL ;
iter - > probe_entry = NULL ;
iter - > pidx = 0 ;
2017-06-23 16:05:11 -04:00
for ( l = 0 ; l < = ( * pos - iter - > mod_pos ) ; ) {
2017-04-04 21:31:28 -04:00
p = t_probe_next ( m , & l ) ;
2009-06-24 09:54:54 +08:00
if ( ! p )
break ;
}
2010-09-09 10:00:28 -04:00
if ( ! p )
return NULL ;
2010-09-10 11:47:43 -04:00
/* Only set this if we have an item */
2017-04-04 21:31:28 -04:00
iter - > flags | = FTRACE_ITER_PROBE ;
2010-09-10 11:47:43 -04:00
2010-09-09 10:00:28 -04:00
return iter ;
2009-02-16 15:28:00 -05:00
}
2010-09-09 10:00:28 -04:00
static int
2017-04-04 21:31:28 -04:00
t_probe_show ( struct seq_file * m , struct ftrace_iterator * iter )
2009-02-16 15:28:00 -05:00
{
2017-04-04 21:31:28 -04:00
struct ftrace_func_entry * probe_entry ;
2017-04-18 14:50:39 -04:00
struct ftrace_probe_ops * probe_ops ;
struct ftrace_func_probe * probe ;
2009-02-16 15:28:00 -05:00
2017-04-04 21:31:28 -04:00
probe = iter - > probe ;
probe_entry = iter - > probe_entry ;
2009-02-16 15:28:00 -05:00
2017-04-04 21:31:28 -04:00
if ( WARN_ON_ONCE ( ! probe | | ! probe_entry ) )
2010-09-09 10:00:28 -04:00
return - EIO ;
2009-02-16 15:28:00 -05:00
2017-04-18 14:50:39 -04:00
probe_ops = probe - > probe_ops ;
2009-02-16 23:06:01 -05:00
2017-04-18 14:50:39 -04:00
if ( probe_ops - > print )
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change sets the ground work for the change.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
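The place holder life cycle described above can be sketched in user space. Everything here is an assumption made for illustration: struct sketch_state, sketch_init() and sketch_free() are hypothetical names, and the signatures are simplified compared with the real probe_ops callbacks. The sketch shows the first init call allocating the data behind a NULL place holder, later calls reusing it, and a free call with an ip of zero releasing it.

```c
#include <stdlib.h>

/* Hypothetical per-probe state; the real payload is up to each probe. */
struct sketch_state {
	unsigned long nr_funcs;
	long limit;
};

/* Called once per matching function; *data is NULL on the first call. */
static int sketch_init(unsigned long ip, const void *init_data, void **data)
{
	struct sketch_state *st = *data;

	(void)ip;
	if (!st) {
		st = calloc(1, sizeof(*st));
		if (!st)
			return -1;
		/* init_data is assumed to point at a long in this sketch. */
		st->limit = init_data ? *(const long *)init_data : 0;
		*data = st;
	}
	st->nr_funcs++;
	return 0;
}

/* ip != 0 removes one function; ip == 0 means the place holder is going away. */
static void sketch_free(unsigned long ip, void **data)
{
	struct sketch_state *st = *data;

	if (!st)
		return;
	if (ip) {
		st->nr_funcs--;
		return;
	}
	free(st);
	*data = NULL;
}
```

Keeping the allocation behind the place holder lets every function attached to the same probe share one piece of state.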
2017-04-19 22:39:44 -04:00
return probe_ops - > print ( m , probe_entry - > ip , probe_ops , probe - > data ) ;
2009-02-16 15:28:00 -05:00
2017-04-18 14:50:39 -04:00
seq_printf ( m , " %ps:%ps \n " , ( void * ) probe_entry - > ip ,
( void * ) probe_ops - > func ) ;
2009-02-16 15:28:00 -05:00
return 0 ;
}
2017-06-23 16:05:11 -04:00
static void *
t_mod_next ( struct seq_file * m , loff_t * pos )
{
struct ftrace_iterator * iter = m - > private ;
struct trace_array * tr = iter - > tr ;
( * pos ) + + ;
iter - > pos = * pos ;
iter - > mod_list = iter - > mod_list - > next ;
if ( iter - > mod_list = = & tr - > mod_trace | |
iter - > mod_list = = & tr - > mod_notrace ) {
iter - > flags & = ~ FTRACE_ITER_MOD ;
return NULL ;
}
iter - > mod_pos = * pos ;
return iter ;
}
static void * t_mod_start ( struct seq_file * m , loff_t * pos )
{
struct ftrace_iterator * iter = m - > private ;
void * p = NULL ;
loff_t l ;
if ( iter - > func_pos > * pos )
return NULL ;
iter - > mod_pos = iter - > func_pos ;
/* probes are only available if tr is set */
if ( ! iter - > tr )
return NULL ;
for ( l = 0 ; l < = ( * pos - iter - > func_pos ) ; ) {
p = t_mod_next ( m , & l ) ;
if ( ! p )
break ;
}
if ( ! p ) {
iter - > flags & = ~ FTRACE_ITER_MOD ;
return t_probe_start ( m , pos ) ;
}
/* Only set this if we have an item */
iter - > flags | = FTRACE_ITER_MOD ;
return iter ;
}
static int
t_mod_show ( struct seq_file * m , struct ftrace_iterator * iter )
{
struct ftrace_mod_load * ftrace_mod ;
struct trace_array * tr = iter - > tr ;
if ( WARN_ON_ONCE ( ! iter - > mod_list ) | |
iter - > mod_list = = & tr - > mod_trace | |
iter - > mod_list = = & tr - > mod_notrace )
return - EIO ;
ftrace_mod = list_entry ( iter - > mod_list , struct ftrace_mod_load , list ) ;
if ( ftrace_mod - > func )
seq_printf ( m , " %s " , ftrace_mod - > func ) ;
else
seq_putc ( m , ' * ' ) ;
seq_printf ( m , " :mod:%s \n " , ftrace_mod - > module ) ;
return 0 ;
}
2008-05-12 21:20:51 +02:00
static void *
2017-03-29 22:45:18 -04:00
t_func_next ( struct seq_file * m , loff_t * pos )
2008-05-12 21:20:43 +02:00
{
struct ftrace_iterator * iter = m - > private ;
struct dyn_ftrace * rec = NULL ;
( * pos ) + + ;
2009-02-16 11:21:52 -05:00
2008-05-12 21:20:43 +02:00
retry :
if ( iter - > idx > = iter - > pg - > index ) {
if ( iter - > pg - > next ) {
iter - > pg = iter - > pg - > next ;
iter - > idx = 0 ;
goto retry ;
}
} else {
rec = & iter - > pg - > records [ iter - > idx + + ] ;
2017-03-29 14:55:49 -04:00
if ( ( ( iter - > flags & ( FTRACE_ITER_FILTER | FTRACE_ITER_NOTRACE ) ) & &
! ftrace_lookup_ip ( iter - > hash , rec - > ip ) ) | |
2011-05-03 14:39:21 -04:00
( ( iter - > flags & FTRACE_ITER_ENABLED ) & &
2023-01-24 09:56:53 -05:00
! ( rec - > flags & FTRACE_FL_ENABLED ) ) | |
( ( iter - > flags & FTRACE_ITER_TOUCHED ) & &
! ( rec - > flags & FTRACE_FL_TOUCHED ) ) ) {
2011-05-03 14:39:21 -04:00
2008-05-12 21:20:43 +02:00
rec = NULL ;
goto retry ;
}
}
2010-09-09 10:00:28 -04:00
if ( ! rec )
2017-03-29 22:45:18 -04:00
return NULL ;
2010-09-09 10:00:28 -04:00
2017-03-29 22:45:18 -04:00
iter - > pos = iter - > func_pos = * pos ;
2010-09-09 10:00:28 -04:00
iter - > func = rec ;
return iter ;
2008-05-12 21:20:43 +02:00
}
2017-03-29 22:45:18 -04:00
static void *
t_next ( struct seq_file * m , void * v , loff_t * pos )
{
struct ftrace_iterator * iter = m - > private ;
2017-06-23 16:05:11 -04:00
loff_t l = * pos ; /* t_probe_start() must use original pos */
2017-03-29 22:45:18 -04:00
void * ret ;
if ( unlikely ( ftrace_disabled ) )
return NULL ;
2017-04-04 21:31:28 -04:00
if ( iter - > flags & FTRACE_ITER_PROBE )
return t_probe_next ( m , pos ) ;
2017-03-29 22:45:18 -04:00
2017-06-23 16:05:11 -04:00
if ( iter - > flags & FTRACE_ITER_MOD )
return t_mod_next ( m , pos ) ;
2017-03-29 22:45:18 -04:00
if ( iter - > flags & FTRACE_ITER_PRINTALL ) {
2017-04-04 21:31:28 -04:00
/* next must increment pos, and t_probe_start does not */
2017-03-29 22:45:18 -04:00
( * pos ) + + ;
2017-06-23 16:05:11 -04:00
return t_mod_start ( m , & l ) ;
2017-03-29 22:45:18 -04:00
}
ret = t_func_next ( m , pos ) ;
if ( ! ret )
2017-06-23 16:05:11 -04:00
return t_mod_start ( m , & l ) ;
2017-03-29 22:45:18 -04:00
return ret ;
}
2010-09-10 11:47:43 -04:00
static void reset_iter_read ( struct ftrace_iterator * iter )
{
iter - > pos = 0 ;
iter - > func_pos = 0 ;
2017-06-23 16:05:11 -04:00
iter - > flags & = ~ ( FTRACE_ITER_PRINTALL | FTRACE_ITER_PROBE | FTRACE_ITER_MOD ) ;
2008-05-12 21:20:43 +02:00
}
static void * t_start ( struct seq_file * m , loff_t * pos )
{
struct ftrace_iterator * iter = m - > private ;
void * p = NULL ;
2009-06-24 09:54:19 +08:00
loff_t l ;
2008-05-12 21:20:43 +02:00
2009-02-16 15:28:00 -05:00
mutex_lock ( & ftrace_lock ) ;
2011-04-21 23:16:46 -04:00
if ( unlikely ( ftrace_disabled ) )
return NULL ;
2010-09-10 11:47:43 -04:00
/*
* If an lseek was done , then reset and start from beginning .
*/
if ( * pos < iter - > pos )
reset_iter_read ( iter ) ;
2009-02-16 11:21:52 -05:00
/*
* For set_ftrace_filter reading , if we have the filter
* off , we can short cut and just print out that all
* functions are enabled .
*/
2017-03-29 14:55:49 -04:00
if ( ( iter - > flags & ( FTRACE_ITER_FILTER | FTRACE_ITER_NOTRACE ) ) & &
ftrace_hash_empty ( iter - > hash ) ) {
2017-03-30 16:51:43 -04:00
iter - > func_pos = 1 ; /* Account for the message */
2009-02-16 11:21:52 -05:00
if ( * pos > 0 )
2017-06-23 16:05:11 -04:00
return t_mod_start ( m , pos ) ;
2009-02-16 11:21:52 -05:00
iter - > flags | = FTRACE_ITER_PRINTALL ;
2010-09-09 16:34:59 -07:00
/* reset in case of seek/pread */
2017-04-04 21:31:28 -04:00
iter - > flags & = ~ FTRACE_ITER_PROBE ;
2009-02-16 11:21:52 -05:00
return iter ;
}
2017-06-23 16:05:11 -04:00
if ( iter - > flags & FTRACE_ITER_MOD )
return t_mod_start ( m , pos ) ;
2009-02-16 15:28:00 -05:00
2010-09-10 11:47:43 -04:00
/*
* Unfortunately , we need to restart at ftrace_pages_start
* every time we let go of the ftrace_lock . This is because
* those pointers can change without the lock .
*/
2009-06-24 09:54:19 +08:00
iter - > pg = ftrace_pages_start ;
iter - > idx = 0 ;
for ( l = 0 ; l < = * pos ; ) {
2017-03-29 22:45:18 -04:00
p = t_func_next ( m , & l ) ;
2009-06-24 09:54:19 +08:00
if ( ! p )
break ;
2008-11-28 12:13:21 +08:00
}
function tracing: fix wrong pos computing when read buffer has been fulfilled
Impact: make output of available_filter_functions complete
phenomenon:
The first value of dyn_ftrace_total_info is not equal to
`cat available_filter_functions | wc -l`, but they should be equal.
root cause:
When printing functions with seq_printf in t_show, if the current
function record overflows the read buffer, that function is not
copied to user space through the read buffer; it is simply
dropped, so it never appears in the output.
As a result, whenever the last function written to the read buffer
overflows it, that function is lost.
This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than the read buffer.
fix:
Check the return value of seq_printf. If it is less than 0, we know
the function was not printed, so we decrement the position to force
this function to be printed on the next read, in the next read buffer.
Another small fix is to show the correct count of allocated pages.
Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-15 15:19:06 +08:00
2011-12-19 15:21:16 -05:00
if ( ! p )
2017-06-23 16:05:11 -04:00
return t_mod_start ( m , pos ) ;
2010-09-09 10:00:28 -04:00
return iter ;
2008-05-12 21:20:43 +02:00
}
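The restart-and-replay pattern used by t_start() can be shown on its own. The following is a minimal sketch with hypothetical names (struct sketch_iter, sketch_next(), sketch_start()) over a plain array, with no locking: because the cursor cannot be trusted across calls, start rewinds to the beginning and steps forward until the element at the requested position has been produced.

```c
#include <stddef.h>

/* Hypothetical iterator over a fixed array. */
struct sketch_iter {
	const int *items;
	size_t n;
	size_t idx;
};

/* Advance one element and bump *pos, like a seq_file next callback. */
static const int *sketch_next(struct sketch_iter *it, long *pos)
{
	(*pos)++;
	if (it->idx >= it->n)
		return NULL;
	return &it->items[it->idx++];
}

/* Rewind, then replay until the element at position pos is reached. */
static const int *sketch_start(struct sketch_iter *it, long pos)
{
	const int *p = NULL;
	long l;

	it->idx = 0;
	for (l = 0; l <= pos; ) {
		p = sketch_next(it, &l);
		if (!p)
			break;
	}
	return p;
}
```

The replay costs time proportional to the position on every start, which is the price of not keeping pointers that may change while the lock is released.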
static void t_stop ( struct seq_file * m , void * p )
{
2009-02-16 15:28:00 -05:00
mutex_unlock ( & ftrace_lock ) ;
2008-05-12 21:20:43 +02:00
}
2014-07-03 14:51:36 -04:00
void * __weak
arch_ftrace_trampoline_func ( struct ftrace_ops * ops , struct dyn_ftrace * rec )
{
return NULL ;
}
static void add_trampoline_func ( struct seq_file * m , struct ftrace_ops * ops ,
struct dyn_ftrace * rec )
{
void * ptr ;
ptr = arch_ftrace_trampoline_func ( ops , rec ) ;
if ( ptr )
seq_printf ( m , " ->%pS " , ptr ) ;
}
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, its call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in the available_filter_functions and used to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions does
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function before it would be used
for it, which will either create duplicate names, or if the previous
function was not traced, it would incorrectly be listed in
available_filter_functions as a function that can be traced.
This became an issue with BPF[1] as there is tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The failing case was a function that was marked notrace
but was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.
The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.
In the meantime, this is a workaround. Add a FTRACE_MCOUNT_MAX_OFFSET
macro that if defined, ftrace will ignore any function that has its call
to fentry/mcount that has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.
If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).
A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:
__ftrace_invalid_address___<invalid-offset>
(showing the offset that caused it to be invalid).
This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of files via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).
The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).
[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
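The offset test described above can be illustrated without kallsyms. This is a sketch under stated assumptions: the sorted symbol table, sketch_lookup(), sketch_valid_rec() and the SKETCH_MAX_OFFSET value are all hypothetical. The lookup knows only where each symbol starts, so an address inside an overridden weak function resolves to the preceding symbol with a large offset, and the check rejects it.

```c
#include <stddef.h>

/* Hypothetical symbol table entry; the table is sorted by address. */
struct sketch_sym {
	unsigned long addr;
	const char *name;
};

/* Assumed limit; the kernel's FTRACE_MCOUNT_MAX_OFFSET is arch specific. */
#define SKETCH_MAX_OFFSET 4UL

/* Find the last symbol starting at or before ip and report the offset. */
static const struct sketch_sym *
sketch_lookup(const struct sketch_sym *tab, size_t n, unsigned long ip,
	      unsigned long *offset)
{
	const struct sketch_sym *best = NULL;
	size_t i;

	for (i = 0; i < n && tab[i].addr <= ip; i++)
		best = &tab[i];
	if (best)
		*offset = ip - best->addr;
	return best;
}

/* Return 1 when ip looks like a call site at function entry, 0 otherwise. */
static int sketch_valid_rec(const struct sketch_sym *tab, size_t n,
			    unsigned long ip)
{
	unsigned long offset = 0;

	if (!sketch_lookup(tab, n, ip, &offset) || offset > SKETCH_MAX_OFFSET)
		return 0;
	return 1;
}
```

A record that fails this test would be the kind that test_for_valid_rec() marks with FTRACE_FL_DISABLED while still keeping its slot in the listing.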
2022-05-26 14:19:12 -04:00
# ifdef FTRACE_MCOUNT_MAX_OFFSET
/*
* Weak functions can still have an mcount / fentry that is saved in
* the __mcount_loc section . These can be detected by having a
* symbol offset of greater than FTRACE_MCOUNT_MAX_OFFSET , as the
* symbol found by kallsyms is not the function that the mcount / fentry
* is part of . The offset is much greater in these cases .
*
* Test the record to make sure that the ip points to a valid kallsyms
* and if not , mark it disabled .
*/
static int test_for_valid_rec ( struct dyn_ftrace * rec )
{
char str [ KSYM_SYMBOL_LEN ] ;
unsigned long offset ;
const char * ret ;
ret = kallsyms_lookup ( rec - > ip , NULL , & offset , NULL , str ) ;
/* Weak functions can cause invalid addresses */
if ( ! ret | | offset > FTRACE_MCOUNT_MAX_OFFSET ) {
rec - > flags | = FTRACE_FL_DISABLED ;
return 0 ;
}
return 1 ;
}
static struct workqueue_struct * ftrace_check_wq __initdata ;
static struct work_struct ftrace_check_work __initdata ;
/*
* Scan all the mcount / fentry entries to make sure they are valid .
*/
static __init void ftrace_check_work_func ( struct work_struct * work )
{
struct ftrace_page * pg ;
struct dyn_ftrace * rec ;
mutex_lock ( & ftrace_lock ) ;
do_for_each_ftrace_rec ( pg , rec ) {
test_for_valid_rec ( rec ) ;
} while_for_each_ftrace_rec ( ) ;
mutex_unlock ( & ftrace_lock ) ;
}
static int __init ftrace_check_for_weak_functions ( void )
{
INIT_WORK ( & ftrace_check_work , ftrace_check_work_func ) ;
ftrace_check_wq = alloc_workqueue ( " ftrace_check_wq " , WQ_UNBOUND , 0 ) ;
queue_work ( ftrace_check_wq , & ftrace_check_work ) ;
return 0 ;
}
static int __init ftrace_check_sync ( void )
{
/* Make sure the ftrace_check updates are finished */
if ( ftrace_check_wq )
destroy_workqueue ( ftrace_check_wq ) ;
return 0 ;
}
late_initcall_sync ( ftrace_check_sync ) ;
subsys_initcall ( ftrace_check_for_weak_functions ) ;
static int print_rec ( struct seq_file * m , unsigned long ip )
{
unsigned long offset ;
char str [ KSYM_SYMBOL_LEN ] ;
char * modname ;
const char * ret ;
ret = kallsyms_lookup ( ip , NULL , & offset , & modname , str ) ;
/* Weak functions can cause invalid addresses */
if ( ! ret | | offset > FTRACE_MCOUNT_MAX_OFFSET ) {
snprintf ( str , KSYM_SYMBOL_LEN , " %s_%ld " ,
FTRACE_INVALID_FUNCTION , offset ) ;
ret = NULL ;
}
seq_puts ( m , str ) ;
if ( modname )
seq_printf ( m , " [%s] " , modname ) ;
return ret = = NULL ? - 1 : 0 ;
}
# else
static inline int test_for_valid_rec ( struct dyn_ftrace * rec )
{
return 1 ;
}
static inline int print_rec ( struct seq_file * m , unsigned long ip )
{
seq_printf ( m , " %ps " , ( void * ) ip ) ;
return 0 ;
}
# endif
2008-05-12 21:20:43 +02:00
static int t_show ( struct seq_file * m , void * v )
{
2009-02-16 11:21:52 -05:00
struct ftrace_iterator * iter = m - > private ;
2010-09-09 10:00:28 -04:00
struct dyn_ftrace * rec ;
2008-05-12 21:20:43 +02:00
2017-04-04 21:31:28 -04:00
if ( iter - > flags & FTRACE_ITER_PROBE )
return t_probe_show ( m , iter ) ;
2009-02-16 15:28:00 -05:00
2017-06-23 16:05:11 -04:00
if ( iter - > flags & FTRACE_ITER_MOD )
return t_mod_show ( m , iter ) ;
2009-02-16 11:21:52 -05:00
if ( iter - > flags & FTRACE_ITER_PRINTALL ) {
2014-06-13 16:24:06 +09:00
if ( iter - > flags & FTRACE_ITER_NOTRACE )
2014-11-08 21:42:10 +01:00
seq_puts ( m , " #### no functions disabled #### \n " ) ;
2014-06-13 16:24:06 +09:00
else
2014-11-08 21:42:10 +01:00
seq_puts ( m , " #### all functions enabled #### \n " ) ;
2009-02-16 11:21:52 -05:00
return 0 ;
}
2010-09-09 10:00:28 -04:00
rec = iter - > func ;
2008-05-12 21:20:43 +02:00
if ( ! rec )
return 0 ;
2023-06-11 15:00:29 +02:00
if ( iter - > flags & FTRACE_ITER_ADDRS )
seq_printf ( m , " %lx " , rec - > ip ) ;
2022-05-26 14:19:12 -04:00
if ( print_rec ( m , rec - > ip ) ) {
/* This should only happen when a rec is disabled */
WARN_ON_ONCE ( ! ( rec - > flags & FTRACE_FL_DISABLED ) ) ;
seq_putc ( m , ' \n ' ) ;
return 0 ;
}
2023-01-24 09:56:53 -05:00
if ( iter - > flags & ( FTRACE_ITER_ENABLED | FTRACE_ITER_TOUCHED ) ) {
2015-12-01 12:24:45 -05:00
struct ftrace_ops * ops ;
2014-07-03 14:51:36 -04:00
2023-05-02 21:32:33 -04:00
seq_printf ( m , " (%ld)%s%s%s%s%s " ,
2014-05-07 13:46:45 -04:00
ftrace_rec_count ( rec ) ,
2014-11-21 05:25:16 -05:00
rec - > flags & FTRACE_FL_REGS ? " R " : " " ,
2019-11-08 13:07:06 -05:00
rec - > flags & FTRACE_FL_IPMODIFY ? " I " : " " ,
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
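The per-call-site ops pointer described above can be modelled in user space. The following is only a minimal sketch under stated assumptions: struct sketch_ops, struct sketch_call_site and the sketch_* helpers are hypothetical names invented for illustration, C11 atomics stand in for the architecture's atomic load and store of the pointer literal, and no instruction patching or RCU synchronization is modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops: only the callback matters. */
struct sketch_ops {
	void (*func)(unsigned long ip, struct sketch_ops *ops);
	unsigned long hits;
};

/*
 * Hypothetical call-site model: the ops pointer literal lives at a fixed
 * offset from the patched instruction (here, simply the previous member).
 */
struct sketch_call_site {
	_Atomic(struct sketch_ops *) ops;	/* per-call-site literal */
	unsigned long ip;			/* address of the call site */
};

/* Example callback: count how many times this ops was invoked. */
static void sketch_count(unsigned long ip, struct sketch_ops *ops)
{
	(void)ip;
	ops->hits++;
}

/* Retarget a call site with a single atomic store. */
static void sketch_set_ops(struct sketch_call_site *site,
			   struct sketch_ops *ops)
{
	atomic_store(&site->ops, ops);
}

/*
 * Shared trampoline: load the pointer once, then call through that same
 * pointer, so func and ops can never be mismatched. No list walk needed.
 */
static void sketch_trampoline(struct sketch_call_site *site)
{
	struct sketch_ops *ops = atomic_load(&site->ops);

	if (ops)
		ops->func(site->ip, ops);
}
```

Because the trampoline dereferences the pointer it just loaded, a concurrent sketch_set_ops() can only make it run the old or the new callback, each with its own ops, which is the property the second bullet above relies on.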
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
rec - > flags & FTRACE_FL_DIRECT ? " D " : " " ,
2023-05-02 21:32:33 -04:00
rec - > flags & FTRACE_FL_CALL_OPS ? " O " : " " ,
rec - > flags & FTRACE_FL_MODIFIED ? " M " : " " ) ;
2014-05-09 16:54:59 -04:00
if ( rec - > flags & FTRACE_FL_TRAMP_EN ) {
2014-07-24 16:00:31 -04:00
ops = ftrace_find_tramp_ops_any ( rec ) ;
2015-11-25 15:12:38 -05:00
if ( ops ) {
do {
seq_printf ( m , " \t tramp: %pS (%pS) " ,
( void * ) ops - > trampoline ,
( void * ) ops - > func ) ;
2015-12-01 12:24:45 -05:00
add_trampoline_func ( m , ops , rec ) ;
2015-11-25 15:12:38 -05:00
ops = ftrace_find_tramp_ops_next ( rec , ops ) ;
} while ( ops ) ;
} else
2014-11-08 21:42:10 +01:00
seq_puts ( m , " \t tramp: ERROR! " ) ;
2015-12-01 12:24:45 -05:00
} else {
add_trampoline_func ( m , NULL , rec ) ;
2014-05-09 16:54:59 -04:00
}
2023-01-23 13:45:56 +00:00
if ( rec - > flags & FTRACE_FL_CALL_OPS_EN ) {
ops = ftrace_find_unique_ops ( rec ) ;
if ( ops ) {
seq_printf ( m , " \t ops: %pS (%pS) " ,
ops , ops - > func ) ;
} else {
seq_puts ( m , " \t ops: ERROR! " ) ;
}
}
2019-11-08 13:07:06 -05:00
if ( rec - > flags & FTRACE_FL_DIRECT ) {
unsigned long direct ;
2019-12-08 16:01:12 -08:00
direct = ftrace_find_rec_direct ( rec - > ip ) ;
2019-11-08 13:07:06 -05:00
if ( direct )
seq_printf ( m , " \n \t direct-->%pS " , ( void * ) direct ) ;
}
2020-05-29 22:12:14 +08:00
}
2014-05-09 16:54:59 -04:00
2014-11-08 21:42:10 +01:00
seq_putc ( m , ' \n ' ) ;
2008-05-12 21:20:43 +02:00
return 0 ;
}
2009-09-22 16:43:43 -07:00
static const struct seq_operations show_ftrace_seq_ops = {
2008-05-12 21:20:43 +02:00
. start = t_start ,
. next = t_next ,
. stop = t_stop ,
. show = t_show ,
} ;
2008-05-12 21:20:51 +02:00
static int
2008-05-12 21:20:43 +02:00
ftrace_avail_open ( struct inode * inode , struct file * file )
{
struct ftrace_iterator * iter ;
2019-10-11 17:22:50 -04:00
int ret ;
ret = security_locked_down ( LOCKDOWN_TRACEFS ) ;
if ( ret )
return ret ;
2008-05-12 21:20:43 +02:00
2008-05-12 21:20:48 +02:00
if ( unlikely ( ftrace_disabled ) )
return - ENODEV ;
2012-04-25 10:23:39 +02:00
iter = __seq_open_private ( file , & show_ftrace_seq_ops , sizeof ( * iter ) ) ;
2017-03-29 11:38:13 -04:00
if ( ! iter )
return - ENOMEM ;
2008-05-12 21:20:43 +02:00
2017-03-29 11:38:13 -04:00
iter - > pg = ftrace_pages_start ;
iter - > ops = & global_ops ;
return 0 ;
2008-05-12 21:20:43 +02:00
}
2011-05-03 14:39:21 -04:00
static int
ftrace_enabled_open ( struct inode * inode , struct file * file )
{
struct ftrace_iterator * iter ;
2019-10-11 17:22:50 -04:00
/*
* This shows us what functions are currently being
* traced and by what . Not sure if we want lockdown
* to hide such critical information for an admin .
* Although , perhaps it can show information we don ' t
* want people to see , but if something is tracing
* something , we probably want to know about it .
*/
2012-04-25 10:23:39 +02:00
iter = __seq_open_private ( file , & show_ftrace_seq_ops , sizeof ( * iter ) ) ;
2017-03-29 11:38:13 -04:00
if ( ! iter )
return - ENOMEM ;
2011-05-03 14:39:21 -04:00
2017-03-29 11:38:13 -04:00
iter - > pg = ftrace_pages_start ;
iter - > flags = FTRACE_ITER_ENABLED ;
iter - > ops = & global_ops ;
return 0 ;
2011-05-03 14:39:21 -04:00
}
2023-01-24 09:56:53 -05:00
static int
ftrace_touched_open(struct inode *inode, struct file *file)
{
	struct ftrace_iterator *iter;

	/*
	 * This shows us what functions have ever been enabled
	 * (traced, direct, patched, etc). Not sure if we want lockdown
	 * to hide such critical information for an admin.
	 * Although, perhaps it can show information we don't
	 * want people to see, but if something had traced
	 * something, we probably want to know about it.
	 */
	iter = __seq_open_private(file, &show_ftrace_seq_ops, sizeof(*iter));
	if (!iter)
		return -ENOMEM;

	iter->pg = ftrace_pages_start;
	iter->flags = FTRACE_ITER_TOUCHED;
	iter->ops = &global_ops;
	return 0;
}
2023-06-11 15:00:29 +02:00
static int
ftrace_avail_addrs_open(struct inode *inode, struct file *file)
{
	struct ftrace_iterator *iter;
	int ret;

	ret = security_locked_down(LOCKDOWN_TRACEFS);
	if (ret)
		return ret;

	if (unlikely(ftrace_disabled))
		return -ENODEV;

	iter = __seq_open_private(file, &show_ftrace_seq_ops, sizeof(*iter));
	if (!iter)
		return -ENOMEM;

	iter->pg = ftrace_pages_start;
	iter->flags = FTRACE_ITER_ADDRS;
	iter->ops = &global_ops;
	return 0;
}
2011-12-19 14:41:25 -05:00
/**
* ftrace_regex_open - initialize function tracer filter files
* @ ops : The ftrace_ops that hold the hash filters
* @ flag : The type of filter to process
* @ inode : The inode , usually passed in to your open routine
* @ file : The file , usually passed in to your open routine
*
* ftrace_regex_open ( ) initializes the filter files for the
* @ ops . Depending on @ flag it may process the filter hash or
* the notrace hash of @ ops . With this called from the open
* routine , you can use ftrace_filter_write ( ) for the write
* routine if @ flag has FTRACE_ITER_FILTER set , or
* ftrace_notrace_write ( ) if @ flag has FTRACE_ITER_NOTRACE set .
2013-12-21 17:39:40 -05:00
* tracing_lseek ( ) should be used as the lseek routine , and
2011-12-19 14:41:25 -05:00
* release must call ftrace_regex_release ( ) .
2024-02-22 21:48:33 -08:00
*
* Returns : 0 on success or a negative errno value on failure
2011-12-19 14:41:25 -05:00
*/
int
2011-05-02 12:29:25 -04:00
ftrace_regex_open ( struct ftrace_ops * ops , int flag ,
2011-04-29 20:59:51 -04:00
struct inode * inode , struct file * file )
2008-05-12 21:20:43 +02:00
{
struct ftrace_iterator * iter ;
2011-05-02 12:29:25 -04:00
struct ftrace_hash * hash ;
2017-06-23 15:26:26 -04:00
struct list_head * mod_head ;
struct trace_array * tr = ops - > private ;
2019-10-11 17:56:57 -04:00
int ret = - ENOMEM ;
2008-05-12 21:20:43 +02:00
2013-05-09 14:44:17 +09:00
ftrace_ops_init ( ops ) ;
2008-05-12 21:20:48 +02:00
if ( unlikely ( ftrace_disabled ) )
return - ENODEV ;
tracing: Add tracing_check_open_get_tr()
Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, they return -ENODEV. tracing_disabled is set when tracing is
found to be broken. Originally this was done in case the ring buffer was
found to be corrupted, and we wanted to prevent reading it from crashing
the kernel. But it is also set if a tracing selftest fails on boot. It's a
one-way switch. That is, once it is triggered, tracing is disabled until
reboot.
As most tracefs files can also be used by instances in the tracefs
directory, they need to be handled carefully. Each instance has a
trace_array associated with it, and when the instance is removed, the
trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).
Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.
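As a rough illustration of the combined check, here is a user-space sketch. It is not the kernel helper: struct sketch_instance, sketch_tracing_disabled and sketch_check_open_get() are hypothetical names, the "registered" flag stands in for looking the trace_array up in the list of live instances, and the locking that protects that lookup in the kernel is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a trace_array instance. */
struct sketch_instance {
	int ref;		/* references held by open files */
	bool registered;	/* false once the instance is being removed */
};

/* Hypothetical one-way switch, modelling tracing_disabled. */
static bool sketch_tracing_disabled;

/*
 * One helper for both checks: fail if tracing is globally disabled, and
 * take a reference on the instance (if any) so it cannot be freed while
 * the file is open. Returns 0 on success or -ENODEV on failure.
 */
static int sketch_check_open_get(struct sketch_instance *inst)
{
	if (sketch_tracing_disabled)
		return -ENODEV;

	if (inst) {
		if (!inst->registered)
			return -ENODEV;
		inst->ref++;
	}
	return 0;
}
```

A caller that gets 0 back owns one reference and has to drop it on release or on any later error path, which is what the "if (tr) trace_array_put(tr)" cleanup in ftrace_regex_open() does.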
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-11 17:39:57 -04:00
if ( tracing_check_open_get_tr ( tr ) )
2019-10-11 17:56:57 -04:00
return - ENODEV ;
2008-05-12 21:20:43 +02:00
iter = kzalloc ( sizeof ( * iter ) , GFP_KERNEL ) ;
if ( ! iter )
2019-10-11 17:56:57 -04:00
goto out ;
2008-05-12 21:20:43 +02:00
2019-10-11 17:56:57 -04:00
if ( trace_parser_get_init ( & iter - > parser , FTRACE_BUFF_MAX ) )
goto out ;
2009-09-11 17:29:29 +02:00
2013-05-09 14:44:21 +09:00
iter - > ops = ops ;
iter - > flags = flag ;
2017-06-23 16:05:11 -04:00
iter - > tr = tr ;
2013-05-09 14:44:21 +09:00
2014-08-15 17:23:02 -04:00
mutex_lock ( & ops - > func_hash - > regex_lock ) ;
2013-05-09 14:44:21 +09:00
2017-06-23 15:26:26 -04:00
if ( flag & FTRACE_ITER_NOTRACE ) {
2014-08-15 17:23:02 -04:00
hash = ops - > func_hash - > notrace_hash ;
2017-06-23 16:05:11 -04:00
mod_head = tr ? & tr - > mod_notrace : NULL ;
2017-06-23 15:26:26 -04:00
} else {
2014-08-15 17:23:02 -04:00
hash = ops - > func_hash - > filter_hash ;
2017-06-23 16:05:11 -04:00
mod_head = tr ? & tr - > mod_trace : NULL ;
2017-06-23 15:26:26 -04:00
}
2011-05-02 12:29:25 -04:00
2017-06-23 16:05:11 -04:00
iter - > mod_list = mod_head ;
2011-05-02 17:34:47 -04:00
if ( file - > f_mode & FMODE_WRITE ) {
2014-06-11 17:06:54 +09:00
const int size_bits = FTRACE_HASH_DEFAULT_BITS ;
2017-06-23 15:26:26 -04:00
if ( file - > f_flags & O_TRUNC ) {
2014-06-11 17:06:54 +09:00
iter - > hash = alloc_ftrace_hash ( size_bits ) ;
2017-06-23 15:26:26 -04:00
clear_ftrace_mod_list ( mod_head ) ;
} else {
2014-06-11 17:06:54 +09:00
iter - > hash = alloc_and_copy_ftrace_hash ( size_bits , hash ) ;
2017-06-23 15:26:26 -04:00
}
2014-06-11 17:06:54 +09:00
2011-05-02 17:34:47 -04:00
if ( ! iter - > hash ) {
trace_parser_put ( & iter - > parser ) ;
2013-05-09 14:44:21 +09:00
goto out_unlock ;
2011-05-02 17:34:47 -04:00
}
2017-03-29 14:55:49 -04:00
} else
iter - > hash = hash ;
2011-04-29 20:59:51 -04:00
2019-10-11 17:56:57 -04:00
ret = 0 ;
2008-05-12 21:20:43 +02:00
if ( file - > f_mode & FMODE_READ ) {
iter - > pg = ftrace_pages_start ;
ret = seq_open ( file , & show_ftrace_seq_ops ) ;
if ( ! ret ) {
struct seq_file * m = file - > private_data ;
m - > private = iter ;
2009-09-22 13:54:28 +08:00
} else {
2011-05-02 17:34:47 -04:00
/* Failed */
free_ftrace_hash ( iter - > hash ) ;
2009-09-22 13:54:28 +08:00
trace_parser_put ( & iter - > parser ) ;
}
2008-05-12 21:20:43 +02:00
} else
file - > private_data = iter ;
2013-05-09 14:44:21 +09:00
out_unlock :
2014-08-15 17:23:02 -04:00
mutex_unlock ( & ops - > func_hash - > regex_lock ) ;
2008-05-12 21:20:43 +02:00
2019-10-11 17:56:57 -04:00
out :
if ( ret ) {
kfree ( iter ) ;
if ( tr )
trace_array_put ( tr ) ;
}
2008-05-12 21:20:43 +02:00
return ret ;
}
2008-05-22 11:46:33 -04:00
static int
ftrace_filter_open ( struct inode * inode , struct file * file )
{
2013-11-11 23:07:14 -05:00
struct ftrace_ops * ops = inode - > i_private ;
2019-10-11 17:22:50 -04:00
/* Checks for tracefs lockdown */
2013-11-11 23:07:14 -05:00
return ftrace_regex_open ( ops ,
2017-04-04 21:31:28 -04:00
FTRACE_ITER_FILTER | FTRACE_ITER_DO_PROBES ,
2011-12-19 15:21:16 -05:00
inode , file ) ;
2008-05-22 11:46:33 -04:00
}
static int
ftrace_notrace_open ( struct inode * inode , struct file * file )
{
2013-11-11 23:07:14 -05:00
struct ftrace_ops * ops = inode - > i_private ;
2019-10-11 17:22:50 -04:00
/* Checks for tracefs lockdown */
2013-11-11 23:07:14 -05:00
return ftrace_regex_open ( ops , FTRACE_ITER_NOTRACE ,
2011-04-29 20:59:51 -04:00
inode , file ) ;
2008-05-22 11:46:33 -04:00
}
2015-09-29 19:46:14 +03:00
/* Type for quick search ftrace basic regexes (globs) from filter_parse_regex */
struct ftrace_glob {
char * search ;
unsigned len ;
int type ;
} ;
2016-04-25 18:56:14 -03:00
/*
* If symbols in an architecture don ' t correspond exactly to the user - visible
* name of what they represent , it is possible to define this function to
* perform the necessary adjustments .
*/
char * __weak arch_ftrace_match_adjust ( char * str , const char * search )
{
return str ;
}
2015-09-29 19:46:14 +03:00
static int ftrace_match ( char * str , struct ftrace_glob * g )
2009-02-13 15:56:43 -05:00
{
int matched = 0 ;
2010-01-14 10:53:02 +08:00
int slen ;
2009-02-13 15:56:43 -05:00
2016-04-25 18:56:14 -03:00
str = arch_ftrace_match_adjust ( str , g - > search ) ;
2015-09-29 19:46:14 +03:00
switch ( g - > type ) {
2009-02-13 15:56:43 -05:00
case MATCH_FULL :
2015-09-29 19:46:14 +03:00
if ( strcmp ( str , g - > search ) = = 0 )
2009-02-13 15:56:43 -05:00
matched = 1 ;
break ;
case MATCH_FRONT_ONLY :
2015-09-29 19:46:14 +03:00
if ( strncmp ( str , g - > search , g - > len ) = = 0 )
2009-02-13 15:56:43 -05:00
matched = 1 ;
break ;
case MATCH_MIDDLE_ONLY :
2015-09-29 19:46:14 +03:00
if ( strstr ( str , g - > search ) )
2009-02-13 15:56:43 -05:00
matched = 1 ;
break ;
case MATCH_END_ONLY :
2010-01-14 10:53:02 +08:00
slen = strlen ( str ) ;
2015-09-29 19:46:14 +03:00
if ( slen > = g - > len & &
memcmp ( str + slen - g - > len , g - > search , g - > len ) = = 0 )
2009-02-13 15:56:43 -05:00
matched = 1 ;
break ;
2016-10-05 20:58:15 +09:00
case MATCH_GLOB :
if ( glob_match ( g - > search , str ) )
matched = 1 ;
break ;
2009-02-13 15:56:43 -05:00
}
return matched ;
}
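The match types handled by ftrace_match() can be tried out in user space. The sketch below assumes nothing beyond the C library: the sketch_* names are hypothetical, MATCH_GLOB is omitted because glob_match() is a kernel helper, and the arch_ftrace_match_adjust() hook is not modelled.

```c
#include <string.h>

/* Hypothetical mirror of the match types used above (no glob case). */
enum sketch_match_type {
	SKETCH_MATCH_FULL,
	SKETCH_MATCH_FRONT_ONLY,
	SKETCH_MATCH_MIDDLE_ONLY,
	SKETCH_MATCH_END_ONLY,
};

struct sketch_glob {
	const char *search;		/* pattern with wildcards stripped */
	size_t len;			/* strlen(search) */
	enum sketch_match_type type;
};

/* Return 1 if str matches the glob, 0 otherwise. */
static int sketch_match(const char *str, const struct sketch_glob *g)
{
	size_t slen;

	switch (g->type) {
	case SKETCH_MATCH_FULL:		/* "name" */
		return strcmp(str, g->search) == 0;
	case SKETCH_MATCH_FRONT_ONLY:	/* "name*" */
		return strncmp(str, g->search, g->len) == 0;
	case SKETCH_MATCH_MIDDLE_ONLY:	/* "*name*" */
		return strstr(str, g->search) != NULL;
	case SKETCH_MATCH_END_ONLY:	/* "*name" */
		slen = strlen(str);
		return slen >= g->len &&
		       memcmp(str + slen - g->len, g->search, g->len) == 0;
	}
	return 0;
}
```

The length check in the END_ONLY case matters: without it, a string shorter than the pattern would make the memcmp() start before the beginning of str.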
2011-04-29 15:12:32 -04:00
static int
2015-09-29 19:46:13 +03:00
enter_record ( struct ftrace_hash * hash , struct dyn_ftrace * rec , int clear_filter )
2011-04-26 16:11:03 -04:00
{
2011-04-29 15:12:32 -04:00
struct ftrace_func_entry * entry ;
int ret = 0 ;
2011-04-29 20:59:51 -04:00
entry = ftrace_lookup_ip ( hash , rec - > ip ) ;
2015-09-29 19:46:13 +03:00
if ( clear_filter ) {
2011-04-29 20:59:51 -04:00
/* Do nothing if it doesn't exist */
if ( ! entry )
return 0 ;
2011-04-29 15:12:32 -04:00
2011-05-02 17:34:47 -04:00
free_hash_entry ( hash , entry ) ;
2011-04-29 20:59:51 -04:00
} else {
/* Do nothing if it exists */
if ( entry )
return 0 ;
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() where,
if the number of new entries added is large enough, two allocations can
happen in the loop:
    for (i = 0; i < size; i++) {
            hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
                    new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
                    if (!new)
                            goto out_remove;
                    entry->direct = addr;
            }
    }
Where ftrace_add_rec_direct() has:
    if (ftrace_hash_empty(direct_functions) ||
        direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
            struct ftrace_hash *new_hash;
            int size = ftrace_hash_empty(direct_functions) ? 0 :
                    direct_functions->count + 1;
            if (size < 32)
                    size = 32;
            new_hash = dup_hash(direct_functions, size);
            if (!new_hash)
                    return NULL;
            *free_hash = direct_functions;
            direct_functions = new_hash;
    }
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_function entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update the
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
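The copy-then-publish order in steps 3 to 6 can be sketched in user space. This is an illustration under stated assumptions, not the ftrace code: a flat array stands in for the hash, struct sketch_table, sketch_live and sketch_add_entries() are hypothetical names, and a C11 release store stands in for the RCU pointer update. Waiting for readers before freeing the old table is left to the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the direct_functions hash. */
struct sketch_table {
	size_t count;
	unsigned long ips[];	/* flexible array of entries */
};

/* The table that readers see; NULL until the first publish. */
static _Atomic(struct sketch_table *) sketch_live;

/*
 * Build a complete new table (old entries copied, not moved, plus the
 * added ones) and only then publish it with a single pointer store.
 * On success the previous table is handed back through old_out so the
 * caller can free it once no reader can still be using it.
 * Returns 0 on success, -1 if the allocation failed (nothing changes).
 */
static int sketch_add_entries(const unsigned long *add, size_t nadd,
			      struct sketch_table **old_out)
{
	struct sketch_table *old = atomic_load(&sketch_live);
	size_t old_count = old ? old->count : 0;
	struct sketch_table *new_tbl;

	new_tbl = malloc(sizeof(*new_tbl) +
			 (old_count + nadd) * sizeof(new_tbl->ips[0]));
	if (!new_tbl)
		return -1;

	if (old_count)
		memcpy(new_tbl->ips, old->ips,
		       old_count * sizeof(old->ips[0]));
	if (nadd)
		memcpy(new_tbl->ips + old_count, add, nadd * sizeof(add[0]));
	new_tbl->count = old_count + nadd;

	/* Publish last: readers see the old table or the full new one. */
	atomic_store_explicit(&sketch_live, new_tbl, memory_order_release);
	*old_out = old;
	return 0;
}
```

Since the old table is never modified, a reader that loaded the pointer before the store keeps a consistent view, and there is exactly one allocation per update, so nothing is leaked on the way.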
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-29 11:51:34 -05:00
if ( add_hash_entry ( hash , rec - > ip ) = = NULL )
ret = - ENOMEM ;
2011-04-29 15:12:32 -04:00
}
return ret ;
2011-04-26 16:11:03 -04:00
}
ftrace: Allow enabling of filters via index of available_filter_functions
Enabling a large number of functions by echoing a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which number in the tens of thousands), and doing a kallsyms lookup to
convert the ip address into a name, then comparing that name with the
string passed in.
When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically an
O(n^2) operation. With 40,000 functions, that ends up being 1,600,000,000
operations! And enabling this can take over 20 minutes.
As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:
# echo 50 > set_ftrace_filter
# cat set_ftrace_filter
x86_pmu_commit_txn
# head -50 available_filter_functions | tail -1
x86_pmu_commit_txn
This allows setting of half the available_filter_functions to take place in
less than a second!
# time seq 20000 > set_ftrace_filter
real 0m0.042s
user 0m0.005s
sys 0m0.015s
# wc -l set_ftrace_filter
20000 set_ftrace_filter
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-11 15:00:48 -05:00
static int
add_rec_by_index(struct ftrace_hash *hash, struct ftrace_glob *func_g,
		 int clear_filter)
{
	long index = simple_strtoul(func_g->search, NULL, 0);
	struct ftrace_page *pg;
	struct dyn_ftrace *rec;

	/* The index starts at 1 */
	if (--index < 0)
		return 0;

	do_for_each_ftrace_rec(pg, rec) {
		if (pg->index <= index) {
			index -= pg->index;
			/* this is a double loop, break goes to the next page */
			break;
		}
		rec = &pg->records[index];
		enter_record(hash, rec, clear_filter);
		return 1;
	} while_for_each_ftrace_rec();
	return 0;
}
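The page walk in add_rec_by_index() can be modelled without the iterator macros. The sketch below is a user-space approximation: struct sketch_page and sketch_rec_by_index() are hypothetical names, records are plain addresses, and a simple linked list of pages replaces do_for_each_ftrace_rec().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_page. */
struct sketch_page {
	struct sketch_page *next;
	const unsigned long *records;	/* record addresses on this page */
	long index;			/* number of records in use */
};

/*
 * Resolve a 1-based line number of available_filter_functions to a
 * record by skipping whole pages. Returns NULL when out of range.
 */
static const unsigned long *
sketch_rec_by_index(const struct sketch_page *pg, long line)
{
	long index = line - 1;		/* the index starts at 1 */

	if (index < 0)
		return NULL;

	for (; pg; pg = pg->next) {
		if (pg->index <= index) {
			/* not on this page: consume it and move on */
			index -= pg->index;
			continue;
		}
		return &pg->records[index];
	}
	return NULL;
}
```

The cost is proportional to the number of pages rather than the number of records, and no kallsyms lookup or string compare is needed, which is where the speedup described in the commit message comes from.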
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, its call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in available_filter_functions and to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions does
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function before it would be used
for it, which will either create duplicate names or, if the previous
function was not traced, cause it to be incorrectly listed in
available_filter_functions as a function that can be traced.
This became an issue with BPF[1] as there is tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The failing case was a function that was marked notrace
but was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.
The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.
In the meantime, this is a workaround. Add a FTRACE_MCOUNT_MAX_OFFSET
macro; if it is defined, ftrace will ignore any function whose call to
fentry/mcount is at an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.
If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).
A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:
__ftrace_invalid_address___<invalid-offset>
(showing the offset that caused it to be invalid).
This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of functions via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).
The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).
[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 14:19:12 -04:00
#ifdef FTRACE_MCOUNT_MAX_OFFSET
static int lookup_ip(unsigned long ip, char **modname, char *str)
{
	unsigned long offset;

	kallsyms_lookup(ip, NULL, &offset, modname, str);
	if (offset > FTRACE_MCOUNT_MAX_OFFSET)
		return -1;
	return 0;
}
#else
static int lookup_ip(unsigned long ip, char **modname, char *str)
{
	kallsyms_lookup(ip, NULL, NULL, modname, str);
	return 0;
}
#endif
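To see why the offset test filters out overridden weak functions, here is a small user-space model. Everything in it is an assumption made for illustration: struct sketch_sym, SKETCH_MAX_OFFSET and sketch_lookup_ip() are hypothetical, a linear scan over a tiny table replaces kallsyms_lookup(), and symbol sizes are deliberately unknown, as they are to kallsyms.

```c
#include <stddef.h>

/* Hypothetical symbol table entry: start address and name only. */
struct sketch_sym {
	unsigned long addr;
	const char *name;
};

/* Hypothetical limit, modelling FTRACE_MCOUNT_MAX_OFFSET being 0. */
#define SKETCH_MAX_OFFSET 0UL

/*
 * Find the closest symbol at or below ip, as a size-less symbol table
 * would, and reject the location if it is too far from the symbol start.
 * Returns 0 and sets *name on success, -1 otherwise.
 */
static int sketch_lookup_ip(const struct sketch_sym *syms, size_t nsyms,
			    unsigned long ip, const char **name)
{
	const struct sketch_sym *best = NULL;
	size_t i;

	for (i = 0; i < nsyms; i++) {
		if (syms[i].addr <= ip &&
		    (!best || syms[i].addr > best->addr))
			best = &syms[i];
	}
	if (!best)
		return -1;

	/* A call far past the symbol start belongs to some other code. */
	if (ip - best->addr > SKETCH_MAX_OFFSET)
		return -1;

	*name = best->name;
	return 0;
}
```

A location that sits in the dead body of an overridden weak function resolves to the preceding symbol with a large offset, so it is rejected instead of being reported under the wrong name.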
2009-02-13 17:08:48 -05:00
static int
2015-09-29 19:46:15 +03:00
ftrace_match_record ( struct dyn_ftrace * rec , struct ftrace_glob * func_g ,
struct ftrace_glob * mod_g , int exclude_mod )
2009-02-13 17:08:48 -05:00
{
char str [ KSYM_SYMBOL_LEN ] ;
2011-04-28 20:32:08 -04:00
char * modname ;
2022-05-26 14:19:12 -04:00
	if (lookup_ip(rec->ip, &modname, str)) {
		/* This should only happen when a rec is disabled */
		WARN_ON_ONCE(system_state == SYSTEM_RUNNING &&
			     !(rec->flags & FTRACE_FL_DISABLED));
		return 0;
	}
2011-04-28 20:32:08 -04:00
2015-09-29 19:46:15 +03:00
if ( mod_g ) {
int mod_matches = ( modname ) ? ftrace_match ( modname , mod_g ) : 0 ;
/* blank module name to match all modules */
if ( ! mod_g - > len ) {
/* blank module globbing: modname xor exclude_mod */
2017-05-03 11:41:44 -04:00
if ( ! exclude_mod ! = ! modname )
2015-09-29 19:46:15 +03:00
goto func_match ;
return 0 ;
}
2017-05-03 11:41:44 -04:00
/*
* exclude_mod is set to trace everything but the given
* module . If it is set and the module matches , then
* return 0. If it is not set , and the module doesn ' t match
* also return 0. Otherwise , check the function to see if
* that matches .
*/
if ( ! mod_matches = = ! exclude_mod )
2011-04-28 20:32:08 -04:00
return 0 ;
2015-09-29 19:46:15 +03:00
func_match :
2011-04-28 20:32:08 -04:00
/* blank search means to match all funcs in the mod */
2015-09-29 19:46:14 +03:00
if ( ! func_g - > len )
2011-04-28 20:32:08 -04:00
return 1 ;
}
2009-02-13 17:08:48 -05:00
2015-09-29 19:46:14 +03:00
return ftrace_match ( str , func_g ) ;
2009-02-13 17:08:48 -05:00
}
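The module half of the decision made in the function above can be written as a small truth function. This is a sketch under stated assumptions: `module_filter_passes` and its boolean parameters are hypothetical names that stand for "the record belongs to a module", "the module glob matched", "the module glob was blank", and "the glob was negated".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a record passes the module part of the filter.
 * A blank glob passes when exactly one of exclude_mod / has_modname holds
 * (the "modname xor exclude_mod" rule); a non-blank glob passes when the
 * match result differs from exclude_mod.
 */
static bool module_filter_passes(bool has_modname, bool mod_matches,
				 bool blank_glob, bool exclude_mod)
{
	if (blank_glob)
		return exclude_mod != has_modname;
	return mod_matches != exclude_mod;
}
```

Only records that pass this step go on to the function-name comparison; a blank function glob then accepts every function in the selected module.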
2011-04-29 20:59:51 -04:00
static int
2015-09-29 19:46:14 +03:00
match_records ( struct ftrace_hash * hash , char * func , int len , char * mod )
2009-02-13 15:56:43 -05:00
{
struct ftrace_page * pg ;
struct dyn_ftrace * rec ;
2015-09-29 19:46:14 +03:00
struct ftrace_glob func_g = { . type = MATCH_FULL } ;
2015-09-29 19:46:15 +03:00
struct ftrace_glob mod_g = { . type = MATCH_FULL } ;
struct ftrace_glob * mod_match = ( mod ) ? & mod_g : NULL ;
int exclude_mod = 0 ;
2009-12-08 11:15:11 +08:00
int found = 0 ;
2011-04-29 15:12:32 -04:00
int ret ;
2017-07-12 10:35:57 +03:00
int clear_filter = 0 ;
2009-02-13 15:56:43 -05:00
2015-09-29 19:46:15 +03:00
if ( func ) {
2015-09-29 19:46:14 +03:00
func_g . type = filter_parse_regex ( func , len , & func_g . search ,
& clear_filter ) ;
func_g . len = strlen ( func_g . search ) ;
2011-04-28 20:32:08 -04:00
}
2009-02-13 15:56:43 -05:00
2015-09-29 19:46:15 +03:00
if ( mod ) {
mod_g . type = filter_parse_regex ( mod , strlen ( mod ) ,
& mod_g . search , & exclude_mod ) ;
mod_g . len = strlen ( mod_g . search ) ;
2011-04-28 20:32:08 -04:00
}
2009-02-13 15:56:43 -05:00
2009-02-14 01:15:39 -05:00
mutex_lock ( & ftrace_lock ) ;
2009-02-13 12:43:56 -05:00
2011-04-28 20:32:08 -04:00
if ( unlikely ( ftrace_disabled ) )
goto out_unlock ;
2009-02-13 15:56:43 -05:00
ftrace: Allow enabling of filters via index of available_filter_functions
Enabling a large number of functions by echoing in a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which is in the 10s of thousands), and doing a kallsyms lookup to convert
the ip address into a name, then comparing that name with the string passed
in.
When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically an
O(n^2) operation. With 40,000 functions, that ends up being 1,600,000,000
operations! And enabling this can take over 20 minutes.
As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:
# echo 50 > set_ftrace_filter
# cat set_ftrace_filter
x86_pmu_commit_txn
# head -50 available_filter_functions | tail -1
x86_pmu_commit_txn
This allows setting of half the available_filter_functions to take place in
less than a second!
# time seq 20000 > set_ftrace_filter
real 0m0.042s
user 0m0.005s
sys 0m0.015s
# wc -l set_ftrace_filter
20000 set_ftrace_filter
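The idea of the index shortcut can be sketched in user space like this. It is an illustration only: `parse_filter_index` and `select_by_index` are hypothetical helpers operating on a plain array of names, not the kernel's record pages, and treating an out-of-range number as "no match" is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * If `input` is a plain decimal number naming a valid line, return that
 * 1-based index; otherwise return 0 so a caller can fall back to the
 * (much slower) name search.
 */
static size_t parse_filter_index(const char *input, size_t nr_records)
{
	char *end;
	unsigned long idx;

	if (*input < '0' || *input > '9')
		return 0;
	errno = 0;
	idx = strtoul(input, &end, 10);
	if (errno || *end != '\0' || idx == 0 || idx > nr_records)
		return 0;
	return (size_t)idx;
}

/* Pick the entry on line `input` directly, with no string comparisons. */
static const char *select_by_index(const char *const *names, size_t nr,
				   const char *input)
{
	size_t idx = parse_filter_index(input, nr);

	return idx ? names[idx - 1] : NULL;
}
```

Each write then costs one index lookup instead of a name comparison against every record, which is where the drop from minutes to milliseconds comes from.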
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-11 15:00:48 -05:00
if ( func_g . type = = MATCH_INDEX ) {
found = add_rec_by_index ( hash , & func_g , clear_filter ) ;
goto out_unlock ;
}
2009-02-13 12:43:56 -05:00
do_for_each_ftrace_rec ( pg , rec ) {
2016-11-14 16:31:49 -05:00
if ( rec - > flags & FTRACE_FL_DISABLED )
continue ;
2015-09-29 19:46:15 +03:00
if ( ftrace_match_record ( rec , & func_g , mod_match , exclude_mod ) ) {
2015-09-29 19:46:13 +03:00
ret = enter_record ( hash , rec , clear_filter ) ;
2011-04-29 15:12:32 -04:00
if ( ret < 0 ) {
found = ret ;
goto out_unlock ;
}
2009-12-08 11:15:11 +08:00
found = 1 ;
2009-02-13 12:43:56 -05:00
}
2022-11-15 17:48:47 -03:00
cond_resched ( ) ;
2009-02-13 12:43:56 -05:00
} while_for_each_ftrace_rec ( ) ;
2011-04-28 20:32:08 -04:00
out_unlock :
2009-02-14 01:15:39 -05:00
mutex_unlock ( & ftrace_lock ) ;
2009-12-08 11:15:11 +08:00
return found ;
2008-05-12 21:20:43 +02:00
}
2009-02-13 17:08:48 -05:00
static int
2011-04-29 20:59:51 -04:00
ftrace_match_records ( struct ftrace_hash * hash , char * buff , int len )
2009-02-13 17:08:48 -05:00
{
2015-09-29 19:46:13 +03:00
return match_records ( hash , buff , len , NULL ) ;
2009-02-13 17:08:48 -05:00
}
2017-04-04 14:46:56 -04:00
static void ftrace_ops_update_code(struct ftrace_ops *ops,
				   struct ftrace_ops_hash *old_hash)
{
	struct ftrace_ops *op;
	if (!ftrace_enabled)
		return;
	if (ops->flags & FTRACE_OPS_FL_ENABLED) {
		ftrace_run_modify_code(ops, FTRACE_UPDATE_CALLS, old_hash);
		return;
	}
	/*
	 * If this is the shared global_ops filter, then we need to
	 * check if there is another ops that shares it, is enabled.
	 * If so, we still need to run the modify code.
	 */
	if (ops->func_hash != &global_ops.local_hash)
		return;
	do_for_each_ftrace_op(op, ftrace_ops_list) {
		if (op->func_hash == &global_ops.local_hash &&
		    op->flags & FTRACE_OPS_FL_ENABLED) {
			ftrace_run_modify_code(op, FTRACE_UPDATE_CALLS, old_hash);
			/* Only need to do this once */
			return;
		}
	} while_for_each_ftrace_op(op);
}
static int ftrace_hash_move_and_update_ops(struct ftrace_ops *ops,
					   struct ftrace_hash **orig_hash,
					   struct ftrace_hash *hash,
					   int enable)
{
	struct ftrace_ops_hash old_hash_ops;
	struct ftrace_hash *old_hash;
	int ret;
	old_hash = *orig_hash;
	old_hash_ops.filter_hash = ops->func_hash->filter_hash;
	old_hash_ops.notrace_hash = ops->func_hash->notrace_hash;
	ret = ftrace_hash_move(ops, enable, orig_hash, hash);
	if (!ret) {
		ftrace_ops_update_code(ops, &old_hash_ops);
		free_ftrace_hash_rcu(old_hash);
	}
	return ret;
}
2009-02-13 17:08:48 -05:00
2017-06-23 15:26:26 -04:00
static bool module_exists ( const char * module )
{
/* All modules have the symbol __this_module */
tracing: Eliminate const char[] auto variables
Automatic const char[] variables cause unnecessary code
generation. For example, the this_mod variable leads to
3f04: 48 b8 5f 5f 74 68 69 73 5f 6d movabs $0x6d5f736968745f5f,%rax # __this_m
3f0e: 4c 8d 44 24 02 lea 0x2(%rsp),%r8
3f13: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
3f18: 48 89 44 24 02 mov %rax,0x2(%rsp)
3f1d: 4c 89 e9 mov %r13,%rcx
3f20: b8 65 00 00 00 mov $0x65,%eax # e
3f25: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
3f28: R_X86_64_32S .rodata.str1.1+0x18d
3f2c: be 48 00 00 00 mov $0x48,%esi
3f31: c7 44 24 0a 6f 64 75 6c movl $0x6c75646f,0xa(%rsp) # odul
3f39: 66 89 44 24 0e mov %ax,0xe(%rsp)
i.e., the string gets built on the stack at runtime. Similar code can be
found for the other instances I'm replacing here. Putting the string
in .rodata reduces the combined .text+.rodata size and saves time and
stack space at runtime.
The simplest fix, and what I've done for the this_mod case, is to just
make the variable static.
However, for the "<faulted>" case where the same string is used twice,
that prevents the linker from merging those two literals, so instead use
a macro - that also keeps the two instances automatically in
sync (instead of only the compile-time strlen expression).
Finally, for the two runs of spaces, it turns out that the "build
these strings on the stack" is not the worst part of what gcc does -
it turns print_func_help_header_irq() into "if (tgid) { /*
print_event_info + five seq_printf calls */ } else { /* print
event_info + another five seq_printf */}". Taking inspiration from a
suggestion from Al Viro, use %.*s to make snprintf either stop after
the first two spaces or print the whole string. As a bonus, the
seq_printfs now fit on single lines (at least, they are not longer
than the existing ones in the function just above), making it easier
to see that the ascii art lines up.
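The `%.*s` technique mentioned above can be shown in isolation. This is a small user-space sketch: `print_padding`, the ten-space run, and the `#`/`|` delimiters are made up for the illustration and are not the strings used in the trace header code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Print either the first two characters of a run of spaces or the whole
 * run. The precision argument of "%.*s" picks the length at run time, so
 * one format string serves both layouts.
 */
static int print_padding(char *out, size_t size, int wide)
{
	static const char pad[] = "          "; /* hypothetical run of spaces */
	int prec = wide ? (int)(sizeof(pad) - 1) : 2;

	return snprintf(out, size, "#%.*s|", prec, pad);
}
```

Because the choice is carried by an integer argument, the caller needs a single print call per line instead of one call per branch.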
x86-64 defconfig + CONFIG_FUNCTION_TRACER:
$ scripts/stackdelta /tmp/stackusage.{0,1}
./kernel/trace/ftrace.c ftrace_mod_callback 152 136 -16
./kernel/trace/trace.c trace_default_header 56 32 -24
./kernel/trace/trace.c tracing_mark_raw_write 96 72 -24
./kernel/trace/trace.c tracing_mark_write 104 80 -24
bloat-o-meter
add/remove: 1/0 grow/shrink: 0/4 up/down: 14/-375 (-361)
Function old new delta
this_mod - 14 +14
ftrace_mod_callback 577 542 -35
tracing_mark_raw_write 444 374 -70
tracing_mark_write 616 540 -76
trace_default_header 600 406 -194
Link: http://lkml.kernel.org/r/20190320081757.6037-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-20 09:17:57 +01:00
static const char this_mod [ ] = " __this_module " ;
2018-03-30 10:53:08 +02:00
char modname [ MAX_PARAM_PREFIX_LEN + sizeof ( this_mod ) + 2 ] ;
2017-06-23 15:26:26 -04:00
unsigned long val ;
int n ;
2018-03-30 10:53:08 +02:00
n = snprintf ( modname , sizeof ( modname ) , " %s:%s " , module , this_mod ) ;
2017-06-23 15:26:26 -04:00
2018-03-30 10:53:08 +02:00
if ( n > sizeof ( modname ) - 1 )
2017-06-23 15:26:26 -04:00
return false ;
val = module_kallsyms_lookup_name ( modname ) ;
return val ! = 0 ;
}
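The buffer handling in the function above follows a common pattern: format into a fixed buffer, then reject the result if it was truncated. A user-space sketch, in which `build_lookup_name` and `NAME_MAX_SKETCH` are hypothetical names and the symbol lookup itself is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_SKETCH 16 /* hypothetical stand-in for MAX_PARAM_PREFIX_LEN */

/* Build "<module>:__this_module"; return false if it would not fit. */
static bool build_lookup_name(char *out, size_t size, const char *module)
{
	/* static: the string lives in read-only data, not on the stack. */
	static const char this_mod[] = "__this_module";
	int n = snprintf(out, size, "%s:%s", module, this_mod);

	/* snprintf returns the length it wanted to write, excluding the NUL. */
	return n >= 0 && (size_t)n < size;
}
```

A truncated name could match a different, shorter symbol, so the sketch treats truncation as "not found" rather than looking up the shortened string.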
static int cache_mod ( struct trace_array * tr ,
const char * func , char * module , int enable )
{
struct ftrace_mod_load * ftrace_mod , * n ;
struct list_head * head = enable ? & tr - > mod_trace : & tr - > mod_notrace ;
int ret ;
mutex_lock ( & ftrace_lock ) ;
/* We do not cache inverse filters */
if ( func [ 0 ] = = ' ! ' ) {
func + + ;
ret = - EINVAL ;
/* Look to remove this hash */
list_for_each_entry_safe ( ftrace_mod , n , head , list ) {
if ( strcmp ( ftrace_mod - > module , module ) ! = 0 )
continue ;
/* no func matches all */
2017-07-12 10:33:40 +03:00
if ( strcmp ( func , " * " ) = = 0 | |
2017-06-23 15:26:26 -04:00
( ftrace_mod - > func & &
strcmp ( ftrace_mod - > func , func ) = = 0 ) ) {
ret = 0 ;
free_ftrace_mod ( ftrace_mod ) ;
continue ;
}
}
goto out ;
}
ret = - EINVAL ;
/* We only care about modules that have not been loaded yet */
if ( module_exists ( module ) )
goto out ;
/* Save this string off, and execute it when the module is loaded */
ret = ftrace_add_mod ( tr , func , module , enable ) ;
out :
mutex_unlock ( & ftrace_lock ) ;
return ret ;
}
2017-06-26 10:57:21 -04:00
static int
ftrace_set_regex ( struct ftrace_ops * ops , unsigned char * buf , int len ,
int reset , int enable ) ;
2017-07-10 10:44:03 +02:00
# ifdef CONFIG_MODULES
2017-06-26 10:57:21 -04:00
static void process_mod_list ( struct list_head * head , struct ftrace_ops * ops ,
char * mod , bool enable )
{
struct ftrace_mod_load * ftrace_mod , * n ;
struct ftrace_hash * * orig_hash , * new_hash ;
LIST_HEAD ( process_mods ) ;
char * func ;
mutex_lock ( & ops - > func_hash - > regex_lock ) ;
if ( enable )
orig_hash = & ops - > func_hash - > filter_hash ;
else
orig_hash = & ops - > func_hash - > notrace_hash ;
new_hash = alloc_and_copy_ftrace_hash ( FTRACE_HASH_DEFAULT_BITS ,
* orig_hash ) ;
if ( ! new_hash )
2017-06-28 09:09:38 -04:00
goto out ; /* warn? */
2017-06-26 10:57:21 -04:00
mutex_lock ( & ftrace_lock ) ;
list_for_each_entry_safe ( ftrace_mod , n , head , list ) {
if ( strcmp ( ftrace_mod - > module , mod ) ! = 0 )
continue ;
if ( ftrace_mod - > func )
func = kstrdup ( ftrace_mod - > func , GFP_KERNEL ) ;
else
func = kstrdup ( " * " , GFP_KERNEL ) ;
if ( ! func ) /* warn? */
continue ;
2021-06-08 11:11:08 +08:00
list_move ( & ftrace_mod - > list , & process_mods ) ;
2017-06-26 10:57:21 -04:00
/* Use the newly allocated func, as it may be "*" */
kfree ( ftrace_mod - > func ) ;
ftrace_mod - > func = func ;
}
mutex_unlock ( & ftrace_lock ) ;
list_for_each_entry_safe ( ftrace_mod , n , & process_mods , list ) {
func = ftrace_mod - > func ;
/* Grabs ftrace_lock, which is why we have this extra step */
match_records ( new_hash , func , strlen ( func ) , mod ) ;
free_ftrace_mod ( ftrace_mod ) ;
}
2017-06-26 11:47:31 -04:00
if ( enable & & list_empty ( head ) )
new_hash - > flags & = ~ FTRACE_HASH_FL_MOD ;
2017-06-26 10:57:21 -04:00
mutex_lock ( & ftrace_lock ) ;
2020-11-06 22:54:46 +08:00
ftrace_hash_move_and_update_ops ( ops , orig_hash ,
2017-06-26 10:57:21 -04:00
new_hash , enable ) ;
mutex_unlock ( & ftrace_lock ) ;
2017-06-28 09:09:38 -04:00
out :
2017-06-26 10:57:21 -04:00
mutex_unlock ( & ops - > func_hash - > regex_lock ) ;
free_ftrace_hash ( new_hash ) ;
}
static void process_cached_mods(const char *mod_name)
{
	struct trace_array *tr;
	char *mod;
	mod = kstrdup(mod_name, GFP_KERNEL);
	if (!mod)
		return;
	mutex_lock(&trace_types_lock);
	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
		if (!list_empty(&tr->mod_trace))
			process_mod_list(&tr->mod_trace, tr->ops, mod, true);
		if (!list_empty(&tr->mod_notrace))
			process_mod_list(&tr->mod_notrace, tr->ops, mod, false);
	}
	mutex_unlock(&trace_types_lock);
	kfree(mod);
}
2017-07-10 10:44:03 +02:00
# endif
2017-06-26 10:57:21 -04:00
2009-02-14 00:40:25 -05:00
/*
* We register the module command as a template to show others how
* to register a command as well .
*/
static int
2017-04-05 13:12:55 -04:00
ftrace_mod_callback ( struct trace_array * tr , struct ftrace_hash * hash ,
2017-06-23 15:26:26 -04:00
char * func_orig , char * cmd , char * module , int enable )
2009-02-14 00:40:25 -05:00
{
2017-06-23 15:26:26 -04:00
char * func ;
2015-09-29 19:46:12 +03:00
int ret ;
2009-02-14 00:40:25 -05:00
2017-06-23 15:26:26 -04:00
/* match_records() modifies func, and we need the original */
func = kstrdup ( func_orig , GFP_KERNEL ) ;
if ( ! func )
return - ENOMEM ;
2009-02-14 00:40:25 -05:00
/*
* cmd = = ' mod ' because we only registered this func
* for the ' mod ' ftrace_func_command .
* But if you register one func with multiple commands ,
* you can tell which command was used by the cmd
* parameter .
*/
2015-09-29 19:46:13 +03:00
ret = match_records ( hash , func , strlen ( func ) , module ) ;
2017-06-23 15:26:26 -04:00
kfree ( func ) ;
2011-04-29 15:12:32 -04:00
if ( ! ret )
2017-06-23 15:26:26 -04:00
return cache_mod ( tr , func_orig , module , enable ) ;
2011-04-29 15:12:32 -04:00
if ( ret < 0 )
return ret ;
return 0 ;
2009-02-14 00:40:25 -05:00
}
static struct ftrace_func_command ftrace_mod_cmd = {
	.name = "mod",
	.func = ftrace_mod_callback,
};
static int __init ftrace_mod_cmd_init(void)
{
	return register_ftrace_command(&ftrace_mod_cmd);
}
2012-10-05 12:13:07 -04:00
core_initcall ( ftrace_mod_cmd_init ) ;
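The registration pattern used for the 'mod' command (a named entry with a callback, added to a list and looked up by name) can be sketched in user space. All names ending in `_sketch` are hypothetical, the callback signature is simplified, and returning -1 for a duplicate or unknown command is an assumption of the sketch rather than the kernel's error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cmd_sketch {
	const char *name;
	int (*func)(const char *func_glob, const char *cmd, const char *param);
	struct cmd_sketch *next;
};

static struct cmd_sketch *cmd_list_sketch;

/* Add a command; refuse a second command with the same name. */
static int register_cmd_sketch(struct cmd_sketch *c)
{
	struct cmd_sketch *p;

	for (p = cmd_list_sketch; p; p = p->next)
		if (strcmp(p->name, c->name) == 0)
			return -1;
	c->next = cmd_list_sketch;
	cmd_list_sketch = c;
	return 0;
}

/* Find the command by name and hand it the glob and parameter. */
static int run_cmd_sketch(const char *func_glob, const char *cmd,
			  const char *param)
{
	struct cmd_sketch *p;

	for (p = cmd_list_sketch; p; p = p->next)
		if (strcmp(p->name, cmd) == 0)
			return p->func(func_glob, cmd, param);
	return -1;
}

/* Example callback: reports the length of the module name it was given. */
static int mod_callback_sketch(const char *func_glob, const char *cmd,
			       const char *param)
{
	(void)func_glob;
	(void)cmd; /* one callback could serve several commands via cmd */
	return (int)strlen(param);
}
```

Passing the command name to the callback mirrors the comment in the code above: a single function registered under several names can tell which one was used.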
2009-02-14 00:40:25 -05:00
2011-08-08 16:57:47 -04:00
static void function_trace_probe_call ( unsigned long ip , unsigned long parent_ip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * op , struct ftrace_regs * fregs )
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
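The dispatch scheme described above (one function hooked to everything, with a hash table deciding which hook runs for which address) can be sketched as follows. This is an illustration with hypothetical names (`hook_entry`, `hook_add`, `hook_dispatch`) and a toy hash, not the structure the patch introduces.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn)(unsigned long ip, unsigned long parent_ip, void **data);

struct hook_entry {
	unsigned long ip;
	hook_fn func;
	void *data;
	struct hook_entry *next;
};

#define HOOK_BITS 4
#define HOOK_SIZE (1u << HOOK_BITS)

static struct hook_entry *hook_table[HOOK_SIZE];

/* Toy hash: drop the low alignment bits, keep HOOK_BITS bits. */
static unsigned int hook_hash(unsigned long ip)
{
	return (unsigned int)((ip >> 4) & (HOOK_SIZE - 1));
}

static void hook_add(struct hook_entry *e)
{
	unsigned int b = hook_hash(e->ip);

	e->next = hook_table[b];
	hook_table[b] = e;
}

/*
 * Called for every traced function; runs only the hooks registered for
 * this exact address and returns how many ran.
 */
static int hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct hook_entry *e;
	int ran = 0;

	for (e = hook_table[hook_hash(ip)]; e; e = e->next) {
		if (e->ip == ip) {
			e->func(ip, parent_ip, &e->data);
			ran++;
		}
	}
	return ran;
}

/* Example hook: counts hits in the int that its data pointer refers to. */
static void count_hook(unsigned long ip, unsigned long parent_ip, void **data)
{
	(void)ip;
	(void)parent_ip;
	++*(int *)*data;
}
```

The hook receives the address of its per-entry data, so it can replace the pointer as well as read it, which is the same shape as the `void **data` parameters in the structure shown in this message.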
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that has an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-14 15:29:06 -05:00
{
2017-04-04 21:31:28 -04:00
struct ftrace_probe_ops * probe_ops ;
2017-04-18 14:50:39 -04:00
struct ftrace_func_probe * probe ;
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
probe = container_of ( op , struct ftrace_func_probe , ops ) ;
probe_ops = probe - > probe_ops ;
2009-02-14 15:29:06 -05:00
/*
* Disable preemption for these calls to prevent a RCU grace
* period . This syncs the hash iteration and freeing of items
* on the hash . rcu_read_lock is too dangerous here .
*/
2010-06-03 09:36:50 -04:00
preempt_disable_notrace ( ) ;
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although all probes
still only affect the toplevel trace array, this change will allow
instances to have their own probes, separate from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change lays the groundwork for it.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
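The place holder life cycle described in this message can be sketched in user space. The callback signatures here are deliberately simplified and every name (`probe_ops_sketch`, `count_init`, `count_func`, `count_free`) is hypothetical; the sketch only shows the NULL-on-first-init and free-on-zero conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct probe_ops_sketch {
	int (*init)(unsigned long ip, void *init_data, void **data);
	void (*func)(unsigned long ip, void **data);
	void (*free)(unsigned long ip, void **data);
};

/* init: allocate the shared counter the first time (place holder is NULL). */
static int count_init(unsigned long ip, void *init_data, void **data)
{
	(void)ip;
	if (!*data) {
		long *counter = malloc(sizeof(*counter));

		if (!counter)
			return -1;
		*counter = init_data ? *(long *)init_data : 0;
		*data = counter;
	}
	/* Later calls see the pointer set above and leave it alone. */
	return 0;
}

/* func: the per-hit callback uses the place holder set up by init. */
static void count_func(unsigned long ip, void **data)
{
	(void)ip;
	++*(long *)*data;
}

/* free: ip == 0 means no functions remain, so release the place holder. */
static void count_free(unsigned long ip, void **data)
{
	if (ip == 0) {
		free(*data);
		*data = NULL;
	}
}

static const struct probe_ops_sketch count_ops = {
	.init = count_init,
	.func = count_func,
	.free = count_free,
};
```

Removing a single function calls free with that function's address and leaves the shared data in place; only the final call with zero releases it.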
2017-04-19 22:39:44 -04:00
probe_ops - > func ( ip , parent_ip , probe - > tr , probe_ops , probe - > data ) ;
2010-06-03 09:36:50 -04:00
preempt_enable_notrace ( ) ;
2009-02-14 15:29:06 -05:00
}
2017-04-03 20:58:35 -04:00
struct ftrace_func_map {
struct ftrace_func_entry entry ;
void * data ;
2009-02-14 15:29:06 -05:00
} ;
2017-04-03 20:58:35 -04:00
struct ftrace_func_mapper {
struct ftrace_hash hash ;
} ;
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
whose data matches data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
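The hash-based dispatch described in the message above can be sketched in user space roughly as follows. This is a minimal illustration under stated assumptions, not the kernel implementation: struct hook_ops, hook_register(), hook_dispatch(), hook_unregister_all() and hook_demo() are hypothetical names, the "free" member is called release here to keep it distinct from the C library's free(), glob matching is left out (hooks are registered by address directly), and treating a negative callback return as "reject the entry" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct ftrace_hook_ops. */
struct hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*release)(void **data);	/* plays the role of "free" */
};

struct hook_entry {
	struct hook_entry *next;
	unsigned long ip;
	struct hook_ops *ops;
	void *data;
};

#define HOOK_HASH_SIZE 16
static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static unsigned int hook_bucket(unsigned long ip)
{
	return (unsigned int)(ip >> 4) % HOOK_HASH_SIZE;
}

/* Add one entry; ops->callback may replace the default data. */
int hook_register(unsigned long ip, struct hook_ops *ops, void *data)
{
	struct hook_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	/* Assumption: a negative return from callback rejects the entry. */
	if (ops->callback && ops->callback(ip, &e->data) < 0) {
		free(e);
		return -1;
	}
	e->next = hook_hash[hook_bucket(ip)];
	hook_hash[hook_bucket(ip)] = e;
	return 0;
}

/* Called for every traced function; runs only the hooks registered for ip. */
void hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct hook_entry *e;

	for (e = hook_hash[hook_bucket(ip)]; e; e = e->next)
		if (e->ip == ip)
			e->ops->func(ip, parent_ip, &e->data);
}

/* Drop every entry, giving ops->release a chance to free per-entry data. */
void hook_unregister_all(void)
{
	unsigned int i;

	for (i = 0; i < HOOK_HASH_SIZE; i++) {
		while (hook_hash[i]) {
			struct hook_entry *e = hook_hash[i];

			hook_hash[i] = e->next;
			if (e->ops->release)
				e->ops->release(&e->data);
			free(e);
		}
	}
}

static unsigned long hits;

static void count_hit(unsigned long ip, unsigned long parent_ip, void **data)
{
	(void)ip;
	(void)parent_ip;
	(void)data;
	hits++;
}

/* Demo: only the hooked address triggers count_hit. Returns the hit count. */
unsigned long hook_demo(void)
{
	static struct hook_ops ops = { .func = count_hit };
	int ret;

	hits = 0;
	ret = hook_register(0x1000, &ops, NULL);
	assert(ret == 0);
	(void)ret;
	hook_dispatch(0x1000, 0);
	hook_dispatch(0x2000, 0);	/* not hooked, ignored */
	hook_dispatch(0x1000, 0);
	hook_unregister_all();
	return hits;
}
```

The two demo addresses deliberately land in the same bucket, so the sketch also shows that the per-entry ip comparison, not the bucket alone, decides which hook runs.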
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
/**
* allocate_ftrace_func_mapper - allocate a new ftrace_func_mapper
*
2024-02-22 21:48:33 -08:00
* Returns : a ftrace_func_mapper descriptor that can be used to map ips to data .
2017-04-03 20:58:35 -04:00
*/
struct ftrace_func_mapper * allocate_ftrace_func_mapper ( void )
2009-02-14 15:29:06 -05:00
{
2017-04-03 20:58:35 -04:00
struct ftrace_hash * hash ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
/*
* The mapper is simply a ftrace_hash , but since the entries
* in the hash are not ftrace_func_entry type , we define it
* as a separate structure .
*/
hash = alloc_ftrace_hash ( FTRACE_HASH_DEFAULT_BITS ) ;
return ( struct ftrace_func_mapper * ) hash ;
}
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
/**
* ftrace_func_mapper_find_ip - Find some data mapped to an ip
* @ mapper : The mapper that has the ip maps
* @ ip : the instruction pointer to find the data for
*
2024-02-22 21:48:33 -08:00
* Returns : the data mapped to @ ip if found otherwise NULL . The return
2017-04-03 20:58:35 -04:00
* is actually the address of the mapper data pointer . The address is
* returned for use cases where the data is no bigger than a long , and
* the user can use the data pointer as its data instead of having to
* allocate more memory for the reference .
*/
void * * ftrace_func_mapper_find_ip ( struct ftrace_func_mapper * mapper ,
unsigned long ip )
{
struct ftrace_func_entry * entry ;
struct ftrace_func_map * map ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
entry = ftrace_lookup_ip ( & mapper -> hash , ip ) ;
if ( ! entry )
return NULL ;
2011-05-04 09:27:52 -04:00
2017-04-03 20:58:35 -04:00
map = ( struct ftrace_func_map * ) entry ;
return & map -> data ;
2009-02-14 15:29:06 -05:00
}
2017-04-03 20:58:35 -04:00
/**
* ftrace_func_mapper_add_ip - Map some data to an ip
* @ mapper : The mapper that has the ip maps
* @ ip : The instruction pointer address to map @ data to
* @ data : The data to map to @ ip
*
2024-02-22 21:48:33 -08:00
* Returns : 0 on success otherwise an error .
2017-04-03 20:58:35 -04:00
*/
int ftrace_func_mapper_add_ip ( struct ftrace_func_mapper * mapper ,
unsigned long ip , void * data )
2009-02-14 15:29:06 -05:00
{
2017-04-03 20:58:35 -04:00
struct ftrace_func_entry * entry ;
struct ftrace_func_map * map ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
entry = ftrace_lookup_ip ( & mapper -> hash , ip ) ;
if ( entry )
return - EBUSY ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
map = kmalloc ( sizeof ( * map ) , GFP_KERNEL ) ;
if ( ! map )
return - ENOMEM ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
map -> entry . ip = ip ;
map -> data = data ;
2011-05-04 09:27:52 -04:00
2017-04-03 20:58:35 -04:00
__add_hash_entry ( & mapper -> hash , & map -> entry ) ;
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
return 0 ;
}
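The mapper semantics documented above (add fails with -EBUSY when the ip is already mapped, find returns the address of the data slot so the slot itself can serve as storage, remove hands the data back) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: struct func_mapper, struct map_entry, mapper_alloc(), mapper_find_ip(), mapper_add_ip(), mapper_remove_ip() and mapper_demo() are hypothetical stand-ins built on a plain chained hash table, not the kernel's ftrace_hash code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SIZE 16

struct map_entry {
	struct map_entry *next;
	unsigned long ip;
	void *data;
};

struct func_mapper {
	struct map_entry *buckets[MAP_SIZE];
};

static unsigned int map_bucket(unsigned long ip)
{
	return (unsigned int)(ip >> 2) % MAP_SIZE;
}

struct func_mapper *mapper_alloc(void)
{
	return calloc(1, sizeof(struct func_mapper));
}

/* Returns the address of the data slot for ip, or NULL if ip is not mapped. */
void **mapper_find_ip(struct func_mapper *m, unsigned long ip)
{
	struct map_entry *e;

	for (e = m->buckets[map_bucket(ip)]; e; e = e->next)
		if (e->ip == ip)
			return &e->data;
	return NULL;
}

/* Returns 0 on success, -EBUSY if ip is already mapped, -ENOMEM on failure. */
int mapper_add_ip(struct func_mapper *m, unsigned long ip, void *data)
{
	struct map_entry *e;

	if (mapper_find_ip(m, ip))
		return -EBUSY;
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->ip = ip;
	e->data = data;
	e->next = m->buckets[map_bucket(ip)];
	m->buckets[map_bucket(ip)] = e;
	return 0;
}

/* Unlinks the entry for ip and returns its data, or NULL if not found. */
void *mapper_remove_ip(struct func_mapper *m, unsigned long ip)
{
	struct map_entry **pp;

	for (pp = &m->buckets[map_bucket(ip)]; *pp; pp = &(*pp)->next) {
		if ((*pp)->ip == ip) {
			struct map_entry *e = *pp;
			void *data = e->data;

			*pp = e->next;
			free(e);
			return data;
		}
	}
	return NULL;
}

/* Demo: use the data slot itself as a counter, with no extra allocation. */
unsigned long mapper_demo(void)
{
	struct func_mapper *m = mapper_alloc();
	unsigned long count;
	void **slot;
	int ret;

	assert(m);
	ret = mapper_add_ip(m, 0x1000, NULL);
	assert(ret == 0);
	ret = mapper_add_ip(m, 0x1000, NULL);	/* duplicate ip */
	assert(ret == -EBUSY);
	(void)ret;
	slot = mapper_find_ip(m, 0x1000);
	assert(slot);
	*slot = (void *)((uintptr_t)*slot + 1);	/* bump the counter in place */
	count = (unsigned long)(uintptr_t)mapper_remove_ip(m, 0x1000);
	free(m);
	return count;
}
```

Because the slot doubles as the value, a counter that is still zero makes the removal return indistinguishable from "not found", which is the caveat the remove documentation raises.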
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
It is for a tracer that needs to do something special for each
function being traced, or that wants to give each function its
own data. The address of the entry data is passed to this
callback, so that the callback can update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
whose data matches data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
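The per-function dispatch described in this commit message can be sketched in user space as follows. This is only an illustration under stated assumptions: the `struct ftrace_hook_ops` layout is copied from the message, but every `sketch_*` name is hypothetical, registration is keyed by address instead of a glob, a small array stands in for the hash table, and the rule that a nonzero return from `callback` rejects the entry is an assumption, not something the message states.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the ops structure described in the commit message. */
struct ftrace_hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*free)(void **data);
};

/* Hypothetical per-function entry; the kernel keeps these in a hash table. */
struct sketch_entry {
	unsigned long ip;
	struct ftrace_hook_ops *ops;
	void *data;
	int used;
};

#define SKETCH_SLOTS 8
static struct sketch_entry sketch_table[SKETCH_SLOTS];

/* Hypothetical registration keyed by address; the real call takes a glob. */
static int sketch_register(unsigned long ip, struct ftrace_hook_ops *ops,
			   void *data)
{
	size_t i;

	assert(ops && ops->func);
	for (i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_entry *e = &sketch_table[i];

		if (e->used)
			continue;
		e->ip = ip;
		e->ops = ops;
		e->data = data;	/* default data, may be replaced below */
		/* The setup callback gets the address of the entry data.
		 * Assumption: a nonzero return rejects the entry. */
		if (ops->callback && ops->callback(ip, &e->data) != 0)
			return -1;
		e->used = 1;
		return 0;
	}
	return -1;	/* table full */
}

/* Run only the hooks attached to this ip; returns how many ran. */
static int sketch_dispatch(unsigned long ip, unsigned long parent_ip)
{
	int ran = 0;
	size_t i;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_entry *e = &sketch_table[i];

		if (e->used && e->ip == ip) {
			e->ops->func(ip, parent_ip, &e->data);
			ran++;
		}
	}
	return ran;
}

/* Drop every entry, giving each tracer the chance to free its data. */
static void sketch_unregister_all(void)
{
	size_t i;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_entry *e = &sketch_table[i];

		if (!e->used)
			continue;
		if (e->ops->free)
			e->ops->free(&e->data);
		e->used = 0;
	}
}

/* Example hook: treats the entry data as a pointer to a hit counter. */
static void sketch_count_hook(unsigned long ip, unsigned long parent_ip,
			      void **data)
{
	(void)ip;
	(void)parent_ip;
	(*(unsigned long *)*data)++;
}
```

A caller would register `sketch_count_hook` for one address with a pointer to a counter as data. `sketch_dispatch()` then bumps the counter only for that address, which is the "different functions call different tracer hooks" behaviour the message describes.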
2009-02-14 15:29:06 -05:00
2017-04-03 20:58:35 -04:00
/**
* ftrace_func_mapper_remove_ip - Remove an ip from the mapping
* @ mapper : The mapper that has the ip maps
* @ ip : The instruction pointer address to remove the data from
*
2024-02-22 21:48:33 -08:00
* Returns : the data if it is found , otherwise NULL .
2022-05-24 10:08:39 -07:00
* Note , if the data pointer is used as the data itself , ( see
2017-04-03 20:58:35 -04:00
* ftrace_func_mapper_find_ip ( ) ) , then the return value may be meaningless ,
* if the data pointer was set to zero .
*/
void * ftrace_func_mapper_remove_ip ( struct ftrace_func_mapper * mapper ,
unsigned long ip )
2009-02-14 15:29:06 -05:00
{
2017-04-03 20:58:35 -04:00
struct ftrace_func_entry * entry ;
struct ftrace_func_map * map ;
void * data ;
entry = ftrace_lookup_ip ( & mapper - > hash , ip ) ;
if ( ! entry )
return NULL ;
map = ( struct ftrace_func_map * ) entry ;
data = map - > data ;
remove_hash_entry ( & mapper - > hash , entry ) ;
2009-02-14 15:29:06 -05:00
kfree ( entry ) ;
2017-04-03 20:58:35 -04:00
return data ;
}
/**
* free_ftrace_func_mapper - free a mapping of ips and data
* @ mapper : The mapper that has the ip maps
* @ free_func : A function to be called on each data item .
*
* This is used to free the function mapper . The @ free_func is optional
* and can be used if the data needs to be freed as well .
*/
void free_ftrace_func_mapper ( struct ftrace_func_mapper * mapper ,
ftrace_mapper_func free_func )
{
struct ftrace_func_entry * entry ;
struct ftrace_func_map * map ;
struct hlist_head * hhd ;
ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.
This issue can be reproduced as follows (it may sometimes be masked by
compiler optimization):
/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 206.952402] Mem abort info:
[ 206.952819] ESR = 0x96000006
[ 206.955326] Exception class = DABT (current EL), IL = 32 bits
[ 206.955844] SET = 0, FnV = 0
[ 206.956272] EA = 0, S1PTW = 0
[ 206.956652] Data abort info:
[ 206.957320] ISV = 0, ISS = 0x00000006
[ 206.959271] CM = 0, WnR = 0
[ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[ 206.964953] Internal error: Oops: 96000006 [#1] SMP
[ 206.971122] Dumping ftrace buffer:
[ 206.973677] (ftrace buffer empty)
[ 206.975258] Modules linked in:
[ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[ 206.978955] Hardware name: linux,dummy-virt (DT)
[ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[ 206.980874] lr : ftrace_count_free+0x68/0x80
[ 206.982539] sp : ffff0000182f3ab0
[ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[ 206.983632] x27: ffff000013054b40 x26: 0000000000000001
[ 206.984000] x25: ffff00001385f000 x24: 0000000000000000
[ 206.984394] x23: ffff000013453000 x22: ffff000013054000
[ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[ 206.986575] x19: ffff000013872c30 x18: 0000000000000000
[ 206.987111] x17: 0000000000000000 x16: 0000000000000000
[ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[ 206.987850] x13: 000000000017430e x12: 0000000000000580
[ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[ 206.992557] Call trace:
[ 206.993101] free_ftrace_func_mapper+0x2c/0x118
[ 206.994827] ftrace_count_free+0x68/0x80
[ 206.995238] release_probe+0xfc/0x1d0
[ 206.995555] register_ftrace_function_probe+0x4a8/0x868
[ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180
[ 206.996330] ftrace_dump_callback+0x50/0x70
[ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8
[ 206.997157] ftrace_filter_write+0x44/0x60
[ 206.998971] __vfs_write+0x64/0xf0
[ 206.999285] vfs_write+0x14c/0x2f0
[ 206.999591] ksys_write+0xbc/0x1b0
[ 206.999888] __arm64_sys_write+0x3c/0x58
[ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0
[ 207.000607] el0_svc_handler+0x144/0x1c8
[ 207.000916] el0_svc+0x8/0xc
[ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[ 207.010126] Kernel panic - not syncing: Fatal exception
[ 207.011322] SMP: stopping secondary CPUs
[ 207.013956] Dumping ftrace buffer:
[ 207.014595] (ftrace buffer empty)
[ 207.015632] Kernel Offset: disabled
[ 207.017187] CPU features: 0x002,20006008
[ 207.017985] Memory Limit: none
[ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-06 11:17:54 +08:00
int size , i ;
if ( ! mapper )
return ;
2017-04-03 20:58:35 -04:00
if ( free_func & & mapper - > hash . count ) {
2019-06-06 11:17:54 +08:00
size = 1 < < mapper - > hash . size_bits ;
2017-04-03 20:58:35 -04:00
for ( i = 0 ; i < size ; i + + ) {
hhd = & mapper - > hash . buckets [ i ] ;
hlist_for_each_entry ( entry , hhd , hlist ) {
map = ( struct ftrace_func_map * ) entry ;
free_func ( map ) ;
}
}
}
free_ftrace_hash ( & mapper - > hash ) ;
}
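The mapper lifecycle above (remove one ip and hand back its data, then free the whole mapping while tolerating a NULL mapper) can be sketched in user space as follows. This is a hedged illustration rather than the kernel implementation: all `sketch_*` names are hypothetical, a singly linked list replaces the ftrace hash, and `malloc`/`free` replace the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ftrace_func_map / ftrace_func_mapper. */
struct sketch_map {
	unsigned long ip;
	void *data;
	struct sketch_map *next;
};

struct sketch_mapper {
	struct sketch_map *head;
	int count;
};

typedef void (*sketch_free_fn)(struct sketch_map *map);

static struct sketch_mapper *sketch_alloc_mapper(void)
{
	return calloc(1, sizeof(struct sketch_mapper));
}

static int sketch_add_ip(struct sketch_mapper *mapper, unsigned long ip,
			 void *data)
{
	struct sketch_map *map;

	assert(mapper);
	map = malloc(sizeof(*map));
	if (!map)
		return -1;
	map->ip = ip;
	map->data = data;
	map->next = mapper->head;
	mapper->head = map;
	mapper->count++;
	return 0;
}

/* Returns the data stored for ip, or NULL if ip is not mapped. */
static void *sketch_remove_ip(struct sketch_mapper *mapper, unsigned long ip)
{
	struct sketch_map **pp, *map;
	void *data;

	for (pp = &mapper->head; (map = *pp) != NULL; pp = &map->next) {
		if (map->ip != ip)
			continue;
		data = map->data;	/* save before the entry is freed */
		*pp = map->next;
		mapper->count--;
		free(map);
		return data;
	}
	return NULL;
}

/* free_func is optional; a NULL mapper is a no-op, as in the fix above. */
static void sketch_free_mapper(struct sketch_mapper *mapper,
			       sketch_free_fn free_func)
{
	struct sketch_map *map, *next;

	if (!mapper)
		return;
	for (map = mapper->head; map; map = next) {
		next = map->next;
		if (free_func)
			free_func(map);
		free(map);
	}
	free(mapper);
}

/* Example free_func that only counts how many items it was called on. */
static int sketch_freed;

static void sketch_count_free(struct sketch_map *map)
{
	(void)map;
	sketch_freed++;
}
```

As the kernel comment notes, a NULL return from the remove function is ambiguous when callers store a zero value as the data itself; the sketch has the same property.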
2017-04-18 14:50:39 -04:00
static void release_probe ( struct ftrace_func_probe * probe )
{
struct ftrace_probe_ops * probe_ops ;
mutex_lock ( & ftrace_lock ) ;
WARN_ON ( probe - > ref < = 0 ) ;
/* Subtract the ref that was used to protect this instance */
probe - > ref - - ;
if ( ! probe - > ref ) {
probe_ops = probe - > probe_ops ;
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change lays the groundwork for it.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the ip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-19 22:39:44 -04:00
/*
* Sending zero as ip tells probe_ops to free
* the probe - > data itself
*/
if ( probe_ops - > free )
probe_ops - > free ( probe_ops , probe - > tr , 0 , probe - > data ) ;
2017-04-18 14:50:39 -04:00
list_del ( & probe - > list ) ;
kfree ( probe ) ;
}
mutex_unlock ( & ftrace_lock ) ;
}
static void acquire_probe_locked ( struct ftrace_func_probe * probe )
{
/*
* Add one ref to keep it from being freed when releasing the
* ftrace_lock mutex .
*/
probe - > ref + + ;
2009-02-14 15:29:06 -05:00
}
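The reference counting used by release_probe() and acquire_probe_locked() can be sketched in user space as follows. This is a simplified illustration under stated assumptions: every `sketch_*` name is hypothetical, the ftrace_lock mutex, the trace_array argument and the list handling are left out, and `malloc`/`free` stand in for the kernel allocators. It only shows that the last release calls the free op with an ip of zero, so the ops can free the probe data itself.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_probe_ops {
	/* ip == 0 means "no functions left, free the data itself". */
	void (*free)(struct sketch_probe_ops *ops, unsigned long ip,
		     void *data);
};

struct sketch_probe {
	struct sketch_probe_ops *ops;
	void *data;
	int ref;
};

static struct sketch_probe *sketch_probe_alloc(struct sketch_probe_ops *ops,
					       void *data)
{
	struct sketch_probe *probe = calloc(1, sizeof(*probe));

	if (!probe)
		return NULL;
	probe->ops = ops;
	probe->data = data;
	return probe;	/* ref starts at zero, callers must acquire */
}

/* Take a reference so the probe survives until the matching release. */
static void sketch_acquire(struct sketch_probe *probe)
{
	probe->ref++;
}

/* Drop a reference; returns 1 if this was the last one and probe is gone. */
static int sketch_release(struct sketch_probe *probe)
{
	assert(probe->ref > 0);
	probe->ref--;
	if (probe->ref)
		return 0;
	if (probe->ops->free)
		probe->ops->free(probe->ops, 0, probe->data);
	free(probe);
	return 1;
}

/* Example free op: releases heap data only on the final (ip == 0) call. */
static int sketch_data_freed;

static void sketch_data_free(struct sketch_probe_ops *ops, unsigned long ip,
			     void *data)
{
	(void)ops;
	if (ip == 0) {
		free(data);
		sketch_data_freed++;
	}
}
```

In the kernel both operations run with ftrace_lock held, which is what makes the plain integer counter safe there; a standalone version used from several threads would need its own locking.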
int
2017-04-05 13:12:55 -04:00
register_ftrace_function_probe ( char * glob , struct trace_array * tr ,
2017-04-18 14:50:39 -04:00
struct ftrace_probe_ops * probe_ops ,
void * data )
2009-02-14 15:29:06 -05:00
{
2022-04-27 19:07:34 +02:00
struct ftrace_func_probe * probe = NULL , * iter ;
2017-04-04 18:16:29 -04:00
struct ftrace_func_entry * entry ;
struct ftrace_hash * * orig_hash ;
struct ftrace_hash * old_hash ;
ftrace: Fix function probe to only enable needed functions
Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.
Note, a probe is something like:
echo schedule:traceoff > /debug/tracing/set_ftrace_filter
When schedule is called, the probe will disable tracing. But currently,
it has a call back for *all* functions, and checks to see if the
called function is the probe that is needed.
The probe function was created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, it couldn't limit the functions without also
limiting normal function calls. But now we can, it's about time
to update the probe code.
Todo, have separate ops for different entries. That is, assign
an ftrace_ops per probe, instead of one op for all probes. But
as there are not many probes assigned, this may not be that urgent.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-12 10:09:42 -04:00
struct ftrace_hash * hash ;
2009-02-14 15:29:06 -05:00
int count = 0 ;
2017-04-04 18:16:29 -04:00
int size ;
2013-03-12 10:09:42 -04:00
int ret ;
2017-04-04 18:16:29 -04:00
int i ;
2009-02-14 15:29:06 -05:00
2017-04-05 13:12:55 -04:00
if ( WARN_ON ( ! tr ) )
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
It is for a tracer that needs to do something special for each
function being traced, or wants to give each function its own
data. The address of the entry data is passed to this callback,
so that the callback can update the entry to whatever it would
like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have their data matching data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
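The hash-table dispatch described in this commit message can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: the names hook_register, hook_dispatch, hook_unregister_ops, hook_bucket and count_hit are hypothetical and are not kernel functions, entries are keyed directly by address because glob-to-symbol matching is not available outside the kernel, and the bucket count is arbitrary. Only the three callbacks in struct hook_ops follow the layout given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define HOOK_HASH_BITS 4
#define HOOK_HASH_SIZE (1u << HOOK_HASH_BITS)

/* Same three callbacks as struct ftrace_hook_ops in the commit message. */
struct hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*free)(void **data);
};

struct hook_entry {
	struct hook_entry *next;
	unsigned long ip;	/* address of the hooked function */
	struct hook_ops *ops;
	void *data;		/* default data, possibly replaced by callback */
};

static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static unsigned int hook_bucket(unsigned long ip)
{
	return (unsigned int)((ip >> 4) & (HOOK_HASH_SIZE - 1));
}

/* Attach ops to one address. Returns 0 on success, -1 on failure. */
static int hook_register(unsigned long ip, struct hook_ops *ops, void *data)
{
	struct hook_entry *e = malloc(sizeof(*e));
	unsigned int b = hook_bucket(ip);

	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	/* Give the tracer a chance to set per-function data. */
	if (ops->callback && ops->callback(ip, &e->data) < 0) {
		free(e);
		return -1;
	}
	e->next = hook_hash[b];
	hook_hash[b] = e;
	return 0;
}

/* Run only the hooks attached to ip. Returns how many hooks ran. */
static int hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct hook_entry *e;
	int ran = 0;

	for (e = hook_hash[hook_bucket(ip)]; e; e = e->next) {
		if (e->ip != ip)
			continue;
		assert(e->ops->func);	/* func is mandatory in this model */
		e->ops->func(ip, parent_ip, &e->data);
		ran++;
	}
	return ran;
}

/* Remove every entry that points to ops. Returns how many were removed. */
static int hook_unregister_ops(struct hook_ops *ops)
{
	unsigned int b;
	int removed = 0;

	for (b = 0; b < HOOK_HASH_SIZE; b++) {
		struct hook_entry **pp = &hook_hash[b];

		while (*pp) {
			struct hook_entry *e = *pp;

			if (e->ops != ops) {
				pp = &e->next;
				continue;
			}
			*pp = e->next;
			if (ops->free)
				ops->free(&e->data);
			free(e);
			removed++;
		}
	}
	return removed;
}

/* Example hook: treats the entry data as a pointer to an int counter. */
static void count_hit(unsigned long ip, unsigned long parent_ip, void **data)
{
	(void)ip;
	(void)parent_ip;
	++*(int *)*data;
}
```

Calling hook_dispatch() from a single global tracer callback mirrors the "special function that will hook to the function tracer" approach: every traced function pays for one hash lookup, and only the matching entries run their func.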
2009-02-14 15:29:06 -05:00
		return -EINVAL;
2017-04-04 18:16:29 -04:00
	/* We do not support '!' for function probes */
	if (WARN_ON(glob[0] == '!'))
2009-02-14 15:29:06 -05:00
		return -EINVAL;
2015-01-13 14:03:38 -05:00
2017-04-18 14:50:39 -04:00
	mutex_lock(&ftrace_lock);
	/* Check if the probe_ops is already registered */
2022-04-27 19:07:34 +02:00
	list_for_each_entry(iter, &tr->func_probes, list) {
		if (iter->probe_ops == probe_ops) {
			probe = iter;
2017-04-18 14:50:39 -04:00
			break;
2022-04-27 19:07:34 +02:00
		}
ftrace: Fix function probe to only enable needed functions
Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.
Note, a probe is something like:
echo schedule:traceoff > /debug/tracing/set_ftrace_filter
When schedule is called, the probe will disable tracing. But currently,
it has a callback for *all* functions, and checks to see if the
called function is the one the probe is attached to.
The probe function was created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, they couldn't limit the functions without also
limiting normal function calls. But now that we can, it's about time
to update the probe code.
Todo: have separate ops for different entries. That is, assign
an ftrace_ops per probe, instead of one op for all probes. But
as there aren't many probes assigned, this may not be that urgent.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
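The cost difference this commit message describes can be illustrated with a small user-space model. It is a sketch only: struct probe_sim, call_unfiltered, call_filtered and run_trace are hypothetical names, and the filter is modeled as an explicit check, whereas with dynamic ftrace the functions outside the filter are simply never patched to call the handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counts how often a probe handler is entered. */
struct probe_sim {
	const unsigned long *probed;	/* addresses with a probe attached */
	size_t nr_probed;
	unsigned long entered;		/* times the handler was entered */
	unsigned long fired;		/* times a probe actually ran */
};

static bool is_probed(const struct probe_sim *s, unsigned long ip)
{
	size_t i;

	for (i = 0; i < s->nr_probed; i++)
		if (s->probed[i] == ip)
			return true;
	return false;
}

/* Before: the handler is entered for every function and filters itself. */
static void call_unfiltered(struct probe_sim *s, unsigned long ip)
{
	s->entered++;
	if (is_probed(s, ip))
		s->fired++;
}

/* After: the filter is applied before the handler is ever entered. */
static void call_filtered(struct probe_sim *s, unsigned long ip)
{
	if (!is_probed(s, ip))
		return;
	s->entered++;
	s->fired++;
}

/* Replay a sequence of function entries through one of the two schemes. */
static void run_trace(struct probe_sim *s, const unsigned long *ips, size_t n,
		      void (*call)(struct probe_sim *, unsigned long))
{
	size_t i;

	assert(call);
	for (i = 0; i < n; i++)
		call(s, ips[i]);
}
```

With one probed address and four traced calls, the unfiltered scheme enters the handler four times to fire once, while the filtered scheme enters it once.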
2013-03-12 10:09:42 -04:00
	}
2022-04-27 19:07:34 +02:00
	if (!probe) {
2017-04-18 14:50:39 -04:00
		probe = kzalloc(sizeof(*probe), GFP_KERNEL);
		if (!probe) {
			mutex_unlock(&ftrace_lock);
			return -ENOMEM;
		}
		probe->probe_ops = probe_ops;
		probe->ops.func = function_trace_probe_call;
		probe->tr = tr;
		ftrace_ops_init(&probe->ops);
		list_add(&probe->list, &tr->func_probes);
2013-03-12 10:09:42 -04:00
	}
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
	acquire_probe_locked(probe);
2013-05-09 18:20:37 -04:00
2017-04-18 14:50:39 -04:00
	mutex_unlock(&ftrace_lock);
2009-02-14 15:29:06 -05:00
2019-08-30 16:30:01 -04:00
	/*
	 * Note, there's a small window here that the func_hash->filter_hash
2020-10-02 22:31:26 +08:00
	 * may be NULL or empty. Need to be careful when reading the loop.
2019-08-30 16:30:01 -04:00
	 */
2017-04-18 14:50:39 -04:00
	mutex_lock(&probe->ops.func_hash->regex_lock);
2016-11-14 16:31:49 -05:00
2017-04-18 14:50:39 -04:00
	orig_hash = &probe->ops.func_hash->filter_hash;
2017-04-04 18:16:29 -04:00
	old_hash = *orig_hash;
	hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, old_hash);
2009-02-14 15:29:06 -05:00
2019-07-04 20:04:42 +05:30
	if (!hash) {
		ret = -ENOMEM;
		goto out;
	}
2017-04-04 18:16:29 -04:00
	ret = ftrace_match_records(hash, glob, strlen(glob));
2009-02-14 15:29:06 -05:00
2017-04-04 18:16:29 -04:00
	/* Nothing found? */
	if (!ret)
		ret = -EINVAL;
2009-02-14 15:29:06 -05:00
2017-04-04 18:16:29 -04:00
	if (ret < 0)
		goto out;
2009-02-14 15:29:06 -05:00
2017-04-04 18:16:29 -04:00
	size = 1 << hash->size_bits;
	for (i = 0; i < size; i++) {
		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
			if (ftrace_lookup_ip(old_hash, entry->ip))
2009-02-14 15:29:06 -05:00
				continue;
2017-04-04 18:16:29 -04:00
			/*
			 * The caller might want to do something special
			 * for each function we find. We call the callback
			 * to give the caller an opportunity to do so.
			 */
2017-04-18 14:50:39 -04:00
			if (probe_ops->init) {
				ret = probe_ops->init(probe_ops, tr,
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change lays the groundwork for it.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
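The place holder protocol described in this commit message can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel's ftrace_probe_ops interface: struct count_state, sketch_init and sketch_free are hypothetical names, and the probe_ops and trace_array arguments that the real callbacks receive are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-probe state stored in the place holder. */
struct count_state {
	unsigned long nr_funcs;	/* functions currently attached */
};

/*
 * Called once per matching function. On the first call *data is NULL and
 * the state is allocated; later calls receive the pointer stored earlier.
 */
static int sketch_init(unsigned long ip, void *init_data, void **data)
{
	struct count_state *st;

	(void)ip;
	(void)init_data;
	assert(data);
	st = *data;
	if (!st) {
		st = calloc(1, sizeof(*st));
		if (!st)
			return -1;
		*data = st;
	}
	st->nr_funcs++;
	return 0;
}

/*
 * ip != 0: one function is being removed from the probe.
 * ip == 0: no functions remain and the place holder is going away,
 *          so release what the first init call allocated.
 */
static void sketch_free(unsigned long ip, void *data)
{
	struct count_state *st = data;

	if (!st)
		return;
	if (ip) {
		st->nr_funcs--;
		return;
	}
	free(st);
}
```

In this model the caller owns the place holder pointer and is expected to reset it to NULL after the final sketch_free(0, ...) call, which matches how the surrounding code clears probe->data after calling free with an ip of zero.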
2017-04-19 22:39:44 -04:00
						      entry->ip, data,
						      &probe->data);
				if (ret < 0) {
					if (probe_ops->free && count)
						probe_ops->free(probe_ops, tr,
								0, probe->data);
					probe->data = NULL;
2017-04-04 21:31:28 -04:00
					goto out;
2017-04-19 22:39:44 -04:00
}
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
It lets a tracer do something special for each function being
traced, such as giving each function its own data. The address
of the entry data is passed to this callback, so that the
callback can update the entry to whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
whose data matches data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
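The hash-table dispatch described above can be sketched in user-space C.
This is a minimal illustration, not the kernel implementation: every demo_
name is hypothetical, the table is a fixed-size chained hash, and only the
func member of the ops structure is modelled.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_HASH_SIZE 16u

/* Reduced stand-in for the hook ops: only the trace-time hook. */
struct demo_hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
};

/* One hash entry: which hook to call for which function address. */
struct demo_entry {
	struct demo_entry *next;
	unsigned long ip;
	struct demo_hook_ops *ops;
	void *data;
};

static struct demo_entry *demo_hash[DEMO_HASH_SIZE];

static unsigned int demo_bucket(unsigned long ip)
{
	return (unsigned int)((ip >> 4) & (DEMO_HASH_SIZE - 1));
}

/* Attach ops (with its data) to the function at ip. */
static int demo_register(unsigned long ip, struct demo_hook_ops *ops,
			 void *data)
{
	unsigned int b = demo_bucket(ip);
	struct demo_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	e->next = demo_hash[b];
	demo_hash[b] = e;
	return 0;
}

/* Call every hook registered for ip; return how many were called. */
static int demo_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct demo_entry *e;
	int called = 0;

	for (e = demo_hash[demo_bucket(ip)]; e; e = e->next) {
		if (e->ip != ip)
			continue;
		e->ops->func(ip, parent_ip, &e->data);
		called++;
	}
	return called;
}

/* Example hook: treats *data as a pointer to a hit counter. */
static void demo_count_hook(unsigned long ip, unsigned long parent_ip,
			    void **data)
{
	(void)ip;
	(void)parent_ip;
	(*(unsigned long *)*data)++;
}
```

Only functions that have an entry in the hash reach a hook, so different
functions can call different hooks even though one dispatcher sees them all.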
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-14 15:29:06 -05:00
}
2017-04-04 18:16:29 -04:00
count++;
2009-02-14 15:29:06 -05:00
}
2017-04-04 18:16:29 -04:00
}
2009-02-14 15:29:06 -05:00
2017-04-04 18:16:29 -04:00
mutex_lock(&ftrace_lock);
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
if (!count) {
	/* Nothing was added? */
	ret = -EINVAL;
	goto out_unlock;
}
ftrace: Fix function probe to only enable needed functions
Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.
Note, a probe is something like:
echo schedule:traceoff > /debug/tracing/set_ftrace_filter
When schedule is called, the probe will disable tracing. But currently,
it has a call back for *all* functions, and checks to see if the
called function is the probe that is needed.
The probe function was created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, it couldn't limit the functions without also
limiting normal function calls. Now that we can, it's about time
to update the probe code.
TODO: have separate ops for different entries. That is, assign
a ftrace_ops per probe, instead of one op for all probes. But
as there aren't many probes assigned, this may not be that urgent.
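For illustration, the "schedule:traceoff" string shown above can be split
into its function and command parts with a few lines of user-space C. This
is a hedged sketch only: demo_probe_cmd and demo_parse_probe() are
hypothetical names, and the real parser in the kernel is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting "<function>:<command>". */
struct demo_probe_cmd {
	char func[64];
	char cmd[32];
};

/* Return 0 on success, -1 if the string is not "<function>:<command>". */
static int demo_parse_probe(const char *s, struct demo_probe_cmd *out)
{
	const char *colon = strchr(s, ':');
	size_t flen, clen;

	if (!colon)
		return -1;

	flen = (size_t)(colon - s);
	/* The command ends at the next ':' or at the end of the string. */
	clen = strcspn(colon + 1, ":");

	if (!flen || !clen ||
	    flen >= sizeof(out->func) || clen >= sizeof(out->cmd))
		return -1;

	memcpy(out->func, s, flen);
	out->func[flen] = '\0';
	memcpy(out->cmd, colon + 1, clen);
	out->cmd[clen] = '\0';
	return 0;
}
```

With the input above, func holds "schedule" and cmd holds "traceoff"; a
string without a colon is rejected.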
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-12 10:09:42 -04:00
2017-04-18 14:50:39 -04:00
ret = ftrace_hash_move_and_update_ops(&probe->ops, orig_hash,
				      hash, 1);
2017-04-04 18:16:29 -04:00
if (ret < 0)
2017-04-05 13:36:18 -04:00
	goto err_unlock;
2014-10-24 14:56:01 -04:00
2017-04-18 14:50:39 -04:00
/* One ref for each new function traced */
probe->ref += count;
2014-10-24 14:56:01 -04:00
2017-04-18 14:50:39 -04:00
if (!(probe->ops.flags & FTRACE_OPS_FL_ENABLED))
	ret = ftrace_startup(&probe->ops, 0);
2013-03-12 10:09:42 -04:00
2009-02-14 15:29:06 -05:00
out_unlock:
2013-05-09 18:20:37 -04:00
mutex_unlock(&ftrace_lock);
2014-10-24 14:56:01 -04:00
2014-07-24 15:33:41 -04:00
if (!ret)
2017-04-04 18:16:29 -04:00
	ret = count;
2013-05-09 18:20:37 -04:00
out:
2017-04-18 14:50:39 -04:00
mutex_unlock(&probe->ops.func_hash->regex_lock);
2013-03-12 10:09:42 -04:00
free_ftrace_hash(hash);
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
release_probe(probe);
2009-02-14 15:29:06 -05:00
2017-04-04 18:16:29 -04:00
return ret;
2009-02-14 15:29:06 -05:00
2017-04-05 13:36:18 -04:00
err_unlock:
2017-04-18 14:50:39 -04:00
if (!probe_ops->free || !count)
2017-04-05 13:36:18 -04:00
	goto out_unlock;
/* Failed to do the move, need to call the free functions */
for (i = 0; i < size; i++) {
	hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
		if (ftrace_lookup_ip(old_hash, entry->ip))
			continue;
2017-04-19 22:39:44 -04:00
		probe_ops->free(probe_ops, tr, entry->ip, probe->data);
2017-04-05 13:36:18 -04:00
	}
}
goto out_unlock;
2009-02-14 15:29:06 -05:00
}
2017-04-04 16:44:43 -04:00
int
2017-04-18 14:50:39 -04:00
unregister_ftrace_function_probe_func(char *glob, struct trace_array *tr,
				      struct ftrace_probe_ops *probe_ops)
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
It is for a tracer that needs to do something special for each
traced function, or wants to give each function its own data.
The address of the entry data is passed to this callback, so
the callback can update the entry to whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is given the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have data matching data. (Note: if glob is NULL, blank or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
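A minimal user-space sketch of the idea above: a hash table keyed by function address that decides which hook to call. Only the layout of struct ftrace_hook_ops is taken from this message. The names hook_entry, hook_add, hook_dispatch, hook_remove_ops, count_func and count_ops are hypothetical and do not exist in the kernel, and glob matching is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the ops structure described in this message. */
struct ftrace_hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*free)(void **data);
};

/* Hypothetical entry type: one per (function address, ops) pair. */
struct hook_entry {
	struct hook_entry *next;
	unsigned long ip;
	struct ftrace_hook_ops *ops;
	void *data;
};

#define HOOK_HASH_BITS 4
#define HOOK_HASH_SIZE (1u << HOOK_HASH_BITS)

static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static unsigned int hook_bucket(unsigned long ip)
{
	return (unsigned int)((ip >> 2) & (HOOK_HASH_SIZE - 1));
}

/* Add one entry; callback may adjust the per-entry data. 0 on success. */
static int hook_add(unsigned long ip, struct ftrace_hook_ops *ops, void *data)
{
	struct hook_entry *e;
	unsigned int b = hook_bucket(ip);

	assert(ops && ops->func);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	if (ops->callback && ops->callback(ip, &e->data) < 0) {
		free(e);
		return -1;
	}
	e->next = hook_hash[b];
	hook_hash[b] = e;
	return 0;
}

/* Run every hook registered for ip; returns how many hooks ran. */
static int hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	int n = 0;

	for (struct hook_entry *e = hook_hash[hook_bucket(ip)]; e; e = e->next) {
		if (e->ip == ip) {
			e->ops->func(ip, parent_ip, &e->data);
			n++;
		}
	}
	return n;
}

/* Drop every entry that points to ops; returns how many were removed. */
static int hook_remove_ops(struct ftrace_hook_ops *ops)
{
	int n = 0;

	for (unsigned int b = 0; b < HOOK_HASH_SIZE; b++) {
		struct hook_entry **pp = &hook_hash[b];

		while (*pp) {
			struct hook_entry *e = *pp;

			if (e->ops == ops) {
				*pp = e->next;
				if (ops->free)
					ops->free(&e->data);
				free(e);
				n++;
			} else {
				pp = &e->next;
			}
		}
	}
	return n;
}

/* Example hook: the entry data is assumed to point to a long counter. */
static void count_func(unsigned long ip, unsigned long parent_ip, void **data)
{
	long *hits = *data;

	(void)ip;
	(void)parent_ip;
	(*hits)++;
}

static struct ftrace_hook_ops count_ops = { .func = count_func };
```

In this sketch a caller would register with hook_add(addr, &count_ops, &counter), and the tracer entry point would call hook_dispatch() for each traced function, so only addresses present in the table reach a hook.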
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-14 15:29:06 -05:00
{
2022-04-27 19:07:34 +02:00
struct ftrace_func_probe *probe = NULL, *iter;
2017-04-14 17:45:45 -04:00
struct ftrace_ops_hash old_hash_ops;
2017-04-04 21:31:28 -04:00
struct ftrace_func_entry *entry;
2015-09-29 19:46:14 +03:00
struct ftrace_glob func_g;
2017-04-04 18:16:29 -04:00
struct ftrace_hash **orig_hash;
struct ftrace_hash *old_hash;
struct ftrace_hash *hash = NULL;
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter; these
were modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
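As an illustration of why the extra cursor is not needed, here is a user-space sketch of an hlist-style iterator that takes only (pos, head, member). It is not the code from linux/list.h: the hl_ names and struct item are hypothetical, and it relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked stand-ins for hlist_node / hlist_head. */
struct hl_node { struct hl_node *next; };
struct hl_head { struct hl_node *first; };

/* Map a node pointer back to its containing object, or NULL at the end. */
static void *hl_entry_or_null(struct hl_node *n, size_t offset)
{
	return n ? (void *)((char *)n - offset) : NULL;
}

#define hl_entry_safe(ptr, type, member) \
	((type *)hl_entry_or_null((ptr), offsetof(type, member)))

/* No separate node cursor: the next node is reached through pos->member. */
#define hl_for_each_entry(pos, head, member)				\
	for (pos = hl_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos;							\
	     pos = hl_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct item {
	int value;
	struct hl_node link;
};

static void hl_add_head(struct hl_node *n, struct hl_head *h)
{
	assert(n && h);
	n->next = h->first;
	h->first = n;
}

/* Walk the list with the three-argument iterator and sum the values. */
static int sum_items(struct hl_head *head)
{
	struct item *it;
	int sum = 0;

	hl_for_each_entry(it, head, link)
		sum += it->value;
	return sum;
}
```

The loop variable alone carries the position, which is what lets the hlist form match the shape of list_for_each_entry(pos, head, member).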
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 17:06:00 -08:00
struct hlist_node *tmp;
2017-04-04 21:31:28 -04:00
struct hlist_head hhd;
2009-02-14 15:29:06 -05:00
char str[KSYM_SYMBOL_LEN];
2017-04-18 14:50:39 -04:00
int count = 0;
int i, ret = -ENODEV;
2017-04-04 21:31:28 -04:00
int size;
2009-02-14 15:29:06 -05:00
2017-05-16 23:21:25 +05:30
if (!glob || !strlen(glob) || !strcmp(glob, "*"))
2015-09-29 19:46:14 +03:00
func_g.search = NULL;
2017-05-16 23:21:25 +05:30
else {
2009-02-14 15:29:06 -05:00
int not;
2015-09-29 19:46:14 +03:00
func_g.type = filter_parse_regex(glob, strlen(glob),
&func_g.search, &not);
func_g.len = strlen(func_g.search);
2009-02-14 15:29:06 -05:00
2009-02-17 12:32:04 -05:00
/* we do not support '!' for function probes */
2009-02-14 15:29:06 -05:00
if (WARN_ON(not))
2017-04-04 16:44:43 -04:00
return -EINVAL;
2009-02-14 15:29:06 -05:00
}
2017-04-18 14:50:39 -04:00
mutex_lock(&ftrace_lock);
/* Check if the probe_ops is already registered */
2022-04-27 19:07:34 +02:00
list_for_each_entry(iter, &tr->func_probes, list) {
if (iter->probe_ops == probe_ops) {
probe = iter;
2017-04-18 14:50:39 -04:00
break;
2022-04-27 19:07:34 +02:00
}
2009-02-14 15:29:06 -05:00
}
2022-04-27 19:07:34 +02:00
if (!probe)
2017-04-18 14:50:39 -04:00
goto err_unlock_ftrace;
ret = -EINVAL;
if (!(probe->ops.flags & FTRACE_OPS_FL_INITIALIZED))
goto err_unlock_ftrace;
acquire_probe_locked(probe);
mutex_unlock(&ftrace_lock);
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
mutex_lock(&probe->ops.func_hash->regex_lock);
2017-04-04 18:16:29 -04:00
2017-04-18 14:50:39 -04:00
orig_hash = &probe->ops.func_hash->filter_hash;
2017-04-04 18:16:29 -04:00
old_hash = *orig_hash;
if (ftrace_hash_empty(old_hash))
goto out_unlock;
ftrace: Fix function probe to only enable needed functions
Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.
Note, a probe is something like:
echo schedule:traceoff > /debug/tracing/set_ftrace_filter
When schedule is called, the probe will disable tracing. But currently,
it has a call back for *all* functions, and checks to see if the
called function is the probe that is needed.
The probe function was created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, it couldn't limit the functions without also
limiting normal function calls. But now that we can, it's about time
to update the probe code.
Todo: have separate ops for different entries. That is, assign
a ftrace_ops per probe, instead of one op for all probes. But
as there are not many probes assigned, this may not be that urgent.
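The saving described above can be sketched in user space by counting how often a callback runs with and without a per-ops filter. This is a simulation under stated assumptions, not kernel code: struct sketch_ops, ops_wants and trace_calls are hypothetical names, and the filter is a plain array rather than a hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops: a NULL filter means "call me for every function". */
struct sketch_ops {
	const unsigned long *filter;	/* addresses this ops asked for */
	size_t nr_filter;
	unsigned long calls;		/* how many times the callback ran */
};

static bool ops_wants(const struct sketch_ops *ops, unsigned long ip)
{
	if (!ops->filter)
		return true;
	for (size_t i = 0; i < ops->nr_filter; i++) {
		if (ops->filter[i] == ip)
			return true;
	}
	return false;
}

/*
 * Simulate a run over the executed addresses in ips[]. The callback is
 * entered only for addresses the ops wants; inside it, the probe fires
 * when the address equals probe_ip. Returns how often the probe fired.
 */
static unsigned long trace_calls(struct sketch_ops *ops,
				 const unsigned long *ips, size_t n,
				 unsigned long probe_ip)
{
	unsigned long hits = 0;

	assert(ops);
	for (size_t i = 0; i < n; i++) {
		if (!ops_wants(ops, ips[i]))
			continue;	/* filtered out: callback never entered */
		ops->calls++;
		if (ips[i] == probe_ip)
			hits++;
	}
	return hits;
}
```

Both configurations fire the probe the same number of times; the filtered ops simply enters the callback far less often, which is the waste this change removes.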
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-12 10:09:42 -04:00
2017-04-14 17:45:45 -04:00
old_hash_ops.filter_hash = old_hash;
/* Probes only have filters */
old_hash_ops.notrace_hash = NULL;
2017-04-04 16:44:43 -04:00
ret = -ENOMEM;
2017-04-04 18:16:29 -04:00
hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, old_hash);
2013-03-12 10:09:42 -04:00
if (!hash)
goto out_unlock;
2017-04-04 21:31:28 -04:00
INIT_HLIST_HEAD(&hhd);
ftrace: trace different functions with a different tracer
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is give the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have data matching data. (Note: if glob is NULL, blank, or '*',
all functions will be tested.)
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that have an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
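The hash-table dispatch described in the message above can be illustrated with a small userspace sketch. It is not the kernel implementation: `struct hook_entry`, `hook_add()`, `hook_dispatch()`, `hook_remove_ops()` and `count_hit()` are hypothetical names invented for this example, glob matching is left out, and only the `struct ftrace_hook_ops` layout is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HOOK_HASH_BITS 4
#define HOOK_HASH_SIZE (1u << HOOK_HASH_BITS)

/* Same shape as the struct in the commit message. */
struct ftrace_hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*free)(void **data);
};

/* Hypothetical hash entry: one per (function address, ops) pair. */
struct hook_entry {
	struct hook_entry *next;
	unsigned long ip;
	struct ftrace_hook_ops *ops;
	void *data;
};

static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static unsigned int hook_bucket(unsigned long ip)
{
	return (unsigned int)(ip >> 4) & (HOOK_HASH_SIZE - 1);
}

/* Attach ops to one function address; returns 0 on success, -1 on failure. */
static int hook_add(unsigned long ip, struct ftrace_hook_ops *ops, void *data)
{
	struct hook_entry *e;
	unsigned int b = hook_bucket(ip);

	assert(ops && ops->func);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	/* Let the tracer replace the default data for this function. */
	if (ops->callback && ops->callback(ip, &e->data) < 0) {
		free(e);
		return -1;
	}
	e->next = hook_hash[b];
	hook_hash[b] = e;
	return 0;
}

/* Run only the hooks registered for ip; returns how many were called. */
static int hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct hook_entry *e;
	int called = 0;

	for (e = hook_hash[hook_bucket(ip)]; e; e = e->next) {
		if (e->ip != ip)
			continue;
		e->ops->func(ip, parent_ip, &e->data);
		called++;
	}
	return called;
}

/* Drop every entry that points to ops; returns how many were removed. */
static int hook_remove_ops(struct ftrace_hook_ops *ops)
{
	unsigned int b;
	int removed = 0;

	for (b = 0; b < HOOK_HASH_SIZE; b++) {
		struct hook_entry **pp = &hook_hash[b];

		while (*pp) {
			struct hook_entry *e = *pp;

			if (e->ops != ops) {
				pp = &e->next;
				continue;
			}
			*pp = e->next;
			if (ops->free)
				ops->free(&e->data);
			free(e);
			removed++;
		}
	}
	return removed;
}

/* Example hook: treats the entry data as a pointer to an int counter. */
static void count_hit(unsigned long ip, unsigned long parent_ip, void **data)
{
	(void)ip;
	(void)parent_ip;
	++*(int *)*data;
}
```

Every traced function funnels through one dispatcher, and the hash lookup decides which hook, if any, runs for that address. That is the trade-off the message calls "the most general and easiest method".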
2009-02-14 15:29:06 -05:00
2017-04-04 21:31:28 -04:00
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry_safe ( entry , tmp , & hash - > buckets [ i ] , hlist ) {
2009-02-14 15:29:06 -05:00
2015-09-29 19:46:14 +03:00
if ( func_g . search ) {
2009-02-14 15:29:06 -05:00
kallsyms_lookup ( entry - > ip , NULL , NULL ,
NULL , str ) ;
2015-09-29 19:46:14 +03:00
if ( ! ftrace_match ( str , & func_g ) )
2009-02-14 15:29:06 -05:00
continue ;
}
2017-04-18 14:50:39 -04:00
count + + ;
2017-04-04 21:31:28 -04:00
remove_hash_entry ( hash , entry ) ;
hlist_add_head ( & entry - > hlist , & hhd ) ;
2009-02-14 15:29:06 -05:00
}
}
2017-04-04 16:44:43 -04:00
/* Nothing found? */
2017-04-18 14:50:39 -04:00
if ( ! count ) {
2017-04-04 16:44:43 -04:00
ret = - EINVAL ;
goto out_unlock ;
}
2013-05-09 14:44:21 +09:00
mutex_lock ( & ftrace_lock ) ;
2017-04-04 18:16:29 -04:00
2017-04-18 14:50:39 -04:00
WARN_ON ( probe - > ref < count ) ;
2017-04-04 21:31:28 -04:00
2017-04-18 14:50:39 -04:00
probe - > ref - = count ;
2017-04-04 18:16:29 -04:00
2017-04-18 14:50:39 -04:00
if ( ftrace_hash_empty ( hash ) )
ftrace_shutdown ( & probe - > ops , 0 ) ;
ret = ftrace_hash_move_and_update_ops ( & probe - > ops , orig_hash ,
2017-04-04 18:16:29 -04:00
hash , 1 ) ;
2017-04-14 17:45:45 -04:00
/* still need to update the function call sites */
2017-04-04 18:16:29 -04:00
if ( ftrace_enabled & & ! ftrace_hash_empty ( hash ) )
2017-04-18 14:50:39 -04:00
ftrace_run_modify_code ( & probe - > ops , FTRACE_UPDATE_CALLS ,
2017-04-14 17:45:45 -04:00
& old_hash_ops ) ;
2018-11-06 18:44:52 -08:00
synchronize_rcu ( ) ;
2014-07-24 15:33:41 -04:00
2017-04-04 21:31:28 -04:00
hlist_for_each_entry_safe ( entry , tmp , & hhd , hlist ) {
hlist_del ( & entry - > hlist ) ;
2017-04-18 14:50:39 -04:00
if ( probe_ops - > free )
tracing/ftrace: Add a better way to pass data via the probe functions
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implemented yet, but
this change lays the groundwork for it.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocated during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
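The place-holder lifecycle described above can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the kernel code: the `demo_*` names and the two-argument `free()` and three-argument `init()` signatures are hypothetical, and the real probe_ops callbacks take additional arguments such as the probe_ops pointer and the trace array.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical signatures; the kernel versions take more arguments. */
struct demo_probe_ops {
	int (*init)(unsigned long ip, void *init_data, void **data);
	void (*free)(unsigned long ip, void *data);
};

struct demo_probe {
	const struct demo_probe_ops *ops;
	void *data;	/* the place holder shared by every function on this probe */
	int ref;	/* number of functions currently attached */
};

/* Call init() once per matching function, always passing the same place holder. */
static int demo_probe_attach(struct demo_probe *p, const unsigned long *ips,
			     int n, void *init_data)
{
	int i;

	for (i = 0; i < n; i++) {
		/* p->data is NULL on the very first call, initialized afterwards. */
		if (p->ops->init(ips[i], init_data, &p->data) < 0)
			return -1;
		p->ref++;
	}
	return n;
}

/* Call free() with each removed ip, then once with 0 when nothing is left. */
static void demo_probe_detach(struct demo_probe *p, const unsigned long *ips,
			      int n)
{
	int i;

	for (i = 0; i < n; i++) {
		p->ops->free(ips[i], p->data);
		p->ref--;
	}
	assert(p->ref >= 0);
	if (p->ref == 0) {
		p->ops->free(0, p->data);	/* ip == 0: release the place holder */
		p->data = NULL;
	}
}

/* Example probe state, allocated lazily by the first init() call. */
struct demo_state {
	long limit;
	int attached;
};

static int demo_live_states;

static int demo_init(unsigned long ip, void *init_data, void **data)
{
	struct demo_state *s = *data;

	(void)ip;
	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return -1;
		s->limit = *(long *)init_data;
		demo_live_states++;
		*data = s;
	}
	s->attached++;
	return 0;
}

static void demo_free(unsigned long ip, void *data)
{
	struct demo_state *s = data;

	if (ip) {
		s->attached--;
		return;
	}
	free(s);
	demo_live_states--;
}

static const struct demo_probe_ops demo_ops = { demo_init, demo_free };
```

The point of the sketch is the ownership rule: whoever allocates the place holder on the first `init()` call releases it when `free()` is called with a zero address.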
2017-04-19 22:39:44 -04:00
probe_ops - > free ( probe_ops , tr , entry - > ip , probe - > data ) ;
2017-04-04 21:31:28 -04:00
kfree ( entry ) ;
2013-03-13 12:42:58 -04:00
}
2013-05-09 14:44:21 +09:00
mutex_unlock ( & ftrace_lock ) ;
2015-09-29 19:46:14 +03:00
ftrace: Fix function probe to only enable needed functions
Currently the function probe enables all functions and runs a "hash"
against every function call to see if it should call a probe. This
is extremely wasteful.
Note, a probe is something like:
echo schedule:traceoff > /debug/tracing/set_ftrace_filter
When schedule is called, the probe will disable tracing. But currently,
it has a callback for *all* functions, and checks to see if the
called function is the one the probe is attached to.
The probe function was created before ftrace was rewritten to
allow for more than one "op" to be registered by the function tracer.
When probes were created, they couldn't limit the functions without also
limiting normal function calls. But now we can, so it's about time
to update the probe code.
Todo: have separate ops for different entries. That is, assign
an ftrace_ops per probe, instead of one op for all probes. But
as there are not many probes assigned, this may not be that urgent.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-12 10:09:42 -04:00
out_unlock :
2017-04-18 14:50:39 -04:00
mutex_unlock ( & probe - > ops . func_hash - > regex_lock ) ;
2013-03-12 10:09:42 -04:00
free_ftrace_hash ( hash ) ;
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
release_probe ( probe ) ;
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
return ret ;
2009-02-14 15:29:06 -05:00
2017-04-18 14:50:39 -04:00
err_unlock_ftrace :
mutex_unlock ( & ftrace_lock ) ;
2017-04-04 16:44:43 -04:00
return ret ;
2009-02-14 15:29:06 -05:00
}
2017-05-16 23:21:26 +05:30
void clear_ftrace_function_probes ( struct trace_array * tr )
{
struct ftrace_func_probe * probe , * n ;
list_for_each_entry_safe ( probe , n , & tr - > func_probes , list )
unregister_ftrace_function_probe_func ( NULL , tr , probe - > probe_ops ) ;
}
2009-02-14 00:40:25 -05:00
static LIST_HEAD ( ftrace_commands ) ;
static DEFINE_MUTEX ( ftrace_cmd_mutex ) ;
2013-10-24 08:34:18 -05:00
/*
* Currently we only register ftrace commands from __init , so mark this
* __init too .
*/
__init int register_ftrace_command ( struct ftrace_func_command * cmd )
2009-02-14 00:40:25 -05:00
{
struct ftrace_func_command * p ;
int ret = 0 ;
mutex_lock ( & ftrace_cmd_mutex ) ;
list_for_each_entry ( p , & ftrace_commands , list ) {
if ( strcmp ( cmd - > name , p - > name ) = = 0 ) {
ret = - EBUSY ;
goto out_unlock ;
}
}
list_add ( & cmd - > list , & ftrace_commands ) ;
out_unlock :
mutex_unlock ( & ftrace_cmd_mutex ) ;
return ret ;
}
2013-10-24 08:34:18 -05:00
/*
* Currently we only unregister ftrace commands from __init , so mark
* this __init too .
*/
__init int unregister_ftrace_command ( struct ftrace_func_command * cmd )
2009-02-14 00:40:25 -05:00
{
struct ftrace_func_command * p , * n ;
int ret = - ENODEV ;
mutex_lock ( & ftrace_cmd_mutex ) ;
list_for_each_entry_safe ( p , n , & ftrace_commands , list ) {
if ( strcmp ( cmd - > name , p - > name ) = = 0 ) {
ret = 0 ;
list_del_init ( & p - > list ) ;
goto out_unlock ;
}
}
out_unlock :
mutex_unlock ( & ftrace_cmd_mutex ) ;
return ret ;
}
2017-04-05 13:12:55 -04:00
static int ftrace_process_regex ( struct ftrace_iterator * iter ,
2011-05-02 17:34:47 -04:00
char * buff , int len , int enable )
2009-02-13 17:08:48 -05:00
{
2017-04-05 13:12:55 -04:00
struct ftrace_hash * hash = iter - > hash ;
2017-04-20 11:31:35 -04:00
struct trace_array * tr = iter - > ops - > private ;
2009-02-14 00:40:25 -05:00
char * func , * command , * next = buff ;
2009-02-17 11:20:26 -05:00
struct ftrace_func_command * p ;
2011-06-01 19:18:47 +08:00
int ret = - EINVAL ;
2009-02-13 17:08:48 -05:00
func = strsep ( & next , " : " ) ;
if ( ! next ) {
2011-04-29 20:59:51 -04:00
ret = ftrace_match_records ( hash , func , len ) ;
2011-04-29 15:12:32 -04:00
if ( ! ret )
ret = - EINVAL ;
if ( ret < 0 )
return ret ;
return 0 ;
2009-02-13 17:08:48 -05:00
}
2009-02-14 00:40:25 -05:00
/* command found */
2009-02-13 17:08:48 -05:00
command = strsep ( & next , " : " ) ;
2009-02-14 00:40:25 -05:00
mutex_lock ( & ftrace_cmd_mutex ) ;
list_for_each_entry ( p , & ftrace_commands , list ) {
if ( strcmp ( p - > name , command ) = = 0 ) {
2017-04-05 13:12:55 -04:00
ret = p - > func ( tr , hash , func , command , next , enable ) ;
2009-02-14 00:40:25 -05:00
goto out_unlock ;
}
2009-02-13 17:08:48 -05:00
}
2009-02-14 00:40:25 -05:00
out_unlock :
mutex_unlock ( & ftrace_cmd_mutex ) ;
2009-02-13 17:08:48 -05:00
2009-02-14 00:40:25 -05:00
return ret ;
2009-02-13 17:08:48 -05:00
}
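The splitting done by ftrace_process_regex() above can be tried in userspace with the sketch below. It is only an illustration: `struct filter_request`, `parse_filter_request()` and `demo_strsep()` are hypothetical names, and `demo_strsep()` is a single-delimiter stand-in for strsep() written with standard C so the example does not depend on a libc extension. The command lookup and locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the pieces of "func[:command[:param]]". */
struct filter_request {
	char *func;	/* function name or glob */
	char *command;	/* NULL when the input is a plain filter */
	char *param;	/* NULL when the command has no parameter */
};

/* Stand-in for strsep() with one delimiter: cuts the string in place. */
static char *demo_strsep(char **stringp, char delim)
{
	char *start = *stringp;
	char *end;

	if (!start)
		return NULL;
	end = strchr(start, delim);
	if (end) {
		*end = '\0';
		*stringp = end + 1;
	} else {
		*stringp = NULL;
	}
	return start;
}

/* Split buff in place, following the order used by ftrace_process_regex(). */
static void parse_filter_request(char *buff, struct filter_request *req)
{
	char *next = buff;

	assert(buff && req);
	req->command = NULL;
	req->param = NULL;
	req->func = demo_strsep(&next, ':');
	if (!next)
		return;		/* no ':' found: plain function filter */
	req->command = demo_strsep(&next, ':');
	req->param = next;	/* remainder after the command, may be NULL */
}
```

For an input such as "schedule:traceoff", `func` becomes "schedule" and `command` becomes "traceoff". An input without a colon leaves `command` as NULL, which corresponds to the branch that calls ftrace_match_records() directly.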
2008-05-12 21:20:51 +02:00
static ssize_t
2008-05-22 11:46:33 -04:00
ftrace_regex_write ( struct file * file , const char __user * ubuf ,
size_t cnt , loff_t * ppos , int enable )
2008-05-12 21:20:43 +02:00
{
struct ftrace_iterator * iter ;
2009-09-11 17:29:29 +02:00
struct trace_parser * parser ;
ssize_t ret , read ;
2008-05-12 21:20:43 +02:00
2009-09-22 13:52:20 +08:00
if ( ! cnt )
2008-05-12 21:20:43 +02:00
return 0 ;
if ( file - > f_mode & FMODE_READ ) {
struct seq_file * m = file - > private_data ;
iter = m - > private ;
} else
iter = file - > private_data ;
2013-05-09 14:44:17 +09:00
if ( unlikely ( ftrace_disabled ) )
2013-05-09 14:44:21 +09:00
return - ENODEV ;
/* iter->hash is a local copy, so we don't need regex_lock */
2013-05-09 14:44:17 +09:00
2009-09-11 17:29:29 +02:00
parser = & iter - > parser ;
read = trace_get_user ( parser , ubuf , cnt , ppos ) ;
2008-05-12 21:20:43 +02:00
2009-09-22 13:52:20 +08:00
if ( read > = 0 & & trace_parser_loaded ( parser ) & &
2009-09-11 17:29:29 +02:00
! trace_parser_cont ( parser ) ) {
2017-04-05 13:12:55 -04:00
ret = ftrace_process_regex ( iter , parser - > buffer ,
2009-09-11 17:29:29 +02:00
parser - > idx , enable ) ;
2009-12-08 11:15:30 +08:00
trace_parser_clear ( parser ) ;
2013-05-09 11:35:12 -04:00
if ( ret < 0 )
2013-05-09 14:44:21 +09:00
goto out ;
2009-08-11 17:29:04 +02:00
}
2008-05-12 21:20:43 +02:00
ret = read ;
2013-05-09 14:44:21 +09:00
out :
2008-05-12 21:20:43 +02:00
return ret ;
}
2011-12-19 14:41:25 -05:00
ssize_t
2008-05-22 11:46:33 -04:00
ftrace_filter_write ( struct file * file , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
return ftrace_regex_write ( file , ubuf , cnt , ppos , 1 ) ;
}
2011-12-19 14:41:25 -05:00
ssize_t
2008-05-22 11:46:33 -04:00
ftrace_notrace_write ( struct file * file , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
return ftrace_regex_write ( file , ubuf , cnt , ppos , 0 ) ;
}
2011-05-02 17:34:47 -04:00
static int
2022-03-15 23:00:26 +09:00
__ftrace_match_addr ( struct ftrace_hash * hash , unsigned long ip , int remove )
2012-06-05 19:28:08 +09:00
{
struct ftrace_func_entry * entry ;
2022-03-08 16:30:29 +01:00
ip = ftrace_location ( ip ) ;
if ( ! ip )
2012-06-05 19:28:08 +09:00
return - EINVAL ;
if ( remove ) {
entry = ftrace_lookup_ip ( hash , ip ) ;
if ( ! entry )
return - ENOENT ;
free_hash_entry ( hash , entry ) ;
return 0 ;
}
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() that
occurs when the number of new entries added is large enough to cause two
allocations in the loop:
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}
Where ftrace_add_rec_direct() has:
if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;
if (size < 32)
size = 32;
new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;
*free_hash = direct_functions;
direct_functions = new_hash;
}
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_functions entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update the
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
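Steps 3 to 6 above can be illustrated with a minimal user-space sketch. This
is not the kernel code: it assumes C11 atomics as a stand-in for
rcu_assign_pointer(), a flat array instead of an ftrace_hash, and the names
struct table, live_table, lookup_direct() and add_entries() are hypothetical.
The point it shows is that the published pointer changes exactly once, after
the new table is complete, and that the old table is handed back for deferred
freeing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a direct-call table: ip -> trampoline address. */
struct entry {
	unsigned long ip;
	unsigned long direct;
};

struct table {
	size_t count;
	struct entry entries[];
};

/* The published table; readers only ever see a fully built one. */
static _Atomic(struct table *) live_table;

/* Reader side: look up @ip in whatever table is currently published. */
static unsigned long lookup_direct(unsigned long ip)
{
	struct table *t = atomic_load_explicit(&live_table, memory_order_acquire);
	size_t i;

	if (!t)
		return 0;
	for (i = 0; i < t->count; i++)
		if (t->entries[i].ip == ip)
			return t->entries[i].direct;
	return 0;
}

/*
 * Writer side: build a table holding copies of the old entries plus the new
 * ones, then publish it with a single store. On success *old receives the
 * previous table (possibly NULL) for the caller to free once no reader can
 * still be using it. Returns 0 on success, -1 on allocation failure.
 */
static int add_entries(const unsigned long *ips, size_t n, unsigned long addr,
		       struct table **old)
{
	struct table *cur = atomic_load_explicit(&live_table, memory_order_relaxed);
	size_t old_count = cur ? cur->count : 0;
	struct table *next;
	size_t i;

	next = malloc(sizeof(*next) + (old_count + n) * sizeof(next->entries[0]));
	if (!next)
		return -1;	/* live_table is untouched on failure */

	/* Copy (not move) the existing entries. */
	if (old_count)
		memcpy(next->entries, cur->entries,
		       old_count * sizeof(next->entries[0]));

	/* Append the new entries. */
	for (i = 0; i < n; i++) {
		next->entries[old_count + i].ip = ips[i];
		next->entries[old_count + i].direct = addr;
	}
	next->count = old_count + n;

	/* Publish only after the new table is complete. */
	atomic_store_explicit(&live_table, next, memory_order_release);
	*old = cur;
	return 0;
}
```

In the kernel the old hash is freed only after synchronize_rcu_tasks(); in
this sketch the caller would have to provide its own guarantee that no reader
still holds the old table before calling free() on it.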
2023-12-29 11:51:34 -05:00
entry = add_hash_entry ( hash , ip ) ;
return entry ? 0 : - ENOMEM ;
2012-06-05 19:28:08 +09:00
}
2022-03-15 23:00:26 +09:00
static int
ftrace_match_addr ( struct ftrace_hash * hash , unsigned long * ips ,
unsigned int cnt , int remove )
{
unsigned int i ;
int err ;
for ( i = 0 ; i < cnt ; i + + ) {
err = __ftrace_match_addr ( hash , ips [ i ] , remove ) ;
if ( err ) {
/*
* This expects @ hash to be a temporary hash ; if this
* fails , the caller must free @ hash .
*/
return err ;
}
}
return 0 ;
}
2012-06-05 19:28:08 +09:00
static int
ftrace_set_hash ( struct ftrace_ops * ops , unsigned char * buf , int len ,
2022-03-15 23:00:26 +09:00
unsigned long * ips , unsigned int cnt ,
int remove , int reset , int enable )
2008-05-22 11:46:33 -04:00
{
2011-05-02 17:34:47 -04:00
struct ftrace_hash * * orig_hash ;
2011-05-02 12:29:25 -04:00
struct ftrace_hash * hash ;
2011-05-02 17:34:47 -04:00
int ret ;
2011-05-02 12:29:25 -04:00
2008-05-22 11:46:33 -04:00
if ( unlikely ( ftrace_disabled ) )
2011-05-02 17:34:47 -04:00
return - ENODEV ;
2008-05-22 11:46:33 -04:00
2014-08-15 17:23:02 -04:00
mutex_lock ( & ops - > func_hash - > regex_lock ) ;
2013-05-09 14:44:21 +09:00
2011-05-02 12:29:25 -04:00
if ( enable )
2014-08-15 17:23:02 -04:00
orig_hash = & ops - > func_hash - > filter_hash ;
2011-05-02 12:29:25 -04:00
else
2014-08-15 17:23:02 -04:00
orig_hash = & ops - > func_hash - > notrace_hash ;
2011-05-02 17:34:47 -04:00
2014-07-15 08:40:20 +08:00
if ( reset )
hash = alloc_ftrace_hash ( FTRACE_HASH_DEFAULT_BITS ) ;
else
hash = alloc_and_copy_ftrace_hash ( FTRACE_HASH_DEFAULT_BITS , * orig_hash ) ;
2013-05-09 14:44:21 +09:00
if ( ! hash ) {
ret = - ENOMEM ;
goto out_regex_unlock ;
}
2011-05-02 12:29:25 -04:00
2012-01-02 10:04:14 +01:00
if ( buf & & ! ftrace_match_records ( hash , buf , len ) ) {
ret = - EINVAL ;
goto out_regex_unlock ;
}
2022-03-15 23:00:26 +09:00
if ( ips ) {
ret = ftrace_match_addr ( hash , ips , cnt , remove ) ;
2012-06-05 19:28:08 +09:00
if ( ret < 0 )
goto out_regex_unlock ;
}
2011-05-02 17:34:47 -04:00
mutex_lock ( & ftrace_lock ) ;
2017-04-04 14:46:56 -04:00
ret = ftrace_hash_move_and_update_ops ( ops , orig_hash , hash , enable ) ;
2011-05-02 17:34:47 -04:00
mutex_unlock ( & ftrace_lock ) ;
2012-01-02 10:04:14 +01:00
out_regex_unlock :
2014-08-15 17:23:02 -04:00
mutex_unlock ( & ops - > func_hash - > regex_lock ) ;
2011-05-02 17:34:47 -04:00
free_ftrace_hash ( hash ) ;
return ret ;
2008-05-22 11:46:33 -04:00
}
2012-06-05 19:28:08 +09:00
static int
2022-03-15 23:00:26 +09:00
ftrace_set_addr ( struct ftrace_ops * ops , unsigned long * ips , unsigned int cnt ,
int remove , int reset , int enable )
2012-06-05 19:28:08 +09:00
{
2022-03-15 23:00:26 +09:00
return ftrace_set_hash ( ops , NULL , 0 , ips , cnt , remove , reset , enable ) ;
2012-06-05 19:28:08 +09:00
}
2019-11-08 13:07:06 -05:00
# ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
2019-11-08 13:11:27 -05:00
struct ftrace_direct_func {
struct list_head next ;
unsigned long addr ;
int count ;
} ;
static LIST_HEAD ( ftrace_direct_funcs ) ;
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
static int register_ftrace_function_nolock ( struct ftrace_ops * ops ) ;
2024-01-10 09:13:06 +09:00
/*
* If there are multiple ftrace_ops , use SAVE_REGS by default , so that the
* direct call is reached from ftrace_regs_caller . Only if the architecture
* supports direct_call but not ftrace_regs_caller , use SAVE_ARGS so that it
* jumps from ftrace_caller for multiple ftrace_ops .
*/
2024-02-13 14:24:34 +01:00
# ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS
2023-03-21 15:04:23 +01:00
# define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
2024-01-10 09:13:06 +09:00
# else
# define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
# endif
2021-10-08 11:13:34 +02:00
static int check_direct_multi ( struct ftrace_ops * ops )
{
if ( ! ( ops - > flags & FTRACE_OPS_FL_INITIALIZED ) )
return - EINVAL ;
if ( ( ops - > flags & MULTI_FLAGS ) ! = MULTI_FLAGS )
return - EINVAL ;
return 0 ;
}
static void remove_direct_functions_hash ( struct ftrace_hash * hash , unsigned long addr )
{
struct ftrace_func_entry * entry , * del ;
int size , i ;
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & hash - > buckets [ i ] , hlist ) {
del = __ftrace_lookup_ip ( direct_functions , entry - > ip ) ;
if ( del & & del - > direct = = addr ) {
remove_hash_entry ( direct_functions , del ) ;
kfree ( del ) ;
}
}
}
}
/**
2023-03-21 15:04:21 +01:00
* register_ftrace_direct - Call a custom trampoline directly
2021-10-08 11:13:34 +02:00
* for multiple functions registered in @ ops
* @ ops : The address of the struct ftrace_ops object
* @ addr : The address of the trampoline to call at @ ops functions
*
* This is used to connect direct calls to @ addr from the nop locations
* of the functions registered in @ ops ( with the ftrace_set_filter_ip
* function ) .
*
* The location that it calls ( @ addr ) must be able to handle a direct call ,
* and save the parameters of the function being traced , and restore them
* ( or inject new ones if needed ) , before returning .
*
* Returns :
* 0 on success
* - EINVAL - The @ ops object was already registered with this call or
* when there are no functions in @ ops object .
* - EBUSY - Another direct function is already attached ( there can be only one )
* - ENODEV - @ ip does not point to a ftrace nop location ( or not supported )
* - ENOMEM - There was an allocation failure .
*/
2023-03-21 15:04:21 +01:00
int register_ftrace_direct ( struct ftrace_ops * ops , unsigned long addr )
2021-10-08 11:13:34 +02:00
{
2023-12-29 11:51:34 -05:00
struct ftrace_hash * hash , * new_hash = NULL , * free_hash = NULL ;
2021-10-08 11:13:34 +02:00
struct ftrace_func_entry * entry , * new ;
int err = - EBUSY , size , i ;
if ( ops - > func | | ops - > trampoline )
return - EINVAL ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_INITIALIZED ) )
return - EINVAL ;
if ( ops - > flags & FTRACE_OPS_FL_ENABLED )
return - EINVAL ;
hash = ops - > func_hash - > filter_hash ;
if ( ftrace_hash_empty ( hash ) )
return - EINVAL ;
mutex_lock ( & direct_mutex ) ;
/* Make sure requested entries are not already registered. */
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & hash - > buckets [ i ] , hlist ) {
if ( ftrace_find_rec_direct ( entry - > ip ) )
goto out_unlock ;
}
}
err = - ENOMEM ;
2023-12-29 11:51:34 -05:00
/* Make a copy hash to place the new and the old entries in */
size = hash - > count + direct_functions - > count ;
if ( size > 32 )
size = 32 ;
new_hash = alloc_ftrace_hash ( fls ( size ) ) ;
if ( ! new_hash )
goto out_unlock ;
/* Now copy over the existing direct entries */
size = 1 < < direct_functions - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & direct_functions - > buckets [ i ] , hlist ) {
new = add_hash_entry ( new_hash , entry - > ip ) ;
if ( ! new )
goto out_unlock ;
new - > direct = entry - > direct ;
}
}
/* ... and add the new entries */
size = 1 < < hash - > size_bits ;
2021-10-08 11:13:34 +02:00
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & hash - > buckets [ i ] , hlist ) {
2023-12-29 11:51:34 -05:00
new = add_hash_entry ( new_hash , entry - > ip ) ;
2021-10-08 11:13:34 +02:00
if ( ! new )
2023-12-29 11:51:34 -05:00
goto out_unlock ;
/* Update both the copy and the hash entry */
new - > direct = addr ;
2021-10-08 11:13:34 +02:00
entry - > direct = addr ;
}
}
2023-12-29 11:51:34 -05:00
free_hash = direct_functions ;
rcu_assign_pointer ( direct_functions , new_hash ) ;
new_hash = NULL ;
2021-10-08 11:13:34 +02:00
ops - > func = call_direct_funcs ;
ops - > flags = MULTI_FLAGS ;
ops - > trampoline = FTRACE_REGS_ADDR ;
2023-03-21 15:04:22 +01:00
ops - > direct_call = addr ;
2021-10-08 11:13:34 +02:00
2022-07-19 17:21:24 -07:00
err = register_ftrace_function_nolock ( ops ) ;
2021-10-08 11:13:34 +02:00
out_unlock :
mutex_unlock ( & direct_mutex ) ;
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() where
if the number of new entries are added is large enough to cause two
allocations in the loop:
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}
Where ftrace_add_rec_direct() has:
if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;
if (size < 32)
size = 32;
new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;
*free_hash = direct_functions;
direct_functions = new_hash;
}
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_function entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update the
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
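The copy-then-publish approach of steps 3 to 6 can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: `struct table`, `live_table` and `table_merge_and_publish()` are hypothetical names, a flat array stands in for the ftrace hash, and a C11 release store stands in for rcu_assign_pointer(). Freeing the old table is left to the caller, who must wait until no reader can still hold it (the kernel waits for an RCU grace period).

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat table standing in for struct ftrace_hash. */
struct table {
	size_t count;
	unsigned long ips[];
};

/* Readers load this pointer; it always refers to a complete table. */
static _Atomic(struct table *) live_table;

/*
 * Build a new table holding the live entries plus @cnt added ones and
 * publish it only once it is complete. Returns the previous table
 * (possibly NULL) for the caller to free later. On allocation failure
 * *err is set to -1 and the live table is left untouched.
 */
static struct table *table_merge_and_publish(const unsigned long *add,
					     size_t cnt, int *err)
{
	struct table *old = atomic_load(&live_table);
	size_t old_cnt = old ? old->count : 0;
	struct table *new;

	*err = 0;
	new = malloc(sizeof(*new) + (old_cnt + cnt) * sizeof(new->ips[0]));
	if (!new) {
		*err = -1;
		return NULL;
	}

	/* Copy (not move) the existing entries, then the added ones. */
	if (old_cnt)
		memcpy(new->ips, old->ips, old_cnt * sizeof(new->ips[0]));
	if (cnt)
		memcpy(new->ips + old_cnt, add, cnt * sizeof(new->ips[0]));
	new->count = old_cnt + cnt;

	/* Single publication point, analogous to rcu_assign_pointer(). */
	atomic_store_explicit(&live_table, new, memory_order_release);
	return old;
}
```

Because the old table is never modified, a reader that loaded the pointer before the store still sees every entry it expects, which is the property the moving dup_hash() broke.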
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-29 11:51:34 -05:00
if ( free_hash & & free_hash ! = EMPTY_HASH ) {
2021-10-08 11:13:34 +02:00
synchronize_rcu_tasks ( ) ;
free_ftrace_hash ( free_hash ) ;
}
2023-12-29 11:51:34 -05:00
if ( new_hash )
free_ftrace_hash ( new_hash ) ;
2021-10-08 11:13:34 +02:00
return err ;
}
2023-03-21 15:04:21 +01:00
EXPORT_SYMBOL_GPL ( register_ftrace_direct ) ;
2021-10-08 11:13:34 +02:00
/**
2023-03-21 15:04:21 +01:00
* unregister_ftrace_direct - Remove calls to custom trampoline
* previously registered by register_ftrace_direct for @ ops object .
2021-10-08 11:13:34 +02:00
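* @ ops : The address of the struct ftrace_ops object
* @ addr : The address of the direct function that is called by the @ ops functions
* @ free_filters : Set to free the filters attached to the @ ops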
*
* This is used to remove direct calls to @ addr from the nop locations
* of the functions registered in @ ops ( with the ftrace_set_filter_ip
* function ) .
*
* Returns :
* 0 on success
* - EINVAL - The @ ops object was not properly registered .
*/
2023-03-21 15:04:21 +01:00
int unregister_ftrace_direct ( struct ftrace_ops * ops , unsigned long addr ,
bool free_filters )
2021-10-08 11:13:34 +02:00
{
struct ftrace_hash * hash = ops - > func_hash - > filter_hash ;
int err ;
if ( check_direct_multi ( ops ) )
return - EINVAL ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return - EINVAL ;
mutex_lock ( & direct_mutex ) ;
err = unregister_ftrace_function ( ops ) ;
remove_direct_functions_hash ( hash , addr ) ;
mutex_unlock ( & direct_mutex ) ;
2021-12-06 19:20:31 +01:00
/* Clean up so that @ ops can be passed to a later register call */
ops - > func = NULL ;
ops - > trampoline = 0 ;
2023-03-21 15:04:18 +01:00
if ( free_filters )
ftrace_free_filter ( ops ) ;
2021-10-08 11:13:34 +02:00
return err ;
}
2023-03-21 15:04:21 +01:00
EXPORT_SYMBOL_GPL ( unregister_ftrace_direct ) ;
2021-10-08 11:13:35 +02:00
2022-07-19 17:21:23 -07:00
static int
2023-03-21 15:04:21 +01:00
__modify_ftrace_direct ( struct ftrace_ops * ops , unsigned long addr )
2021-10-08 11:13:35 +02:00
{
2021-10-14 16:11:14 -04:00
struct ftrace_hash * hash ;
2021-10-08 11:13:35 +02:00
struct ftrace_func_entry * entry , * iter ;
2021-10-14 16:11:14 -04:00
static struct ftrace_ops tmp_ops = {
. func = ftrace_stub ,
. flags = FTRACE_OPS_FL_STUB ,
} ;
2021-10-08 11:13:35 +02:00
int i , size ;
int err ;
2022-07-19 17:21:23 -07:00
lockdep_assert_held_once ( & direct_mutex ) ;
2021-10-14 16:11:14 -04:00
/* Enable the tmp_ops to have the same functions as the direct ops */
ftrace_ops_init ( & tmp_ops ) ;
tmp_ops . func_hash = ops - > func_hash ;
2023-03-21 15:04:22 +01:00
tmp_ops . direct_call = addr ;
2021-10-14 16:11:14 -04:00
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. The owner of
the DIRECT ops may return 0 if the DIRECT trampoline can share with
IPMODIFY, or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
err = register_ftrace_function_nolock ( & tmp_ops ) ;
2021-10-14 16:11:14 -04:00
if ( err )
2022-07-19 17:21:23 -07:00
return err ;
2021-10-08 11:13:35 +02:00
/*
2021-10-14 16:11:14 -04:00
* Now the ftrace_ops_list_func ( ) is called to do the direct callers .
* We can safely change the direct functions attached to each entry .
2021-10-08 11:13:35 +02:00
*/
2021-10-14 16:11:14 -04:00
mutex_lock ( & ftrace_lock ) ;
2021-10-08 11:13:35 +02:00
2021-10-14 16:11:14 -04:00
hash = ops - > func_hash - > filter_hash ;
2021-10-08 11:13:35 +02:00
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( iter , & hash - > buckets [ i ] , hlist ) {
entry = __ftrace_lookup_ip ( direct_functions , iter - > ip ) ;
if ( ! entry )
continue ;
entry - > direct = addr ;
}
}
2023-03-21 15:04:22 +01:00
/* Prevent store tearing if a trampoline concurrently accesses the value */
WRITE_ONCE ( ops - > direct_call , addr ) ;
2021-10-08 11:13:35 +02:00
2021-11-09 12:42:17 +01:00
mutex_unlock ( & ftrace_lock ) ;
2021-10-14 16:11:14 -04:00
/* Removing the tmp_ops will add the updated direct callers to the functions */
unregister_ftrace_function ( & tmp_ops ) ;
2021-10-08 11:13:35 +02:00
2022-07-19 17:21:23 -07:00
return err ;
}
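The WRITE_ONCE() store to ops->direct_call in the function above can be illustrated in user space. The macros below are simplified stand-ins and an assumption of this sketch: the kernel's own definitions carry additional type checks, and `struct model_ops` is a hypothetical reduction that keeps only the field of interest. The volatile access asks the compiler to emit one store for an aligned, word-sized value instead of splitting or repeating it.

```c
/* Simplified stand-ins; not the kernel's definitions. */
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define MODEL_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical reduced ops holding only the direct call address. */
struct model_ops {
	unsigned long direct_call;
};

/* Writer side: update the address with a single volatile store. */
static void model_set_direct(struct model_ops *ops, unsigned long addr)
{
	MODEL_WRITE_ONCE(ops->direct_call, addr);
}

/* Reader side: fetch the address with a single volatile load. */
static unsigned long model_get_direct(struct model_ops *ops)
{
	return MODEL_READ_ONCE(ops->direct_call);
}
```

`__typeof__` is a GNU extension, so this sketch assumes GCC or Clang.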
/**
2023-03-21 15:04:21 +01:00
* modify_ftrace_direct_nolock - Modify an existing direct ' multi ' call
2022-07-19 17:21:23 -07:00
* to call something else
* @ ops : The address of the struct ftrace_ops object
* @ addr : The address of the new trampoline to call at @ ops functions
*
* This is used to unregister the currently registered direct caller and
* register a new one , @ addr , on the functions registered in the @ ops object .
*
* Note there ' s a window between the ftrace_shutdown and ftrace_startup calls
* where no callbacks will be called .
*
* Caller should already have direct_mutex locked , so we don ' t lock
* direct_mutex here .
*
* Returns : zero on success . Non zero on error , which includes :
* - EINVAL - The @ ops object was not properly registered .
*/
2023-03-21 15:04:21 +01:00
int modify_ftrace_direct_nolock ( struct ftrace_ops * ops , unsigned long addr )
2022-07-19 17:21:23 -07:00
{
if ( check_direct_multi ( ops ) )
return - EINVAL ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return - EINVAL ;
2023-03-21 15:04:21 +01:00
return __modify_ftrace_direct ( ops , addr ) ;
2022-07-19 17:21:23 -07:00
}
2023-03-21 15:04:21 +01:00
EXPORT_SYMBOL_GPL ( modify_ftrace_direct_nolock ) ;
2022-07-19 17:21:23 -07:00
/**
2023-03-21 15:04:21 +01:00
* modify_ftrace_direct - Modify an existing direct ' multi ' call
2022-07-19 17:21:23 -07:00
* to call something else
* @ ops : The address of the struct ftrace_ops object
* @ addr : The address of the new trampoline to call at @ ops functions
*
* This is used to unregister the currently registered direct caller and
* register a new one , @ addr , on the functions registered in the @ ops object .
*
* Note there ' s a window between the ftrace_shutdown and ftrace_startup calls
* where no callbacks will be called .
*
* Returns : zero on success . Non zero on error , which includes :
* - EINVAL - The @ ops object was not properly registered .
*/
2023-03-21 15:04:21 +01:00
int modify_ftrace_direct ( struct ftrace_ops * ops , unsigned long addr )
2022-07-19 17:21:23 -07:00
{
int err ;
if ( check_direct_multi ( ops ) )
return - EINVAL ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return - EINVAL ;
mutex_lock ( & direct_mutex ) ;
2023-03-21 15:04:21 +01:00
err = __modify_ftrace_direct ( ops , addr ) ;
2021-10-08 11:13:35 +02:00
mutex_unlock ( & direct_mutex ) ;
return err ;
}
2023-03-21 15:04:21 +01:00
EXPORT_SYMBOL_GPL ( modify_ftrace_direct ) ;
2019-11-08 13:07:06 -05:00
# endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
2012-06-05 19:28:08 +09:00
/**
* ftrace_set_filter_ip - set a function to filter on in ftrace by address
2024-02-22 21:48:33 -08:00
* @ ops : the ops to set the filter with
* @ ip : the address to add to or remove from the filter .
* @ remove : non zero to remove the ip from the filter
* @ reset : non zero to reset all filters before applying this filter .
2012-06-05 19:28:08 +09:00
*
* Filters denote which functions should be enabled when tracing is enabled .
2021-03-23 18:49:35 +01:00
* If @ ip is NULL , the call fails and the filter is not updated .
2023-01-03 12:49:11 +00:00
*
* This can allocate memory which must be freed before @ ops can be freed ,
* either by removing each filtered addr or by using
* ftrace_free_filter ( @ ops ) .
2012-06-05 19:28:08 +09:00
*/
int ftrace_set_filter_ip ( struct ftrace_ops * ops , unsigned long ip ,
int remove , int reset )
{
2013-05-09 14:44:17 +09:00
ftrace_ops_init ( ops ) ;
2022-03-15 23:00:26 +09:00
return ftrace_set_addr ( ops , & ip , 1 , remove , reset , 1 ) ;
2012-06-05 19:28:08 +09:00
}
EXPORT_SYMBOL_GPL ( ftrace_set_filter_ip ) ;
2022-03-15 23:00:26 +09:00
/**
* ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
2024-02-22 21:48:33 -08:00
* @ ops : the ops to set the filter with
* @ ips : the array of addresses to add to or remove from the filter .
* @ cnt : the number of addresses in @ ips
* @ remove : non zero to remove ips from the filter
* @ reset : non zero to reset all filters before applying this filter .
2022-03-15 23:00:26 +09:00
*
* Filters denote which functions should be enabled when tracing is enabled .
* If the @ ips array or any ip specified within it is NULL , the call fails
* and the filter is not updated .
2023-01-03 12:49:11 +00:00
*
* This can allocate memory which must be freed before @ ops can be freed ,
* either by removing each filtered addr or by using
* ftrace_free_filter ( @ ops ) .
*/
2022-03-15 23:00:26 +09:00
int ftrace_set_filter_ips ( struct ftrace_ops * ops , unsigned long * ips ,
unsigned int cnt , int remove , int reset )
{
ftrace_ops_init ( ops ) ;
return ftrace_set_addr ( ops , ips , cnt , remove , reset , 1 ) ;
}
EXPORT_SYMBOL_GPL ( ftrace_set_filter_ips ) ;
2016-11-15 12:31:20 -08:00
/**
* ftrace_ops_set_global_filter - setup ops to use global filters
2024-02-22 21:48:33 -08:00
* @ ops : the ops which will use the global filters
2016-11-15 12:31:20 -08:00
*
* ftrace users who need global function trace filtering should call this .
* It can set the global filter only if ops were not initialized before .
*/
void ftrace_ops_set_global_filter ( struct ftrace_ops * ops )
{
if ( ops - > flags & FTRACE_OPS_FL_INITIALIZED )
return ;
ftrace_ops_init ( ops ) ;
ops - > func_hash = & global_ops . local_hash ;
}
EXPORT_SYMBOL_GPL ( ftrace_ops_set_global_filter ) ;
2012-06-05 19:28:08 +09:00
static int
ftrace_set_regex ( struct ftrace_ops * ops , unsigned char * buf , int len ,
int reset , int enable )
{
2022-03-15 23:00:26 +09:00
return ftrace_set_hash ( ops , buf , len , NULL , 0 , 0 , reset , enable ) ;
2012-06-05 19:28:08 +09:00
}
2008-05-12 21:20:45 +02:00
/**
* ftrace_set_filter - set a function to filter on in ftrace
2024-02-22 21:48:33 -08:00
* @ ops : the ops to set the filter with
* @ buf : the string that holds the function filter text .
* @ len : the length of the string .
* @ reset : non - zero to reset all filters before applying this filter .
2011-05-05 22:54:01 -04:00
*
* Filters denote which functions should be enabled when tracing is enabled .
* If @ buf is NULL and reset is set , all functions will be enabled for tracing .
2023-01-03 12:49:11 +00:00
*
* This can allocate memory which must be freed before @ ops can be freed ,
* either by removing each filtered addr or by using
* ftrace_free_filter ( @ ops ) .
2011-05-05 22:54:01 -04:00
*/
2012-01-02 10:04:14 +01:00
int ftrace_set_filter ( struct ftrace_ops * ops , unsigned char * buf ,
2011-05-05 22:54:01 -04:00
int len , int reset )
{
2013-05-09 14:44:17 +09:00
ftrace_ops_init ( ops ) ;
2012-01-02 10:04:14 +01:00
return ftrace_set_regex ( ops , buf , len , reset , 1 ) ;
2011-05-05 22:54:01 -04:00
}
EXPORT_SYMBOL_GPL ( ftrace_set_filter ) ;
/**
* ftrace_set_notrace - set a function to not trace in ftrace
2024-02-22 21:48:33 -08:00
* @ ops : the ops to set the notrace filter with
* @ buf : the string that holds the function notrace text .
* @ len : the length of the string .
* @ reset : non - zero to reset all filters before applying this filter .
2011-05-05 22:54:01 -04:00
*
* Notrace Filters denote which functions should not be enabled when tracing
* is enabled . If @ buf is NULL and reset is set , all functions will be enabled
* for tracing .
2023-01-03 12:49:11 +00:00
*
* This can allocate memory which must be freed before @ ops can be freed ,
* either by removing each filtered addr or by using
* ftrace_free_filter ( @ ops ) .
2011-05-05 22:54:01 -04:00
*/
2012-01-02 10:04:14 +01:00
int ftrace_set_notrace ( struct ftrace_ops * ops , unsigned char * buf ,
2011-05-05 22:54:01 -04:00
int len , int reset )
{
2013-05-09 14:44:17 +09:00
ftrace_ops_init ( ops ) ;
2012-01-02 10:04:14 +01:00
return ftrace_set_regex ( ops , buf , len , reset , 0 ) ;
2011-05-05 22:54:01 -04:00
}
EXPORT_SYMBOL_GPL ( ftrace_set_notrace ) ;
/**
2014-04-20 23:10:44 +08:00
* ftrace_set_global_filter - set a function to filter on with global tracers
2024-02-22 21:48:33 -08:00
* @ buf : the string that holds the function filter text .
* @ len : the length of the string .
* @ reset : non - zero to reset all filters before applying this filter .
2008-05-12 21:20:45 +02:00
*
* Filters denote which functions should be enabled when tracing is enabled .
* If @ buf is NULL and reset is set , all functions will be enabled for tracing .
*/
2011-05-05 22:54:01 -04:00
void ftrace_set_global_filter ( unsigned char * buf , int len , int reset )
2008-05-12 21:20:45 +02:00
{
2011-05-02 12:29:25 -04:00
ftrace_set_regex ( & global_ops , buf , len , reset , 1 ) ;
2008-05-22 11:46:33 -04:00
}
2011-05-05 22:54:01 -04:00
EXPORT_SYMBOL_GPL ( ftrace_set_global_filter ) ;
2008-05-12 21:20:48 +02:00
2008-05-22 11:46:33 -04:00
/**
2014-04-20 23:10:44 +08:00
* ftrace_set_global_notrace - set a function to not trace with global tracers
2024-02-22 21:48:33 -08:00
* @ buf : the string that holds the function notrace text .
* @ len : the length of the string .
* @ reset : non - zero to reset all filters before applying this filter .
2008-05-22 11:46:33 -04:00
*
* Notrace Filters denote which functions should not be enabled when tracing
* is enabled . If @ buf is NULL and reset is set , all functions will be enabled
* for tracing .
*/
2011-05-05 22:54:01 -04:00
void ftrace_set_global_notrace ( unsigned char * buf , int len , int reset )
2008-05-22 11:46:33 -04:00
{
2011-05-02 12:29:25 -04:00
ftrace_set_regex ( & global_ops , buf , len , reset , 0 ) ;
2008-05-12 21:20:45 +02:00
}
2011-05-05 22:54:01 -04:00
EXPORT_SYMBOL_GPL ( ftrace_set_global_notrace ) ;
2008-05-12 21:20:45 +02:00
2009-05-28 13:37:24 -04:00
/*
* command line interface to allow users to set filters on boot up .
*/
# define FTRACE_FILTER_SIZE COMMAND_LINE_SIZE
static char ftrace_notrace_buf [ FTRACE_FILTER_SIZE ] __initdata ;
static char ftrace_filter_buf [ FTRACE_FILTER_SIZE ] __initdata ;
2013-06-27 22:18:06 -04:00
/* Used by function selftest to not test if filter is set */
bool ftrace_filter_param __initdata ;
2009-05-28 13:37:24 -04:00
static int __init set_ftrace_notrace ( char * str )
{
2013-06-27 22:18:06 -04:00
ftrace_filter_param = true ;
2023-05-17 14:53:23 +00:00
strscpy ( ftrace_notrace_buf , str , FTRACE_FILTER_SIZE ) ;
2009-05-28 13:37:24 -04:00
return 1 ;
}
__setup ( " ftrace_notrace= " , set_ftrace_notrace ) ;
static int __init set_ftrace_filter ( char * str )
{
2013-06-27 22:18:06 -04:00
ftrace_filter_param = true ;
2023-05-17 14:53:23 +00:00
strscpy ( ftrace_filter_buf , str , FTRACE_FILTER_SIZE ) ;
2009-05-28 13:37:24 -04:00
return 1 ;
}
__setup ( " ftrace_filter= " , set_ftrace_filter ) ;
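Both boot-parameter handlers above copy the command-line string with strscpy() and ignore its return value, so an overlong filter list is truncated silently. The user-space model below sketches the semantics being relied on: the destination is always NUL-terminated, and truncation is reported as -E2BIG. `model_strscpy()` is a hypothetical name and a byte-at-a-time approximation, not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Copy @src into @dst, writing at most @count bytes including the
 * terminating NUL. Returns the number of characters copied (without
 * the NUL), or -E2BIG if @count is 0 or @src did not fit.
 */
static ssize_t model_strscpy(char *dst, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	for (i = 0; i < count - 1 && src[i]; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* A remaining source character means the copy was truncated. */
	return src[i] ? -E2BIG : (ssize_t)i;
}
```

A caller that cares about truncation would check for a negative return; the handlers above accept the truncated string instead.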
2009-10-12 22:17:21 +02:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2009-11-05 11:16:17 +08:00
static char ftrace_graph_buf [ FTRACE_FILTER_SIZE ] __initdata ;
2014-06-13 01:23:50 +09:00
static char ftrace_graph_notrace_buf [ FTRACE_FILTER_SIZE ] __initdata ;
2017-01-20 11:44:47 +09:00
static int ftrace_graph_set_hash ( struct ftrace_hash * hash , char * buffer ) ;
2010-03-05 20:02:19 -05:00
2009-10-12 22:17:21 +02:00
static int __init set_graph_function ( char * str )
{
2023-05-17 14:53:23 +00:00
strscpy ( ftrace_graph_buf , str , FTRACE_FILTER_SIZE ) ;
2009-10-12 22:17:21 +02:00
return 1 ;
}
__setup ( " ftrace_graph_filter= " , set_graph_function ) ;
2014-06-13 01:23:50 +09:00
static int __init set_graph_notrace_function ( char * str )
{
2023-05-17 14:53:23 +00:00
strscpy ( ftrace_graph_notrace_buf , str , FTRACE_FILTER_SIZE ) ;
2014-06-13 01:23:50 +09:00
return 1 ;
}
__setup ( " ftrace_graph_notrace= " , set_graph_notrace_function ) ;
2017-03-02 16:12:15 -08:00
static int __init set_graph_max_depth_function ( char * str )
{
if ( ! str )
return 0 ;
fgraph_max_depth = simple_strtoul ( str , NULL , 0 ) ;
return 1 ;
}
__setup ( " ftrace_graph_max_depth= " , set_graph_max_depth_function ) ;
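The base argument of 0 passed to simple_strtoul() above means the radix is taken from the string's prefix. A user-space sketch of the same parse follows, with the standard strtoul() standing in for simple_strtoul(); `parse_max_depth()` is a hypothetical helper that mirrors the handler's return convention (0 when no string is given, 1 when a value was stored).

```c
#include <stdlib.h>

/*
 * Parse a depth value with automatic radix detection: a "0x" prefix
 * selects hexadecimal, a leading "0" selects octal, anything else is
 * decimal. Returns 1 and stores the value, or 0 if @str is NULL.
 */
static int parse_max_depth(const char *str, unsigned long *depth)
{
	if (!str)
		return 0;

	*depth = strtoul(str, NULL, 0);
	return 1;
}
```

With this convention, `ftrace_graph_max_depth=010` would select a depth of 8 rather than 10, which is worth keeping in mind when writing boot parameters.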
2014-06-13 01:23:50 +09:00
static void __init set_ftrace_early_graph ( char * buf , int enable )
2009-10-12 22:17:21 +02:00
{
int ret ;
char * func ;
2017-01-20 11:44:47 +09:00
struct ftrace_hash * hash ;
2014-06-13 01:23:50 +09:00
2017-03-02 12:53:26 -05:00
hash = alloc_ftrace_hash ( FTRACE_HASH_DEFAULT_BITS ) ;
2020-01-25 10:52:30 -05:00
if ( MEM_FAIL ( ! hash , " Failed to allocate hash \n " ) )
2017-03-02 12:53:26 -05:00
return ;
2009-10-12 22:17:21 +02:00
while ( buf ) {
func = strsep ( & buf , " , " ) ;
/* we allow only one expression at a time */
2017-01-20 11:44:47 +09:00
ret = ftrace_graph_set_hash ( hash , func ) ;
2009-10-12 22:17:21 +02:00
if ( ret )
printk ( KERN_DEBUG " ftrace: function %s not "
" traceable \n " , func ) ;
}
2017-03-02 12:53:26 -05:00
if ( enable )
ftrace_graph_hash = hash ;
else
ftrace_graph_notrace_hash = hash ;
2009-10-12 22:17:21 +02:00
}
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2011-12-19 21:57:44 -05:00
void __init
ftrace_set_early_filter ( struct ftrace_ops * ops , char * buf , int enable )
2009-05-28 13:37:24 -04:00
{
char * func ;
2013-05-09 14:44:17 +09:00
ftrace_ops_init ( ops ) ;
2009-05-28 13:37:24 -04:00
while ( buf ) {
func = strsep ( & buf , " , " ) ;
2011-05-02 12:29:25 -04:00
ftrace_set_regex ( ops , func , strlen ( func ) , 0 , enable ) ;
2009-05-28 13:37:24 -04:00
}
}
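The loop in ftrace_set_early_filter() splits the boot-time buffer in place with strsep(), which overwrites each comma with a NUL and sets the buffer pointer to NULL after the last token. A minimal user-space sketch of that loop is shown below; `split_filter_list()` and `MAX_TOKENS` are hypothetical and only collect the tokens instead of applying them as filters.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <string.h>

#define MAX_TOKENS 8

/*
 * Split @buf in place on commas and store the token pointers in @out.
 * Returns the number of tokens stored (at most MAX_TOKENS).
 */
static int split_filter_list(char *buf, char *out[MAX_TOKENS])
{
	int n = 0;

	/* strsep() sets buf to NULL once the final token is returned. */
	while (buf && n < MAX_TOKENS)
		out[n++] = strsep(&buf, ",");

	return n;
}
```

Note that strsep() returns empty tokens for consecutive commas, so a list such as `a,,b` produces three tokens, the middle one being an empty string.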
static void __init set_ftrace_early_filters ( void )
{
if ( ftrace_filter_buf [ 0 ] )
2011-12-19 21:57:44 -05:00
ftrace_set_early_filter ( & global_ops , ftrace_filter_buf , 1 ) ;
2009-05-28 13:37:24 -04:00
if ( ftrace_notrace_buf [ 0 ] )
2011-12-19 21:57:44 -05:00
ftrace_set_early_filter ( & global_ops , ftrace_notrace_buf , 0 ) ;
2009-10-12 22:17:21 +02:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
if ( ftrace_graph_buf [ 0 ] )
2014-06-13 01:23:50 +09:00
set_ftrace_early_graph ( ftrace_graph_buf , 1 ) ;
if ( ftrace_graph_notrace_buf [ 0 ] )
set_ftrace_early_graph ( ftrace_graph_notrace_buf , 0 ) ;
2009-10-12 22:17:21 +02:00
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2009-05-28 13:37:24 -04:00
}
2011-12-19 14:41:25 -05:00
int ftrace_regex_release ( struct inode * inode , struct file * file )
2008-05-12 21:20:43 +02:00
{
struct seq_file * m = ( struct seq_file * ) file - > private_data ;
struct ftrace_iterator * iter ;
2011-05-02 17:34:47 -04:00
struct ftrace_hash * * orig_hash ;
2009-09-11 17:29:29 +02:00
struct trace_parser * parser ;
2011-05-03 13:25:24 -04:00
int filter_hash ;
2008-05-12 21:20:43 +02:00
if ( file - > f_mode & FMODE_READ ) {
iter = m - > private ;
seq_release ( inode , file ) ;
} else
iter = file - > private_data ;
2009-09-11 17:29:29 +02:00
parser = & iter - > parser ;
if ( trace_parser_loaded ( parser ) ) {
2021-05-05 10:38:24 -04:00
int enable = ! ( iter - > flags & FTRACE_ITER_NOTRACE ) ;
ftrace_process_regex ( iter , parser - > buffer ,
parser - > idx , enable ) ;
2008-05-12 21:20:43 +02:00
}
2009-09-11 17:29:29 +02:00
trace_parser_put ( parser ) ;
2014-08-15 17:23:02 -04:00
mutex_lock ( & iter - > ops - > func_hash - > regex_lock ) ;
2013-05-09 14:44:21 +09:00
2011-04-29 22:35:33 -04:00
if ( file - > f_mode & FMODE_WRITE ) {
2011-05-03 13:25:24 -04:00
filter_hash = ! ! ( iter - > flags & FTRACE_ITER_FILTER ) ;
2017-06-26 11:47:31 -04:00
if ( filter_hash ) {
2014-08-15 17:23:02 -04:00
orig_hash = & iter - > ops - > func_hash - > filter_hash ;
2022-09-26 15:20:08 +00:00
if ( iter - > tr ) {
if ( list_empty ( & iter - > tr - > mod_trace ) )
iter - > hash - > flags & = ~ FTRACE_HASH_FL_MOD ;
else
iter - > hash - > flags | = FTRACE_HASH_FL_MOD ;
}
2017-06-26 11:47:31 -04:00
} else
2014-08-15 17:23:02 -04:00
orig_hash = & iter - > ops - > func_hash - > notrace_hash ;
2011-05-02 17:34:47 -04:00
2011-04-29 22:35:33 -04:00
mutex_lock ( & ftrace_lock ) ;
2020-11-06 22:54:46 +08:00
ftrace_hash_move_and_update_ops ( iter - > ops , orig_hash ,
2017-04-04 14:46:56 -04:00
iter - > hash , filter_hash ) ;
2011-04-29 22:35:33 -04:00
mutex_unlock ( & ftrace_lock ) ;
2017-03-29 14:55:49 -04:00
} else {
/* For read only, the hash is the ops hash */
iter - > hash = NULL ;
2011-04-29 22:35:33 -04:00
}
2013-05-09 14:44:21 +09:00
2014-08-15 17:23:02 -04:00
mutex_unlock ( & iter - > ops - > func_hash - > regex_lock ) ;
2011-05-02 17:34:47 -04:00
free_ftrace_hash ( iter - > hash ) ;
2019-10-11 17:56:57 -04:00
if ( iter - > tr )
trace_array_put ( iter - > tr ) ;
2011-05-02 17:34:47 -04:00
kfree ( iter ) ;
2011-04-29 22:35:33 -04:00
2008-05-12 21:20:43 +02:00
return 0 ;
}
2009-03-05 21:44:55 -05:00
static const struct file_operations ftrace_avail_fops = {
2008-05-12 21:20:43 +02:00
. open = ftrace_avail_open ,
. read = seq_read ,
. llseek = seq_lseek ,
2009-08-17 16:54:03 +08:00
. release = seq_release_private ,
2008-05-12 21:20:43 +02:00
} ;
2011-05-03 14:39:21 -04:00
static const struct file_operations ftrace_enabled_fops = {
. open = ftrace_enabled_open ,
. read = seq_read ,
. llseek = seq_lseek ,
. release = seq_release_private ,
} ;
2023-01-24 09:56:53 -05:00
static const struct file_operations ftrace_touched_fops = {
. open = ftrace_touched_open ,
. read = seq_read ,
. llseek = seq_lseek ,
. release = seq_release_private ,
} ;
2023-06-11 15:00:29 +02:00
static const struct file_operations ftrace_avail_addrs_fops = {
. open = ftrace_avail_addrs_open ,
. read = seq_read ,
. llseek = seq_lseek ,
. release = seq_release_private ,
} ;
2009-03-05 21:44:55 -05:00
static const struct file_operations ftrace_filter_fops = {
2008-05-12 21:20:43 +02:00
. open = ftrace_filter_open ,
2009-03-13 17:47:23 +08:00
. read = seq_read ,
2008-05-12 21:20:43 +02:00
. write = ftrace_filter_write ,
2013-12-21 17:39:40 -05:00
. llseek = tracing_lseek ,
2011-04-29 20:59:51 -04:00
. release = ftrace_regex_release ,
2008-05-12 21:20:43 +02:00
} ;
2009-03-05 21:44:55 -05:00
static const struct file_operations ftrace_notrace_fops = {
2008-05-22 11:46:33 -04:00
. open = ftrace_notrace_open ,
2009-03-13 17:47:23 +08:00
. read = seq_read ,
2008-05-22 11:46:33 -04:00
. write = ftrace_notrace_write ,
2013-12-21 17:39:40 -05:00
. llseek = tracing_lseek ,
2011-04-29 20:59:51 -04:00
. release = ftrace_regex_release ,
2008-05-22 11:46:33 -04:00
} ;
2008-12-03 15:36:57 -05:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
static DEFINE_MUTEX ( graph_lock ) ;
2020-02-01 12:57:04 +05:30
struct ftrace_hash __rcu * ftrace_graph_hash = EMPTY_HASH ;
2020-02-05 11:27:02 +05:30
struct ftrace_hash __rcu * ftrace_graph_notrace_hash = EMPTY_HASH ;
2017-01-20 11:44:47 +09:00
enum graph_filter_type {
GRAPH_FILTER_NOTRACE = 0 ,
GRAPH_FILTER_FUNCTION ,
} ;
2008-12-03 15:36:57 -05:00
2017-02-02 10:15:22 -05:00
# define FTRACE_GRAPH_EMPTY ((void *)1)
2013-10-14 17:24:24 +09:00
struct ftrace_graph_data {
2017-02-02 20:34:37 -05:00
struct ftrace_hash * hash ;
struct ftrace_func_entry * entry ;
int idx ; /* for hash table iteration */
enum graph_filter_type type ;
struct ftrace_hash * new_hash ;
const struct seq_operations * seq_ops ;
struct trace_parser parser ;
2013-10-14 17:24:24 +09:00
} ;
2008-12-03 15:36:57 -05:00
static void *
2009-06-24 09:54:00 +08:00
__g_next ( struct seq_file * m , loff_t * pos )
2008-12-03 15:36:57 -05:00
{
2013-10-14 17:24:24 +09:00
struct ftrace_graph_data * fgd = m - > private ;
2017-01-20 11:44:47 +09:00
struct ftrace_func_entry * entry = fgd - > entry ;
struct hlist_head * head ;
int i , idx = fgd - > idx ;
2013-10-14 17:24:24 +09:00
2017-01-20 11:44:47 +09:00
if ( * pos > = fgd - > hash - > count )
2008-12-03 15:36:57 -05:00
return NULL ;
2017-01-20 11:44:47 +09:00
if ( entry ) {
hlist_for_each_entry_continue ( entry , hlist ) {
fgd - > entry = entry ;
return entry ;
}
idx + + ;
}
for ( i = idx ; i < 1 < < fgd - > hash - > size_bits ; i + + ) {
head = & fgd - > hash - > buckets [ i ] ;
hlist_for_each_entry ( entry , head , hlist ) {
fgd - > entry = entry ;
fgd - > idx = i ;
return entry ;
}
}
return NULL ;
2009-06-24 09:54:00 +08:00
}
2008-12-03 15:36:57 -05:00
2009-06-24 09:54:00 +08:00
static void *
g_next ( struct seq_file * m , void * v , loff_t * pos )
{
( * pos ) + + ;
return __g_next ( m , pos ) ;
2008-12-03 15:36:57 -05:00
}
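The resumable walk performed by __g_next() can be modelled outside the kernel: the iterator remembers the last entry returned and its bucket, continues along that bucket's chain, and then moves on to the next non-empty bucket. The sketch below is an assumption-laden reduction: `struct node`, `struct cursor`, `cursor_next()` and the fixed `NBUCKETS` are hypothetical, a plain singly linked list replaces the kernel hlist, and the position check against the entry count is omitted.

```c
#include <stddef.h>

#define NBUCKETS 4

/* Hypothetical chain node standing in for struct ftrace_func_entry. */
struct node {
	unsigned long ip;
	struct node *next;
};

/* Iteration state, analogous to the entry/idx pair in ftrace_graph_data. */
struct cursor {
	struct node *buckets[NBUCKETS];
	struct node *entry;	/* last entry returned, or NULL */
	int idx;		/* bucket that @entry came from */
};

/* Return the entry after the last one returned, or NULL at the end. */
static struct node *cursor_next(struct cursor *c)
{
	int i, idx = c->idx;

	if (c->entry) {
		/* Continue along the current chain first. */
		if (c->entry->next) {
			c->entry = c->entry->next;
			return c->entry;
		}
		/* Chain exhausted: resume the scan at the next bucket. */
		idx++;
	}

	for (i = idx; i < NBUCKETS; i++) {
		if (c->buckets[i]) {
			c->entry = c->buckets[i];
			c->idx = i;
			return c->entry;
		}
	}
	return NULL;
}
```

Keeping the state in the cursor rather than rescanning from bucket 0 on every call is what lets each step of the seq_file walk run in time proportional to the gap to the next entry.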
static void * g_start ( struct seq_file * m , loff_t * pos )
{
2013-10-14 17:24:24 +09:00
struct ftrace_graph_data * fgd = m - > private ;
2008-12-03 15:36:57 -05:00
mutex_lock ( & graph_lock ) ;
2017-02-02 20:16:29 -05:00
if ( fgd - > type = = GRAPH_FILTER_FUNCTION )
fgd - > hash = rcu_dereference_protected ( ftrace_graph_hash ,
lockdep_is_held ( & graph_lock ) ) ;
else
fgd - > hash = rcu_dereference_protected ( ftrace_graph_notrace_hash ,
lockdep_is_held ( & graph_lock ) ) ;
2009-02-19 21:13:12 +01:00
/* Nothing set, tell g_show to print that all functions are enabled */
2017-01-20 11:44:47 +09:00
if ( ftrace_hash_empty ( fgd - > hash ) & & ! * pos )
2017-02-02 10:15:22 -05:00
return FTRACE_GRAPH_EMPTY ;
2009-02-19 21:13:12 +01:00
2017-01-20 11:44:47 +09:00
fgd - > idx = 0 ;
fgd - > entry = NULL ;
2009-06-24 09:54:00 +08:00
return __g_next ( m , pos ) ;
2008-12-03 15:36:57 -05:00
}
static void g_stop ( struct seq_file * m , void * p )
{
mutex_unlock ( & graph_lock ) ;
}
static int g_show ( struct seq_file * m , void * v )
{
2017-01-20 11:44:47 +09:00
struct ftrace_func_entry * entry = v ;
2008-12-03 15:36:57 -05:00
2017-01-20 11:44:47 +09:00
if ( ! entry )
2008-12-03 15:36:57 -05:00
return 0 ;
2017-02-02 10:15:22 -05:00
if ( entry = = FTRACE_GRAPH_EMPTY ) {
2014-06-13 01:23:51 +09:00
struct ftrace_graph_data * fgd = m - > private ;
2017-01-20 11:44:47 +09:00
if ( fgd - > type = = GRAPH_FILTER_FUNCTION )
2014-11-08 21:42:10 +01:00
seq_puts ( m , " #### all functions enabled #### \n " ) ;
2014-06-13 01:23:51 +09:00
else
2014-11-08 21:42:10 +01:00
seq_puts ( m , " #### no functions disabled #### \n " ) ;
2009-02-19 21:13:12 +01:00
return 0 ;
}
2017-01-20 11:44:47 +09:00
seq_printf ( m , " %ps \n " , ( void * ) entry - > ip ) ;
2008-12-03 15:36:57 -05:00
return 0 ;
}
2009-09-22 16:43:43 -07:00
static const struct seq_operations ftrace_graph_seq_ops = {
2008-12-03 15:36:57 -05:00
. start = g_start ,
. next = g_next ,
. stop = g_stop ,
. show = g_show ,
} ;
static int
2013-10-14 17:24:24 +09:00
__ftrace_graph_open ( struct inode * inode , struct file * file ,
struct ftrace_graph_data * fgd )
2008-12-03 15:36:57 -05:00
{
2019-10-11 17:22:50 -04:00
int ret ;
2017-01-20 11:44:47 +09:00
struct ftrace_hash * new_hash = NULL ;
2008-12-03 15:36:57 -05:00
2019-10-11 17:22:50 -04:00
ret = security_locked_down ( LOCKDOWN_TRACEFS ) ;
if ( ret )
return ret ;
2017-01-20 11:44:47 +09:00
if ( file - > f_mode & FMODE_WRITE ) {
const int size_bits = FTRACE_HASH_DEFAULT_BITS ;
2017-02-02 20:34:37 -05:00
if ( trace_parser_get_init ( & fgd - > parser , FTRACE_BUFF_MAX ) )
return - ENOMEM ;
2017-01-20 11:44:47 +09:00
if ( file - > f_flags & O_TRUNC )
new_hash = alloc_ftrace_hash ( size_bits ) ;
else
new_hash = alloc_and_copy_ftrace_hash ( size_bits ,
fgd - > hash ) ;
if ( ! new_hash ) {
ret = - ENOMEM ;
goto out ;
}
2008-12-03 15:36:57 -05:00
}
2013-10-14 17:24:24 +09:00
if ( file - > f_mode & FMODE_READ ) {
2017-01-20 11:44:47 +09:00
ret = seq_open ( file , & ftrace_graph_seq_ops ) ;
2013-10-14 17:24:24 +09:00
if ( ! ret ) {
struct seq_file * m = file - > private_data ;
m - > private = fgd ;
2017-01-20 11:44:47 +09:00
} else {
/* Failed */
free_ftrace_hash ( new_hash ) ;
new_hash = NULL ;
2013-10-14 17:24:24 +09:00
}
} else
file - > private_data = fgd ;
2008-12-03 15:36:57 -05:00
2017-01-20 11:44:47 +09:00
out :
2017-02-02 20:34:37 -05:00
if ( ret < 0 & & file - > f_mode & FMODE_WRITE )
trace_parser_put ( & fgd - > parser ) ;
2017-01-20 11:44:47 +09:00
fgd - > new_hash = new_hash ;
2017-02-02 20:16:29 -05:00
/*
* All uses of fgd - > hash must be taken with the graph_lock
* held . The graph_lock is going to be released , so force
* fgd - > hash to be reinitialized when it is taken again .
*/
fgd - > hash = NULL ;
2008-12-03 15:36:57 -05:00
return ret ;
}
2013-10-14 17:24:24 +09:00
static int
ftrace_graph_open ( struct inode * inode , struct file * file )
{
struct ftrace_graph_data * fgd ;
2017-01-20 11:44:47 +09:00
int ret ;
2013-10-14 17:24:24 +09:00
if ( unlikely ( ftrace_disabled ) )
return - ENODEV ;
fgd = kmalloc ( sizeof ( * fgd ) , GFP_KERNEL ) ;
if ( fgd = = NULL )
return - ENOMEM ;
2017-01-20 11:44:47 +09:00
mutex_lock ( & graph_lock ) ;
2017-02-02 20:16:29 -05:00
fgd - > hash = rcu_dereference_protected ( ftrace_graph_hash ,
lockdep_is_held ( & graph_lock ) ) ;
2017-01-20 11:44:47 +09:00
fgd - > type = GRAPH_FILTER_FUNCTION ;
2013-10-14 17:24:24 +09:00
fgd - > seq_ops = & ftrace_graph_seq_ops ;
2017-01-20 11:44:47 +09:00
ret = __ftrace_graph_open ( inode , file , fgd ) ;
if ( ret < 0 )
kfree ( fgd ) ;
mutex_unlock ( & graph_lock ) ;
return ret ;
2013-10-14 17:24:24 +09:00
}
2013-10-14 17:24:26 +09:00
static int
ftrace_graph_notrace_open ( struct inode * inode , struct file * file )
{
struct ftrace_graph_data * fgd ;
2017-01-20 11:44:47 +09:00
int ret ;
2013-10-14 17:24:26 +09:00
if ( unlikely ( ftrace_disabled ) )
return - ENODEV ;
fgd = kmalloc ( sizeof ( * fgd ) , GFP_KERNEL ) ;
if ( fgd = = NULL )
return - ENOMEM ;
2017-01-20 11:44:47 +09:00
mutex_lock ( & graph_lock ) ;
2017-02-02 20:16:29 -05:00
fgd - > hash = rcu_dereference_protected ( ftrace_graph_notrace_hash ,
lockdep_is_held ( & graph_lock ) ) ;
2017-01-20 11:44:47 +09:00
fgd - > type = GRAPH_FILTER_NOTRACE ;
2013-10-14 17:24:26 +09:00
fgd - > seq_ops = & ftrace_graph_seq_ops ;
2017-01-20 11:44:47 +09:00
ret = __ftrace_graph_open ( inode , file , fgd ) ;
if ( ret < 0 )
kfree ( fgd ) ;
mutex_unlock ( & graph_lock ) ;
return ret ;
2013-10-14 17:24:26 +09:00
}
2009-07-23 11:29:11 +08:00
static int
ftrace_graph_release ( struct inode * inode , struct file * file )
{
2017-01-20 11:44:47 +09:00
struct ftrace_graph_data * fgd ;
2017-02-02 20:34:37 -05:00
struct ftrace_hash * old_hash , * new_hash ;
struct trace_parser * parser ;
int ret = 0 ;
2017-01-20 11:44:47 +09:00
2013-10-14 17:24:24 +09:00
if ( file - > f_mode & FMODE_READ ) {
struct seq_file * m = file - > private_data ;
2017-01-20 11:44:47 +09:00
fgd = m - > private ;
2009-07-23 11:29:11 +08:00
seq_release ( inode , file ) ;
2013-10-14 17:24:24 +09:00
} else {
2017-01-20 11:44:47 +09:00
fgd = file - > private_data ;
2013-10-14 17:24:24 +09:00
}
2017-02-02 20:34:37 -05:00
if ( file - > f_mode & FMODE_WRITE ) {
parser = & fgd - > parser ;
if ( trace_parser_loaded ( ( parser ) ) ) {
ret = ftrace_graph_set_hash ( fgd - > new_hash ,
parser - > buffer ) ;
}
trace_parser_put ( parser ) ;
new_hash = __ftrace_hash_move ( fgd - > new_hash ) ;
if ( ! new_hash ) {
ret = - ENOMEM ;
goto out ;
}
mutex_lock ( & graph_lock ) ;
if ( fgd - > type = = GRAPH_FILTER_FUNCTION ) {
old_hash = rcu_dereference_protected ( ftrace_graph_hash ,
lockdep_is_held ( & graph_lock ) ) ;
rcu_assign_pointer ( ftrace_graph_hash , new_hash ) ;
} else {
old_hash = rcu_dereference_protected ( ftrace_graph_notrace_hash ,
lockdep_is_held ( & graph_lock ) ) ;
rcu_assign_pointer ( ftrace_graph_notrace_hash , new_hash ) ;
}
mutex_unlock ( & graph_lock ) ;
2020-02-05 09:20:32 -05:00
/*
* We need to do a hard force of sched synchronization .
* This is because we use preempt_disable ( ) to do RCU , but
* the function tracers can be called where RCU is not watching
* ( like before user_exit ( ) ) . We can not rely on the RCU
* infrastructure to do the synchronization , thus we must do it
* ourselves .
*/
2021-07-21 13:47:26 +02:00
if ( old_hash ! = EMPTY_HASH )
synchronize_rcu_tasks_rude ( ) ;
2017-02-02 20:34:37 -05:00
free_ftrace_hash ( old_hash ) ;
}
out :
2017-05-25 16:20:38 +01:00
free_ftrace_hash ( fgd - > new_hash ) ;
2017-01-20 11:44:47 +09:00
kfree ( fgd ) ;
2017-02-02 20:34:37 -05:00
return ret ;
2009-07-23 11:29:11 +08:00
}
2008-12-03 15:36:57 -05:00
static int
2017-01-20 11:44:47 +09:00
ftrace_graph_set_hash ( struct ftrace_hash * hash , char * buffer )
2008-12-03 15:36:57 -05:00
{
2015-09-29 19:46:14 +03:00
struct ftrace_glob func_g ;
2008-12-03 15:36:57 -05:00
struct dyn_ftrace * rec ;
struct ftrace_page * pg ;
2017-01-20 11:44:47 +09:00
struct ftrace_func_entry * entry ;
2010-02-10 15:43:04 +08:00
int fail = 1 ;
2015-09-29 19:46:14 +03:00
int not ;
2008-12-03 15:36:57 -05:00
2009-02-19 21:13:12 +01:00
/* decode regex */
2015-09-29 19:46:14 +03:00
func_g . type = filter_parse_regex ( buffer , strlen ( buffer ) ,
& func_g . search , & not ) ;
2009-02-19 21:13:12 +01:00
2015-09-29 19:46:14 +03:00
func_g . len = strlen ( func_g . search ) ;
2009-02-19 21:13:12 +01:00
2009-02-14 01:15:39 -05:00
mutex_lock ( & ftrace_lock ) ;
2011-04-21 23:16:46 -04:00
if ( unlikely ( ftrace_disabled ) ) {
mutex_unlock ( & ftrace_lock ) ;
return - ENODEV ;
}
2009-02-13 12:43:56 -05:00
do_for_each_ftrace_rec ( pg , rec ) {
2016-11-14 16:31:49 -05:00
if ( rec - > flags & FTRACE_FL_DISABLED )
continue ;
2015-09-29 19:46:15 +03:00
if ( ftrace_match_record ( rec , & func_g , NULL , 0 ) ) {
2017-01-20 11:44:47 +09:00
entry = ftrace_lookup_ip ( hash , rec - > ip ) ;
2010-02-10 15:43:04 +08:00
if ( ! not ) {
fail = 0 ;
2017-01-20 11:44:47 +09:00
if ( entry )
continue ;
ftrace: Fix modification of direct_function hash while in use
Masami Hiramatsu reported a memory leak in register_ftrace_direct() that
occurs when the number of new entries added is large enough to cause two
allocations in the loop:
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
if (!new)
goto out_remove;
entry->direct = addr;
}
}
Where ftrace_add_rec_direct() has:
if (ftrace_hash_empty(direct_functions) ||
direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
struct ftrace_hash *new_hash;
int size = ftrace_hash_empty(direct_functions) ? 0 :
direct_functions->count + 1;
if (size < 32)
size = 32;
new_hash = dup_hash(direct_functions, size);
if (!new_hash)
return NULL;
*free_hash = direct_functions;
direct_functions = new_hash;
}
The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.
But this also exposed a more serious bug.
The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:
new_hash = dup_hash(direct_functions, size);
and
direct_functions = new_hash;
can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.
That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.
Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handle
this.
The following is done:
1. Change add_hash_entry() to return the entry it created and inserted
into the hash, and not just return success or not.
2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
the former.
3. Allocate a "new_hash" at the start that is made for holding both the
new hash entries as well as the existing entries in direct_functions.
4. Copy (not move) the direct_function entries over to the new_hash.
5. Copy the entries of the added hash to the new_hash.
6. If everything succeeds, then use rcu_assign_pointer() to update the
direct_functions with the new_hash.
This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.
Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e74bb7d ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
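
The copy-then-publish sequence in steps 3 to 6 can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: `struct table`, `live_table`, `table_copy_with()`, and `publish()` are hypothetical names, a flat array stands in for the hash, and a C11 atomic exchange stands in for the kernel's `rcu_assign_pointer()`. Freeing the old table safely would still require waiting for readers to finish, which the sketch leaves to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat table standing in for an ftrace hash. */
struct table {
	size_t count;
	unsigned long ips[];
};

/* Readers always see either the old table or the complete new one. */
static _Atomic(struct table *) live_table;

/* Build a new table holding a copy of old's entries plus the added ones. */
static struct table *table_copy_with(const struct table *old,
				     const unsigned long *add, size_t n_add)
{
	size_t n_old = old ? old->count : 0;
	struct table *new_table;

	new_table = malloc(sizeof(*new_table) +
			   (n_old + n_add) * sizeof(new_table->ips[0]));
	if (!new_table)
		return NULL;	/* the live table is untouched on failure */

	/* Copy (not move) the existing entries, then append the new ones. */
	if (n_old)
		memcpy(new_table->ips, old->ips, n_old * sizeof(old->ips[0]));
	if (n_add)
		memcpy(new_table->ips + n_old, add, n_add * sizeof(add[0]));
	new_table->count = n_old + n_add;
	return new_table;
}

/* Publish only after the new table is fully built; return the old one. */
static struct table *publish(struct table *new_table)
{
	assert(new_table);
	return atomic_exchange_explicit(&live_table, new_table,
					memory_order_acq_rel);
}
```

Because the old table is never modified, a failure partway through leaves readers with a consistent view, and the single pointer swap at the end is the only point where they can observe a change.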
2023-12-29 11:51:34 -05:00
if ( add_hash_entry ( hash , rec - > ip ) = = NULL )
2017-01-20 11:44:47 +09:00
goto out ;
2010-02-10 15:43:04 +08:00
} else {
2017-01-20 11:44:47 +09:00
if ( entry ) {
free_hash_entry ( hash , entry ) ;
2010-02-10 15:43:04 +08:00
fail = 0 ;
}
}
2008-12-03 15:36:57 -05:00
}
2009-02-13 12:43:56 -05:00
} while_for_each_ftrace_rec ( ) ;
2010-02-10 15:43:04 +08:00
out :
2009-02-14 01:15:39 -05:00
mutex_unlock ( & ftrace_lock ) ;
2008-12-03 15:36:57 -05:00
2010-02-10 15:43:04 +08:00
if ( fail )
return - EINVAL ;
return 0 ;
2008-12-03 15:36:57 -05:00
}
static ssize_t
ftrace_graph_write ( struct file * file , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
2013-10-14 17:24:25 +09:00
ssize_t read , ret = 0 ;
2013-10-14 17:24:24 +09:00
struct ftrace_graph_data * fgd = file - > private_data ;
2017-02-02 20:34:37 -05:00
struct trace_parser * parser ;
2008-12-03 15:36:57 -05:00
2010-02-10 15:43:04 +08:00
if ( ! cnt )
2008-12-03 15:36:57 -05:00
return 0 ;
2017-02-02 16:59:06 -05:00
/* Read mode uses seq functions */
if ( file - > f_mode & FMODE_READ ) {
struct seq_file * m = file - > private_data ;
fgd = m - > private ;
}
2017-02-02 20:34:37 -05:00
parser = & fgd - > parser ;
2008-12-03 15:36:57 -05:00
2017-02-02 20:34:37 -05:00
read = trace_get_user ( parser , ubuf , cnt , ppos ) ;
2009-09-11 17:29:29 +02:00
2017-02-02 20:34:37 -05:00
if ( read > = 0 & & trace_parser_loaded ( parser ) & &
! trace_parser_cont ( parser ) ) {
2013-10-14 17:24:25 +09:00
2017-01-20 11:44:47 +09:00
ret = ftrace_graph_set_hash ( fgd - > new_hash ,
2017-02-02 20:34:37 -05:00
parser - > buffer ) ;
trace_parser_clear ( parser ) ;
2008-12-03 15:36:57 -05:00
}
2013-10-14 17:24:25 +09:00
if ( ! ret )
ret = read ;
2009-09-22 13:52:57 +08:00
2008-12-03 15:36:57 -05:00
return ret ;
}
static const struct file_operations ftrace_graph_fops = {
2009-07-23 11:29:11 +08:00
. open = ftrace_graph_open ,
. read = seq_read ,
. write = ftrace_graph_write ,
2013-12-21 17:39:40 -05:00
. llseek = tracing_lseek ,
2009-07-23 11:29:11 +08:00
. release = ftrace_graph_release ,
2008-12-03 15:36:57 -05:00
} ;
2013-10-14 17:24:26 +09:00
static const struct file_operations ftrace_graph_notrace_fops = {
. open = ftrace_graph_notrace_open ,
. read = seq_read ,
. write = ftrace_graph_write ,
2013-12-21 17:39:40 -05:00
. llseek = tracing_lseek ,
2013-10-14 17:24:26 +09:00
. release = ftrace_graph_release ,
} ;
2008-12-03 15:36:57 -05:00
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2014-01-10 16:17:45 -05:00
void ftrace_create_filter_files ( struct ftrace_ops * ops ,
struct dentry * parent )
{
2021-08-18 11:24:51 -04:00
trace_create_file ( " set_ftrace_filter " , TRACE_MODE_WRITE , parent ,
2014-01-10 16:17:45 -05:00
ops , & ftrace_filter_fops ) ;
2021-08-18 11:24:51 -04:00
trace_create_file ( " set_ftrace_notrace " , TRACE_MODE_WRITE , parent ,
2014-01-10 16:17:45 -05:00
ops , & ftrace_notrace_fops ) ;
}
/*
* The name " destroy_filter_files " is really a misnomer . Although
2019-03-24 00:05:23 +05:30
* in the future it may actually delete the files , this is
2014-01-10 16:17:45 -05:00
* really intended to make sure the ops passed in are disabled
* and that when this function returns , the caller is free to
* free the ops .
*
* The " destroy " name is only to match the " create " name that this
* should be paired with .
*/
void ftrace_destroy_filter_files ( struct ftrace_ops * ops )
{
mutex_lock ( & ftrace_lock ) ;
if ( ops - > flags & FTRACE_OPS_FL_ENABLED )
ftrace_shutdown ( ops , 0 ) ;
ops - > flags | = FTRACE_OPS_FL_DELETED ;
2018-12-10 23:58:01 -05:00
ftrace_free_filter ( ops ) ;
2014-01-10 16:17:45 -05:00
mutex_unlock ( & ftrace_lock ) ;
}
2015-01-20 12:13:40 -05:00
static __init int ftrace_init_dyn_tracefs ( struct dentry * d_tracer )
2008-05-12 21:20:43 +02:00
{
2021-08-18 11:24:51 -04:00
trace_create_file ( " available_filter_functions " , TRACE_MODE_READ ,
2009-03-27 00:25:38 +01:00
d_tracer , NULL , & ftrace_avail_fops ) ;
2008-05-12 21:20:43 +02:00
2023-06-11 15:00:29 +02:00
trace_create_file ( " available_filter_functions_addrs " , TRACE_MODE_READ ,
d_tracer , NULL , & ftrace_avail_addrs_fops ) ;
2021-08-18 11:24:51 -04:00
trace_create_file ( " enabled_functions " , TRACE_MODE_READ ,
2011-05-03 14:39:21 -04:00
d_tracer , NULL , & ftrace_enabled_fops ) ;
2023-01-24 09:56:53 -05:00
trace_create_file ( " touched_functions " , TRACE_MODE_READ ,
d_tracer , NULL , & ftrace_touched_fops ) ;
2014-01-10 16:17:45 -05:00
ftrace_create_filter_files ( & global_ops , d_tracer ) ;
ftrace: user update and disable dynamic ftrace daemon
In dynamic ftrace, the mcount function starts off pointing to a stub
function that just returns.
On start up, the call to the stub is modified to point to a "record_ip"
function. The job of the record_ip function is to add the function to
a pre-allocated hash list. If the function is already there, it simply is
ignored, otherwise it is added to the list.
Later, a ftraced daemon wakes up and calls kstop_machine if any functions
have been recorded, and changes the calls to the recorded functions to
a simple nop. If no functions were recorded, the daemon goes back to sleep.
The daemon wakes up once a second to see if it needs to update any newly
recorded functions into nops. Usually it does not, but if a lot of code
has been executed for the first time in the kernel, the ftraced daemon
will call kstop_machine to update those into nops.
The problem currently is that there's no way to stop the daemon from doing
this, and it can cause unneeded latencies (800us which for some is bothersome).
This patch adds a new file /debugfs/tracing/ftraced_enabled. If the daemon
is active, reading this will return "enabled\n" and "disabled\n" when the
daemon is not running. To disable the daemon, the user can echo "0" or
"disable" into this file, and "1" or "enable" to re-enable the daemon.
Since the daemon is used to convert the functions into nops to increase
the performance of the system, I also added that anytime something is
written into the ftraced_enabled file, kstop_machine will run if there
are new functions that have been detected that need to be converted.
This way the user can disable the daemon but still be able to control the
conversion of the mcount calls to nops by simply,
"echo 0 > /debugfs/tracing/ftraced_enabled"
when they need to do more conversions.
To see the number of converted functions:
"cat /debugfs/tracing/dyn_ftrace_total_info"
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-27 20:48:37 -04:00
2008-12-03 15:36:57 -05:00
# ifdef CONFIG_FUNCTION_GRAPH_TRACER
2021-08-18 11:24:51 -04:00
trace_create_file ( " set_graph_function " , TRACE_MODE_WRITE , d_tracer ,
2008-12-03 15:36:57 -05:00
NULL ,
& ftrace_graph_fops ) ;
2021-08-18 11:24:51 -04:00
trace_create_file ( " set_graph_notrace " , TRACE_MODE_WRITE , d_tracer ,
2013-10-14 17:24:26 +09:00
NULL ,
& ftrace_graph_notrace_fops ) ;
2008-12-03 15:36:57 -05:00
# endif /* CONFIG_FUNCTION_GRAPH_TRACER */
2008-05-12 21:20:43 +02:00
return 0 ;
}
2012-04-24 22:32:06 -04:00
static int ftrace_cmp_ips ( const void * a , const void * b )
2011-12-16 17:06:45 -05:00
{
2012-04-24 22:32:06 -04:00
const unsigned long * ipa = a ;
const unsigned long * ipb = b ;
2011-12-16 17:06:45 -05:00
2012-04-24 22:32:06 -04:00
if ( * ipa > * ipb )
return 1 ;
if ( * ipa < * ipb )
return - 1 ;
return 0 ;
}
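
The comparator above uses explicit comparisons rather than returning a difference, since subtracting two `unsigned long` values and truncating to `int` could give the wrong sign. A minimal user-space sketch of the same idea, assuming the standard C `qsort()` in place of the kernel's `sort()`; `cmp_ips()`, `sort_ips()`, and `is_sorted()` are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdlib.h>

/* Same three-way comparison as ftrace_cmp_ips(), usable with qsort(). */
static int cmp_ips(const void *a, const void *b)
{
	const unsigned long *ipa = a;
	const unsigned long *ipb = b;

	if (*ipa > *ipb)
		return 1;
	if (*ipa < *ipb)
		return -1;
	return 0;
}

/* Sort an array of addresses in ascending order. */
static void sort_ips(unsigned long *ips, size_t count)
{
	assert(ips || count == 0);
	if (count)
		qsort(ips, count, sizeof(*ips), cmp_ips);
}

/* Return 1 if the array is in ascending order, 0 otherwise. */
static int is_sorted(const unsigned long *ips, size_t count)
{
	for (size_t i = 1; i < count; i++)
		if (ips[i - 1] > ips[i])
			return 0;
	return 1;
}
```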
2021-12-06 15:18:58 -05:00
# ifdef CONFIG_FTRACE_SORT_STARTUP_TEST
static void test_is_sorted ( unsigned long * start , unsigned long count )
{
int i ;
for ( i = 1 ; i < count ; i + + ) {
if ( WARN ( start [ i - 1 ] > start [ i ] ,
" [%d] %pS at %lx is not sorted with %pS at %lx \n " , i ,
( void * ) start [ i - 1 ] , start [ i - 1 ] ,
( void * ) start [ i ] , start [ i ] ) )
break ;
}
if ( i = = count )
pr_info ( " ftrace section at %px sorted properly \n " , start ) ;
}
# else
static void test_is_sorted ( unsigned long * start , unsigned long count )
{
}
# endif
2009-10-13 16:33:53 -04:00
static int ftrace_process_locs ( struct module * mod ,
2008-11-14 16:21:19 -08:00
unsigned long * start ,
2008-08-14 15:45:08 -04:00
unsigned long * end )
{
2023-07-12 14:04:52 +08:00
struct ftrace_page * pg_unuse = NULL ;
2012-04-24 23:45:26 -04:00
struct ftrace_page * start_pg ;
2011-12-16 16:23:44 -05:00
struct ftrace_page * pg ;
2012-04-24 23:45:26 -04:00
struct dyn_ftrace * rec ;
2023-07-12 14:04:52 +08:00
unsigned long skipped = 0 ;
2011-12-16 16:23:44 -05:00
unsigned long count ;
2008-08-14 15:45:08 -04:00
unsigned long * p ;
unsigned long addr ;
2011-06-24 23:28:13 -04:00
unsigned long flags = 0 ; /* Shut up gcc */
2011-12-16 16:23:44 -05:00
int ret = - ENOMEM ;
count = end - start ;
if ( ! count )
return 0 ;
2021-12-12 19:33:58 +08:00
/*
* Sorting mcount in vmlinux at build time depends on
2022-01-22 09:17:10 -05:00
* CONFIG_BUILDTIME_MCOUNT_SORT , while mcount loc in
2021-12-12 19:33:58 +08:00
* modules can not be sorted at build time .
*/
2022-01-22 09:17:10 -05:00
if ( ! IS_ENABLED ( CONFIG_BUILDTIME_MCOUNT_SORT ) | | mod ) {
2021-12-12 19:33:58 +08:00
sort ( start , count , sizeof ( * start ) ,
ftrace_cmp_ips , NULL ) ;
2021-12-06 15:18:58 -05:00
} else {
test_is_sorted ( start , count ) ;
2021-12-12 19:33:58 +08:00
}
2012-04-24 22:32:06 -04:00
2012-04-24 23:45:26 -04:00
start_pg = ftrace_allocate_pages ( count ) ;
if ( ! start_pg )
2011-12-16 16:23:44 -05:00
return - ENOMEM ;
2008-08-14 15:45:08 -04:00
2009-02-14 01:42:44 -05:00
mutex_lock ( & ftrace_lock ) ;
2011-12-16 16:23:44 -05:00
2011-12-16 14:42:37 -05:00
/*
* Core and each module need their own pages , as
* modules will free them when they are removed .
* Force a new page to be allocated for modules .
*/
2011-12-16 16:23:44 -05:00
if ( ! mod ) {
WARN_ON ( ftrace_pages | | ftrace_pages_start ) ;
/* First initialization */
2012-04-24 23:45:26 -04:00
ftrace_pages = ftrace_pages_start = start_pg ;
2011-12-16 16:23:44 -05:00
} else {
2011-12-16 14:42:37 -05:00
if ( ! ftrace_pages )
2011-12-16 16:23:44 -05:00
goto out ;
2011-12-16 14:42:37 -05:00
2011-12-16 16:23:44 -05:00
if ( WARN_ON ( ftrace_pages - > next ) ) {
/* Hmm, we have free pages? */
while ( ftrace_pages - > next )
ftrace_pages = ftrace_pages - > next ;
2011-12-16 14:42:37 -05:00
}
2011-12-16 16:23:44 -05:00
2012-04-24 23:45:26 -04:00
ftrace_pages - > next = start_pg ;
2011-12-16 14:42:37 -05:00
}
2008-08-14 15:45:08 -04:00
p = start ;
2012-04-24 23:45:26 -04:00
pg = start_pg ;
2008-08-14 15:45:08 -04:00
while ( p < end ) {
2021-04-01 16:14:17 -04:00
unsigned long end_offset ;
2008-08-14 15:45:08 -04:00
addr = ftrace_call_adjust ( * p + + ) ;
2008-11-14 16:21:19 -08:00
/*
* Some architecture linkers will pad between
* the different mcount_loc sections of different
* object files to satisfy alignments .
* Skip any NULL pointers .
*/
2023-07-12 14:04:52 +08:00
if ( ! addr ) {
skipped + + ;
2008-11-14 16:21:19 -08:00
continue ;
2023-07-12 14:04:52 +08:00
}
2012-04-24 23:45:26 -04:00
2021-04-01 16:14:17 -04:00
end_offset = ( pg - > index + 1 ) * sizeof ( pg - > records [ 0 ] ) ;
if ( end_offset > PAGE_SIZE < < pg - > order ) {
2012-04-24 23:45:26 -04:00
/* We should have allocated enough */
if ( WARN_ON ( ! pg - > next ) )
break ;
pg = pg - > next ;
}
rec = & pg - > records [ pg - > index + + ] ;
rec - > ip = addr ;
2008-08-14 15:45:08 -04:00
}
2023-07-12 14:04:52 +08:00
if ( pg - > next ) {
pg_unuse = pg - > next ;
pg - > next = NULL ;
}
2012-04-24 23:45:26 -04:00
/* Assign the last page to ftrace_pages */
ftrace_pages = pg ;
2011-06-07 09:26:46 -04:00
/*
2011-06-24 23:28:13 -04:00
* We only need to disable interrupts on start up
* because we are modifying code that an interrupt
* may execute , and the modification is not atomic .
* But for modules , nothing runs the code we modify
* until we are finished with it , and there ' s no
* reason to cause large interrupt latencies while we do it .
2011-06-07 09:26:46 -04:00
*/
2011-06-24 23:28:13 -04:00
if ( ! mod )
local_irq_save ( flags ) ;
2014-02-24 19:59:56 +01:00
ftrace_update_code ( mod , start_pg ) ;
2011-06-24 23:28:13 -04:00
if ( ! mod )
local_irq_restore ( flags ) ;
2011-12-16 16:23:44 -05:00
ret = 0 ;
out :
2009-02-14 01:42:44 -05:00
mutex_unlock ( & ftrace_lock ) ;
2008-08-14 15:45:08 -04:00
2023-07-12 14:04:52 +08:00
/* We should have used all pages unless we skipped some */
if ( pg_unuse ) {
WARN_ON ( ! skipped ) ;
ftrace_free_pages ( pg_unuse ) ;
}
2011-12-16 16:23:44 -05:00
return ret ;
2008-08-14 15:45:08 -04:00
}
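
The fill loop in ftrace_process_locs() can be sketched in user space as follows. This is a simplified illustration, not the kernel implementation: `struct rec_group`, `GROUP_CAPACITY`, and `fill_groups()` are hypothetical, and a fixed per-group capacity replaces the kernel's check of `end_offset` against `PAGE_SIZE << pg->order`. It shows why skipped NULL entries can leave trailing groups unused, which the caller then frees.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_CAPACITY 2	/* illustrative; the kernel sizes groups by page order */

/* Hypothetical stand-in for struct ftrace_page. */
struct rec_group {
	struct rec_group *next;
	size_t index;				/* records used so far */
	unsigned long records[GROUP_CAPACITY];
};

/*
 * Copy non-zero addresses into the chain starting at first, moving to the
 * next group when the current one is full. Returns the first unused group
 * (detached from the chain), or NULL if every group was needed.
 */
static struct rec_group *fill_groups(struct rec_group *first,
				     const unsigned long *addrs, size_t count,
				     size_t *skipped)
{
	struct rec_group *grp = first;
	struct rec_group *unused;

	assert(first && skipped);
	*skipped = 0;

	for (size_t i = 0; i < count; i++) {
		if (!addrs[i]) {		/* linker padding: skip it */
			(*skipped)++;
			continue;
		}
		if (grp->index == GROUP_CAPACITY) {
			if (!grp->next)		/* ran out of space */
				break;
			grp = grp->next;
		}
		grp->records[grp->index++] = addrs[i];
	}

	/* Detach whatever was allocated but never reached. */
	unused = grp->next;
	grp->next = NULL;
	return unused;
}
```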
2017-09-01 08:35:38 -04:00
struct ftrace_mod_func {
struct list_head list ;
char * name ;
unsigned long ip ;
unsigned int size ;
} ;
struct ftrace_mod_map {
2017-09-05 19:20:16 -04:00
struct rcu_head rcu ;
2017-09-01 08:35:38 -04:00
struct list_head list ;
struct module * mod ;
unsigned long start_addr ;
unsigned long end_addr ;
struct list_head funcs ;
2017-09-06 08:40:41 -04:00
unsigned int num_funcs ;
2017-09-01 08:35:38 -04:00
} ;
2020-05-12 15:19:13 +03:00
static int ftrace_get_trampoline_kallsym ( unsigned int symnum ,
unsigned long * value , char * type ,
char * name , char * module_name ,
int * exported )
{
struct ftrace_ops * op ;
list_for_each_entry_rcu ( op , & ftrace_ops_trampoline_list , list ) {
if ( ! op - > trampoline | | symnum - - )
continue ;
* value = op - > trampoline ;
* type = ' t ' ;
2023-05-17 14:53:23 +00:00
strscpy ( name , FTRACE_TRAMPOLINE_SYM , KSYM_NAME_LEN ) ;
strscpy ( module_name , FTRACE_TRAMPOLINE_MOD , MODULE_NAME_LEN ) ;
2020-05-12 15:19:13 +03:00
* exported = 0 ;
return 0 ;
}
return - ERANGE ;
}
2022-08-01 16:47:45 +08:00
# if defined(CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS) || defined(CONFIG_MODULES)
/*
* Check if the current ops references the given ip .
*
* If the ops traces all functions , then it was already accounted for .
* If the ops does not trace the current record function , skip it .
* If the ops ignores the function via notrace filter , skip it .
*/
static bool
ops_references_ip ( struct ftrace_ops * ops , unsigned long ip )
{
/* If ops isn't enabled, ignore it */
if ( ! ( ops - > flags & FTRACE_OPS_FL_ENABLED ) )
return false ;
/* If ops traces all then it includes this function */
if ( ops_traces_mod ( ops ) )
return true ;
/* The function must be in the filter */
if ( ! ftrace_hash_empty ( ops - > func_hash - > filter_hash ) & &
! __ftrace_lookup_ip ( ops - > func_hash - > filter_hash , ip ) )
return false ;
/* If in notrace hash, we ignore it too */
if ( ftrace_lookup_ip ( ops - > func_hash - > notrace_hash , ip ) )
return false ;
return true ;
}
# endif
2009-04-15 13:24:06 -04:00
# ifdef CONFIG_MODULES
2011-12-16 14:42:37 -05:00
# define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)
2017-09-05 19:20:16 -04:00
static LIST_HEAD ( ftrace_mod_maps ) ;
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates to convert function calls to mcount are
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:
CPU 0 CPU 1
----- -----
start function tracing
set text to read-write
load_module
add functions to ftrace
set module text read-only
update all functions to callbacks
modify module functions too
< Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this patch does not yet defer the enabling to a later point. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
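
The effect of the DISABLED flag described above can be sketched in user space. This is a toy model under stated assumptions, not the kernel code: `struct rec`, the `REC_FL_*` values, and the three helper functions are hypothetical names chosen for illustration. It shows that a global update pass leaves freshly added module records alone until they are explicitly enabled.

```c
#include <assert.h>
#include <stddef.h>

#define REC_FL_ENABLED  (1UL << 0)	/* illustrative flag values */
#define REC_FL_DISABLED (1UL << 1)

/* Hypothetical stand-in for struct dyn_ftrace. */
struct rec {
	unsigned long ip;
	unsigned long flags;
};

/* Module records start out DISABLED so a concurrent update skips them. */
static void add_module_recs(struct rec *recs, const unsigned long *ips,
			    size_t n)
{
	assert((recs && ips) || n == 0);
	for (size_t i = 0; i < n; i++) {
		recs[i].ip = ips[i];
		recs[i].flags = REC_FL_DISABLED;
	}
}

/* Enable tracing on every record that is not DISABLED; return how many. */
static size_t update_all(struct rec *recs, size_t n)
{
	size_t touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (recs[i].flags & REC_FL_DISABLED)
			continue;	/* module text is not ready yet */
		recs[i].flags |= REC_FL_ENABLED;
		touched++;
	}
	return touched;
}

/* Once the module text is set up, clear DISABLED so updates apply. */
static void module_enable(struct rec *recs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		recs[i].flags &= ~REC_FL_DISABLED;
}
```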
2016-01-07 15:40:01 -05:00
static int referenced_filters ( struct dyn_ftrace * rec )
{
struct ftrace_ops * ops ;
int cnt = 0 ;
for ( ops = ftrace_ops_list ; ops ! = & ftrace_list_end ; ops = ops - > next ) {
2022-08-01 16:47:45 +08:00
if ( ops_references_ip ( ops , rec - > ip ) ) {
2020-07-29 02:05:54 +08:00
if ( WARN_ON_ONCE ( ops - > flags & FTRACE_OPS_FL_DIRECT ) )
continue ;
if ( WARN_ON_ONCE ( ops - > flags & FTRACE_OPS_FL_IPMODIFY ) )
continue ;
2020-07-29 02:05:53 +08:00
cnt + + ;
if ( ops - > flags & FTRACE_OPS_FL_SAVE_REGS )
rec - > flags | = FTRACE_FL_REGS ;
2020-07-29 02:05:54 +08:00
if ( cnt = = 1 & & ops - > trampoline )
rec - > flags | = FTRACE_FL_TRAMP ;
else
rec - > flags & = ~ FTRACE_FL_TRAMP ;
2020-07-29 02:05:53 +08:00
}
2016-01-07 15:40:01 -05:00
}
return cnt ;
}
2017-08-31 17:36:51 -04:00
static void
clear_mod_from_hash ( struct ftrace_page * pg , struct ftrace_hash * hash )
{
struct ftrace_func_entry * entry ;
struct dyn_ftrace * rec ;
int i ;
if ( ftrace_hash_empty ( hash ) )
return ;
for ( i = 0 ; i < pg - > index ; i + + ) {
rec = & pg - > records [ i ] ;
entry = __ftrace_lookup_ip ( hash , rec - > ip ) ;
/*
* Do not allow this rec to match again .
* Yeah , it may waste some memory , but will be removed
* if / when the hash is modified again .
*/
if ( entry )
entry - > ip = 0 ;
}
}
2021-03-23 18:49:35 +01:00
/* Clear any records from hashes */
2017-08-31 17:36:51 -04:00
static void clear_mod_from_hashes ( struct ftrace_page * pg )
{
struct trace_array * tr ;
mutex_lock ( & trace_types_lock ) ;
list_for_each_entry ( tr , & ftrace_trace_arrays , list ) {
if ( ! tr - > ops | | ! tr - > ops - > func_hash )
continue ;
mutex_lock ( & tr - > ops - > func_hash - > regex_lock ) ;
clear_mod_from_hash ( pg , tr - > ops - > func_hash - > filter_hash ) ;
clear_mod_from_hash ( pg , tr - > ops - > func_hash - > notrace_hash ) ;
mutex_unlock ( & tr - > ops - > func_hash - > regex_lock ) ;
}
mutex_unlock ( & trace_types_lock ) ;
}
2017-09-05 19:20:16 -04:00
static void ftrace_free_mod_map ( struct rcu_head * rcu )
{
struct ftrace_mod_map * mod_map = container_of ( rcu , struct ftrace_mod_map , rcu ) ;
struct ftrace_mod_func * mod_func ;
struct ftrace_mod_func * n ;
/* All the contents of mod_map are now not visible to readers */
list_for_each_entry_safe ( mod_func , n , & mod_map - > funcs , list ) {
kfree ( mod_func - > name ) ;
list_del ( & mod_func - > list ) ;
kfree ( mod_func ) ;
}
kfree ( mod_map ) ;
}
2009-10-07 19:00:35 +02:00
void ftrace_release_mod ( struct module * mod )
2009-04-15 13:24:06 -04:00
{
2017-09-05 19:20:16 -04:00
struct ftrace_mod_map * mod_map ;
struct ftrace_mod_map * n ;
2009-04-15 13:24:06 -04:00
struct dyn_ftrace * rec ;
2011-12-16 14:42:37 -05:00
struct ftrace_page * * last_pg ;
2017-08-31 17:36:51 -04:00
struct ftrace_page * tmp_page = NULL ;
2009-04-15 13:24:06 -04:00
struct ftrace_page * pg ;
2011-04-21 23:16:46 -04:00
mutex_lock ( & ftrace_lock ) ;
2009-10-07 19:00:35 +02:00
if ( ftrace_disabled )
2011-04-21 23:16:46 -04:00
goto out_unlock ;
2009-04-15 13:24:06 -04:00
2017-09-05 19:20:16 -04:00
list_for_each_entry_safe ( mod_map , n , & ftrace_mod_maps , list ) {
if ( mod_map - > mod = = mod ) {
list_del_rcu ( & mod_map - > list ) ;
2018-11-06 18:44:52 -08:00
call_rcu ( & mod_map - > rcu , ftrace_free_mod_map ) ;
2017-09-05 19:20:16 -04:00
break ;
}
}
2011-12-16 14:42:37 -05:00
/*
* Each module has its own ftrace_pages , remove
* them from the list .
*/
last_pg = & ftrace_pages_start ;
for ( pg = ftrace_pages_start ; pg ; pg = * last_pg ) {
rec = & pg - > records [ 0 ] ;
2023-08-03 21:52:36 +01:00
if ( within_module ( rec - > ip , mod ) ) {
2009-04-15 13:24:06 -04:00
/*
2011-12-16 14:42:37 -05:00
* As core pages are first , the first
* page should never be a module page .
2009-04-15 13:24:06 -04:00
*/
2011-12-16 14:42:37 -05:00
if ( WARN_ON ( pg = = ftrace_pages_start ) )
goto out_unlock ;
/* Check if we are deleting the last page */
if ( pg = = ftrace_pages )
ftrace_pages = next_to_ftrace_page ( last_pg ) ;
2017-06-27 11:04:40 -04:00
ftrace_update_tot_cnt - = pg - > index ;
2011-12-16 14:42:37 -05:00
* last_pg = pg - > next ;
2017-08-31 17:36:51 -04:00
pg - > next = tmp_page ;
tmp_page = pg ;
2011-12-16 14:42:37 -05:00
} else
last_pg = & pg - > next ;
}
2011-04-21 23:16:46 -04:00
out_unlock :
2009-04-15 13:24:06 -04:00
mutex_unlock ( & ftrace_lock ) ;
2017-08-31 17:36:51 -04:00
for ( pg = tmp_page ; pg ; pg = tmp_page ) {
/* Needs to be called outside of ftrace_lock */
clear_mod_from_hashes ( pg ) ;
2021-04-01 16:14:17 -04:00
if ( pg - > records ) {
free_pages ( ( unsigned long ) pg - > records , pg - > order ) ;
ftrace_number_of_pages - = 1 < < pg - > order ;
}
2017-08-31 17:36:51 -04:00
tmp_page = pg - > next ;
kfree ( pg ) ;
2019-10-01 14:38:07 -04:00
ftrace_number_of_groups - - ;
2017-08-31 17:36:51 -04:00
}
2009-04-15 13:24:06 -04:00
}
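The page removal in ftrace_release_mod() relies on a pointer-to-pointer walk: last_pg always points at the link that leads to the current page, so a page can be unlinked without tracking a separate "previous" node, and the unlinked pages are collected on a private list that is freed after the lock is dropped. A minimal userspace sketch of that idiom follows; struct page_node, the owner tag, and the helper names are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ftrace_page: a singly linked node. */
struct page_node {
	struct page_node *next;
	int owner;	/* hypothetical tag standing in for within_module() */
};

/*
 * Unlink every node whose owner matches and push it onto a private list,
 * which the caller can free later without holding the list lock.
 */
static struct page_node *unlink_owner(struct page_node **head, int owner)
{
	struct page_node **last = head;
	struct page_node *removed = NULL;
	struct page_node *pg;

	for (pg = *head; pg; pg = *last) {
		if (pg->owner == owner) {
			*last = pg->next;	/* bypass pg in the main list */
			pg->next = removed;	/* push pg onto the private list */
			removed = pg;
		} else {
			last = &pg->next;	/* keep pg, advance the link pointer */
		}
	}
	return removed;
}

/* Count the nodes in a list; only used to check the sketch. */
static int list_len(const struct page_node *pg)
{
	int n = 0;

	for (; pg; pg = pg->next)
		n++;
	return n;
}

/* Build a list from an array of owner tags; aborts on allocation failure. */
static struct page_node *build_list(const int *owners, int n)
{
	struct page_node *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct page_node *pg = malloc(sizeof(*pg));

		if (!pg)
			abort();
		pg->owner = owners[i];
		pg->next = head;
		head = pg;
	}
	return head;
}
```

Because the loop reloads pg from *last after every step, removing several adjacent nodes needs no special case.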
2016-02-16 17:32:33 -05:00
void ftrace_module_enable ( struct module * mod )
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates that convert function calls to mcount are
done before the module text is set to read-only. But after they are done, the
module text is visible to the function tracer. Thus we have the following
race:
   CPU 0                              CPU 1
   -----                              -----
   start function tracing
   set text to read-write
                                      load_module
                                      add functions to ftrace
                                      set module text read-only
   update all functions to callbacks
   modify module functions too
   < Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this change does not yet delay the enabling. It simply changes
ftrace_module_init() to both set the DISABLED flag on the records and then
immediately call the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 15:40:01 -05:00
{
struct dyn_ftrace * rec ;
struct ftrace_page * pg ;
mutex_lock ( & ftrace_lock ) ;
if ( ftrace_disabled )
goto out_unlock ;
/*
* If the tracing is enabled , go ahead and enable the record .
*
2019-03-24 00:05:23 +05:30
* The reason not to enable the record immediately is the
2016-01-07 15:40:01 -05:00
* inherent check of ftrace_make_nop / ftrace_make_call for
* correct previous instructions . Making first the NOP
* conversion puts the module to the correct state , thus
* passing the ftrace_make_call check .
*
* We also delay this to after the module code already set the
* text to read - only , as we now need to set it back to read - write
* so that we can modify the text .
*/
if ( ftrace_start_up )
ftrace_arch_code_modify_prepare ( ) ;
do_for_each_ftrace_rec ( pg , rec ) {
int cnt ;
/*
* do_for_each_ftrace_rec ( ) is a double loop .
* Each module has its own ftrace_pages , so if a record is
* not part of this module , skip the rest of this pg ,
* which the " break " will do .
*/
2023-08-03 21:52:36 +01:00
if ( ! within_module ( rec - > ip , mod ) )
2016-01-07 15:40:01 -05:00
break ;
ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, its call to fentry will still
exist and gets added to the __mcount_loc table. Ftrace uses kallsyms to
retrieve the name for each location in __mcount_loc, both to display it in
available_filter_functions and to enable functions via the name matching
in set_ftrace_filter/notrace. Enabling these functions does nothing but
enable an unused call to ftrace_caller. If a traced weak function is
overridden, the symbol of the function before it would be used for it,
which will either create duplicate names or, if the previous function was
not traced, incorrectly list it in available_filter_functions as a
function that can be traced.
This became an issue with BPF[1] as there is tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. One such case was a function that was marked notrace, but
was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.
The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.
In the meantime, this is a workaround. Add a FTRACE_MCOUNT_MAX_OFFSET
macro; if it is defined, ftrace will ignore any function whose call to
fentry/mcount has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.
If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).
A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:
__ftrace_invalid_address___<invalid-offset>
(showing the offset that caused it to be invalid).
This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of files via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).
The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).
[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 14:19:12 -04:00
/* Weak functions should still be ignored */
if ( ! test_for_valid_rec ( rec ) ) {
/* Clear all other flags. Should not be enabled anyway */
rec - > flags = FTRACE_FL_DISABLED ;
continue ;
}
2016-01-07 15:40:01 -05:00
cnt = 0 ;
/*
* When adding a module , we need to check if tracers are
* currently enabled and if they are , and can trace this record ,
* we need to enable the module functions as well as update the
* reference counts for those function records .
*/
if ( ftrace_start_up )
cnt + = referenced_filters ( rec ) ;
2020-07-29 02:05:53 +08:00
rec - > flags & = ~ FTRACE_FL_DISABLED ;
rec - > flags + = cnt ;
2016-01-07 15:40:01 -05:00
if ( ftrace_start_up & & cnt ) {
int failed = __ftrace_replace_code ( rec , 1 ) ;
if ( failed ) {
ftrace_bug ( failed , rec ) ;
goto out_loop ;
}
}
} while_for_each_ftrace_rec ( ) ;
out_loop :
if ( ftrace_start_up )
ftrace_arch_code_modify_post_process ( ) ;
out_unlock :
mutex_unlock ( & ftrace_lock ) ;
2017-06-26 10:57:21 -04:00
process_cached_mods ( mod - > name ) ;
2016-01-07 15:40:01 -05:00
}
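ftrace_module_enable() treats rec->flags as a packed word: a DISABLED bit that module records start with, plus a reference count that is bumped by plain addition (rec->flags += cnt). The sketch below models only that bookkeeping in userspace. The bit position, the counter width, and all names are assumptions chosen for illustration; they are not the kernel's actual values.

```c
#include <assert.h>

#define REC_FL_DISABLED	(1UL << 31)		/* hypothetical bit position */
#define REC_REF_MASK	((1UL << 16) - 1)	/* hypothetical counter width */

/* Hypothetical stand-in for struct dyn_ftrace. */
struct rec {
	unsigned long ip;
	unsigned long flags;
};

/* A freshly added module record starts out disabled with no references. */
static void rec_init_disabled(struct rec *r, unsigned long ip)
{
	r->ip = ip;
	r->flags = REC_FL_DISABLED;
}

/*
 * Clear the DISABLED bit and add the number of interested callbacks to the
 * counter held in the low bits. Returns 1 when the call site would have to
 * be patched from a nop to a call, 0 otherwise.
 */
static int rec_enable(struct rec *r, int tracing_on, unsigned long nr_filters)
{
	unsigned long cnt = tracing_on ? nr_filters : 0;

	r->flags &= ~REC_FL_DISABLED;
	r->flags += cnt;
	return tracing_on && cnt;
}

/* Extract the reference count from the packed flags word. */
static unsigned long rec_refs(const struct rec *r)
{
	return r->flags & REC_REF_MASK;
}
```

Adding to the whole word only works while the count stays inside the counter bits, which is why the sketch keeps the flag bit well above the mask.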
2015-12-02 15:39:57 +01:00
void ftrace_module_init ( struct module * mod )
2008-08-14 15:45:09 -04:00
{
2022-01-20 06:59:49 +00:00
int ret ;
2015-12-23 12:12:22 -05:00
if ( ftrace_disabled | | ! mod - > num_ftrace_callsites )
2008-08-14 22:47:19 -04:00
return ;
2008-08-14 15:45:09 -04:00
2022-01-20 06:59:49 +00:00
ret = ftrace_process_locs ( mod , mod - > ftrace_callsites ,
mod - > ftrace_callsites + mod - > num_ftrace_callsites ) ;
if ( ret )
pr_warn ( " ftrace: failed to allocate entries for module '%s' functions \n " ,
mod - > name ) ;
ftrace: Call ftrace cleanup module notifier after all other notifiers
Commit: c1bf08ac "ftrace: Be first to run code modification on modules"
changed ftrace module notifier's priority to INT_MAX in order to
process the ftrace nops before anything else could touch them
(namely kprobes). This was the correct thing to do.
Unfortunately, the ftrace module notifier also contains the ftrace
clean up code. As opposed to the set up code, this code should be
run *after* all the module notifiers have run in case a module is doing
correct clean-up and unregisters its ftrace hooks. Basically, ftrace
needs to do clean up on module removal, as it needs to know about code
being removed so that it doesn't try to modify that code. But after it
removes the module from its records, if an ftrace user tries to remove
a probe, that removal will fail because the record of that code segment
no longer exists.
Nothing really bad happens if the probe removal is called after ftrace
did the clean up, but the ftrace removal function will return an error.
Correct code (such as kprobes) will produce a WARN_ON() if it fails
to remove the probe. As people get annoyed by frivolous warnings, it's
best to do the ftrace clean up after everything else.
By splitting the ftrace_module_notifier into two notifiers, one that
does the module load setup that is run at high priority, and the other
that is called for module clean up that is run at low priority, the
problem is solved.
Cc: stable@vger.kernel.org
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-02-13 15:18:38 -05:00
}
2017-09-01 08:35:38 -04:00
static void save_ftrace_mod_rec ( struct ftrace_mod_map * mod_map ,
struct dyn_ftrace * rec )
{
struct ftrace_mod_func * mod_func ;
unsigned long symsize ;
unsigned long offset ;
char str [ KSYM_SYMBOL_LEN ] ;
char * modname ;
const char * ret ;
ret = kallsyms_lookup ( rec - > ip , & symsize , & offset , & modname , str ) ;
if ( ! ret )
return ;
mod_func = kmalloc ( sizeof ( * mod_func ) , GFP_KERNEL ) ;
if ( ! mod_func )
return ;
mod_func - > name = kstrdup ( str , GFP_KERNEL ) ;
if ( ! mod_func - > name ) {
kfree ( mod_func ) ;
return ;
}
mod_func - > ip = rec - > ip - offset ;
mod_func - > size = symsize ;
2017-09-06 08:40:41 -04:00
mod_map - > num_funcs + + ;
2017-09-01 08:35:38 -04:00
list_add_rcu ( & mod_func - > list , & mod_map - > funcs ) ;
}
static struct ftrace_mod_map *
allocate_ftrace_mod_map ( struct module * mod ,
unsigned long start , unsigned long end )
{
struct ftrace_mod_map * mod_map ;
mod_map = kmalloc ( sizeof ( * mod_map ) , GFP_KERNEL ) ;
if ( ! mod_map )
return NULL ;
mod_map - > mod = mod ;
mod_map - > start_addr = start ;
mod_map - > end_addr = end ;
2017-09-06 08:40:41 -04:00
mod_map - > num_funcs = 0 ;
2017-09-01 08:35:38 -04:00
INIT_LIST_HEAD_RCU ( & mod_map - > funcs ) ;
list_add_rcu ( & mod_map - > list , & ftrace_mod_maps ) ;
return mod_map ;
}
static const char *
ftrace_func_address_lookup ( struct ftrace_mod_map * mod_map ,
unsigned long addr , unsigned long * size ,
unsigned long * off , char * sym )
{
struct ftrace_mod_func * found_func = NULL ;
struct ftrace_mod_func * mod_func ;
list_for_each_entry_rcu ( mod_func , & mod_map - > funcs , list ) {
if ( addr > = mod_func - > ip & &
addr < mod_func - > ip + mod_func - > size ) {
found_func = mod_func ;
break ;
}
}
if ( found_func ) {
if ( size )
* size = found_func - > size ;
if ( off )
* off = addr - found_func - > ip ;
if ( sym )
2023-05-17 14:53:23 +00:00
strscpy ( sym , found_func - > name , KSYM_NAME_LEN ) ;
2017-09-01 08:35:38 -04:00
return found_func - > name ;
}
return NULL ;
}
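ftrace_func_address_lookup() is a half-open range test: an address belongs to a saved function when ip <= addr < ip + size, and the reported offset is addr - ip. A small sketch of the same test over a plain array is shown here; the array layout and the names are hypothetical, whereas the kernel walks an RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ftrace_mod_func. */
struct func_range {
	unsigned long ip;	/* start address of the function */
	unsigned long size;	/* length of the function in bytes */
	const char *name;
};

/*
 * Return the entry whose [ip, ip + size) range contains addr, or NULL.
 * When off is non-NULL it receives the offset of addr into the function.
 */
static const struct func_range *
range_lookup(const struct func_range *tbl, size_t n,
	     unsigned long addr, unsigned long *off)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= tbl[i].ip && addr < tbl[i].ip + tbl[i].size) {
			if (off)
				*off = addr - tbl[i].ip;
			return &tbl[i];
		}
	}
	return NULL;
}
```

The upper bound is exclusive, so the first byte past one function is attributed to the next entry, or to nothing when no entry starts there.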
const char *
ftrace_mod_address_lookup ( unsigned long addr , unsigned long * size ,
unsigned long * off , char * * modname , char * sym )
{
struct ftrace_mod_map * mod_map ;
const char * ret = NULL ;
2018-11-06 18:44:52 -08:00
/* mod_map is freed via call_rcu() */
2017-09-01 08:35:38 -04:00
preempt_disable ( ) ;
list_for_each_entry_rcu ( mod_map , & ftrace_mod_maps , list ) {
ret = ftrace_func_address_lookup ( mod_map , addr , size , off , sym ) ;
if ( ret ) {
if ( modname )
* modname = mod_map - > mod - > name ;
break ;
}
}
preempt_enable ( ) ;
return ret ;
}
2017-09-06 08:40:41 -04:00
int ftrace_mod_get_kallsym ( unsigned int symnum , unsigned long * value ,
char * type , char * name ,
char * module_name , int * exported )
{
struct ftrace_mod_map * mod_map ;
struct ftrace_mod_func * mod_func ;
2020-05-12 15:19:13 +03:00
int ret ;
2017-09-06 08:40:41 -04:00
preempt_disable ( ) ;
list_for_each_entry_rcu ( mod_map , & ftrace_mod_maps , list ) {
if ( symnum > = mod_map - > num_funcs ) {
symnum - = mod_map - > num_funcs ;
continue ;
}
list_for_each_entry_rcu ( mod_func , & mod_map - > funcs , list ) {
if ( symnum > 1 ) {
symnum - - ;
continue ;
}
* value = mod_func - > ip ;
* type = ' T ' ;
2023-05-17 14:53:23 +00:00
strscpy ( name , mod_func - > name , KSYM_NAME_LEN ) ;
strscpy ( module_name , mod_map - > mod - > name , MODULE_NAME_LEN ) ;
2017-09-06 08:40:41 -04:00
* exported = 1 ;
preempt_enable ( ) ;
return 0 ;
}
WARN_ON ( 1 ) ;
break ;
}
2020-05-12 15:19:13 +03:00
ret = ftrace_get_trampoline_kallsym ( symnum , value , type , name ,
module_name , exported ) ;
2017-09-06 08:40:41 -04:00
preempt_enable ( ) ;
2020-05-12 15:19:13 +03:00
return ret ;
2017-09-06 08:40:41 -04:00
}
2017-09-01 08:35:38 -04:00
# else
static void save_ftrace_mod_rec ( struct ftrace_mod_map * mod_map ,
struct dyn_ftrace * rec ) { }
static inline struct ftrace_mod_map *
allocate_ftrace_mod_map ( struct module * mod ,
unsigned long start , unsigned long end )
{
return NULL ;
}
2020-05-12 15:19:13 +03:00
int ftrace_mod_get_kallsym ( unsigned int symnum , unsigned long * value ,
char * type , char * name , char * module_name ,
int * exported )
{
int ret ;
preempt_disable ( ) ;
ret = ftrace_get_trampoline_kallsym ( symnum , value , type , name ,
module_name , exported ) ;
preempt_enable ( ) ;
return ret ;
}
2009-04-15 13:24:06 -04:00
# endif /* CONFIG_MODULES */
2017-10-09 12:29:31 -07:00
struct ftrace_init_func {
struct list_head list ;
unsigned long ip ;
} ;
/* Clear any init ips from hashes */
static void
clear_func_from_hash ( struct ftrace_init_func * func , struct ftrace_hash * hash )
2017-03-03 16:15:39 -05:00
{
2017-10-09 12:29:31 -07:00
struct ftrace_func_entry * entry ;
2019-09-10 22:33:36 +08:00
entry = ftrace_lookup_ip ( hash , func - > ip ) ;
2017-10-09 12:29:31 -07:00
/*
* Do not allow this rec to match again .
* Yeah , it may waste some memory , but will be removed
* if / when the hash is modified again .
*/
if ( entry )
entry - > ip = 0 ;
}
static void
clear_func_from_hashes ( struct ftrace_init_func * func )
{
struct trace_array * tr ;
mutex_lock ( & trace_types_lock ) ;
list_for_each_entry ( tr , & ftrace_trace_arrays , list ) {
if ( ! tr - > ops | | ! tr - > ops - > func_hash )
continue ;
mutex_lock ( & tr - > ops - > func_hash - > regex_lock ) ;
clear_func_from_hash ( func , tr - > ops - > func_hash - > filter_hash ) ;
clear_func_from_hash ( func , tr - > ops - > func_hash - > notrace_hash ) ;
mutex_unlock ( & tr - > ops - > func_hash - > regex_lock ) ;
}
mutex_unlock ( & trace_types_lock ) ;
}
static void add_to_clear_hash_list ( struct list_head * clear_list ,
struct dyn_ftrace * rec )
{
struct ftrace_init_func * func ;
func = kmalloc ( sizeof ( * func ) , GFP_KERNEL ) ;
if ( ! func ) {
2020-01-25 10:52:30 -05:00
MEM_FAIL ( 1 , " alloc failure, ftrace filter could be stale \n " ) ;
2017-10-09 12:29:31 -07:00
return ;
}
func - > ip = rec - > ip ;
list_add ( & func - > list , clear_list ) ;
}
2017-09-01 08:35:38 -04:00
void ftrace_free_mem ( struct module * mod , void * start_ptr , void * end_ptr )
2017-03-03 16:15:39 -05:00
{
2017-06-20 10:44:58 -04:00
unsigned long start = ( unsigned long ) ( start_ptr ) ;
unsigned long end = ( unsigned long ) ( end_ptr ) ;
2017-03-03 16:15:39 -05:00
struct ftrace_page * * last_pg = & ftrace_pages_start ;
struct ftrace_page * pg ;
struct dyn_ftrace * rec ;
struct dyn_ftrace key ;
2017-09-01 08:35:38 -04:00
struct ftrace_mod_map * mod_map = NULL ;
2017-10-09 12:29:31 -07:00
struct ftrace_init_func * func , * func_next ;
2023-08-09 15:15:51 +08:00
LIST_HEAD ( clear_hash ) ;
2017-10-09 12:29:31 -07:00
2017-03-03 16:15:39 -05:00
key . ip = start ;
key . flags = end ; /* overload flags, as it is unsigned long */
mutex_lock ( & ftrace_lock ) ;
2017-09-01 08:35:38 -04:00
/*
* If we are freeing module init memory , then check if
* any tracer is active . If so , we need to save a mapping of
* the module functions being freed with the address .
*/
if ( mod & & ftrace_ops_list ! = & ftrace_list_end )
mod_map = allocate_ftrace_mod_map ( mod , start , end ) ;
2017-03-03 16:15:39 -05:00
for ( pg = ftrace_pages_start ; pg ; last_pg = & pg - > next , pg = * last_pg ) {
if ( end < pg - > records [ 0 ] . ip | |
start > = ( pg - > records [ pg - > index - 1 ] . ip + MCOUNT_INSN_SIZE ) )
continue ;
again :
rec = bsearch ( & key , pg - > records , pg - > index ,
sizeof ( struct dyn_ftrace ) ,
ftrace_cmp_recs ) ;
if ( ! rec )
continue ;
2017-09-01 08:35:38 -04:00
2017-10-09 12:29:31 -07:00
/* rec will be cleared from hashes after ftrace_lock unlock */
add_to_clear_hash_list ( & clear_hash , rec ) ;
2017-09-01 08:35:38 -04:00
if ( mod_map )
save_ftrace_mod_rec ( mod_map , rec ) ;
2017-03-03 16:15:39 -05:00
pg - > index - - ;
2017-06-28 11:57:03 -04:00
ftrace_update_tot_cnt - - ;
2017-03-03 16:15:39 -05:00
if ( ! pg - > index ) {
* last_pg = pg - > next ;
2021-04-01 16:14:17 -04:00
if ( pg - > records ) {
free_pages ( ( unsigned long ) pg - > records , pg - > order ) ;
ftrace_number_of_pages - = 1 < < pg - > order ;
}
2019-10-01 14:38:07 -04:00
ftrace_number_of_groups - - ;
2017-03-03 16:15:39 -05:00
kfree ( pg ) ;
pg = container_of ( last_pg , struct ftrace_page , next ) ;
if ( ! ( * last_pg ) )
ftrace_pages = pg ;
continue ;
}
memmove ( rec , rec + 1 ,
( pg - > index - ( rec - pg - > records ) ) * sizeof ( * rec ) ) ;
/* More than one function may be in this block */
goto again ;
}
mutex_unlock ( & ftrace_lock ) ;
2017-10-09 12:29:31 -07:00
list_for_each_entry_safe ( func , func_next , & clear_hash , list ) {
clear_func_from_hashes ( func ) ;
kfree ( func ) ;
}
2017-03-03 16:15:39 -05:00
}
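ftrace_free_mem() searches each sorted record array with bsearch() using a key that describes a range rather than a single address (the range end is stored in key.flags), then closes the gap with memmove() and searches again. The sketch below reproduces that loop in userspace. The comparator is written for this example and is only assumed to behave like ftrace_cmp_recs, which is not shown in this section; INSN_SIZE is a hypothetical stand-in for MCOUNT_INSN_SIZE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INSN_SIZE 5	/* hypothetical; stands in for MCOUNT_INSN_SIZE */

/* Hypothetical stand-in for struct dyn_ftrace. */
struct rec {
	unsigned long ip;
	unsigned long flags;
};

/*
 * Range comparator: the key carries the range start in ip and the range
 * end in flags. A record matches when its instruction overlaps the range.
 */
static int cmp_range(const void *a, const void *b)
{
	const struct rec *key = a;
	const struct rec *r = b;

	if (key->flags < r->ip)
		return -1;	/* range ends before this record */
	if (key->ip >= r->ip + INSN_SIZE)
		return 1;	/* range starts after this record */
	return 0;
}

/* Remove every record overlapping the range; returns the new count. */
static size_t remove_range(struct rec *recs, size_t n,
			   unsigned long start, unsigned long end)
{
	struct rec key = { .ip = start, .flags = end };
	struct rec *hit;

	while (n && (hit = bsearch(&key, recs, n, sizeof(*recs), cmp_range))) {
		n--;
		/* Shift the tail down over the removed slot. */
		memmove(hit, hit + 1,
			(n - (size_t)(hit - recs)) * sizeof(*recs));
	}
	return n;
}
```

bsearch() may return any matching element, so the loop simply repeats until no record in the range is left.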
2017-06-20 10:44:58 -04:00
void __init ftrace_free_init_mem ( void )
{
void * start = ( void * ) ( & __init_begin ) ;
void * end = ( void * ) ( & __init_end ) ;
2022-03-10 21:37:09 -05:00
ftrace_boot_snapshot ( ) ;
2017-09-01 08:35:38 -04:00
ftrace_free_mem ( NULL , start , end ) ;
2017-03-03 16:15:39 -05:00
}
2021-09-09 17:02:16 +08:00
int __init __weak ftrace_dyn_arch_init ( void )
{
return 0 ;
}
2008-08-14 15:45:08 -04:00
void __init ftrace_init ( void )
{
2014-02-24 19:59:56 +01:00
extern unsigned long __start_mcount_loc [ ] ;
extern unsigned long __stop_mcount_loc [ ] ;
2014-02-24 19:59:59 +01:00
unsigned long count , flags ;
2008-08-14 15:45:08 -04:00
int ret ;
local_irq_save ( flags ) ;
2014-02-24 19:59:59 +01:00
ret = ftrace_dyn_arch_init ( ) ;
2008-08-14 15:45:08 -04:00
local_irq_restore ( flags ) ;
2014-02-24 19:59:58 +01:00
if ( ret )
2008-08-14 15:45:08 -04:00
goto failed ;
count = __stop_mcount_loc - __start_mcount_loc ;
2014-02-24 19:59:57 +01:00
if ( ! count ) {
pr_info ( " ftrace: No functions to be traced? \n " ) ;
2008-08-14 15:45:08 -04:00
goto failed ;
2014-02-24 19:59:57 +01:00
}
pr_info ( " ftrace: allocating %ld entries in %ld pages \n " ,
2022-11-09 09:44:32 +00:00
count , DIV_ROUND_UP ( count , ENTRIES_PER_PAGE ) ) ;
2008-08-14 15:45:08 -04:00
2009-10-13 16:33:53 -04:00
ret = ftrace_process_locs ( NULL ,
2008-11-14 16:21:19 -08:00
__start_mcount_loc ,
2008-08-14 15:45:08 -04:00
__stop_mcount_loc ) ;
2022-01-20 06:59:49 +00:00
if ( ret ) {
pr_warn ( " ftrace: failed to allocate entries for functions \n " ) ;
goto failed ;
}
2008-08-14 15:45:08 -04:00
2019-10-01 14:38:07 -04:00
pr_info ( " ftrace: allocated %ld pages with %ld groups \n " ,
ftrace_number_of_pages , ftrace_number_of_groups ) ;
2022-01-20 06:59:49 +00:00
last_ftrace_enabled = ftrace_enabled = 1 ;
2009-05-28 13:37:24 -04:00
set_ftrace_early_filters ( ) ;
2008-08-14 15:45:08 -04:00
return ;
failed :
ftrace_disabled = 1 ;
}
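The page count printed by ftrace_init() is a ceiling division of the number of call sites by the number of records that fit in one page. A worked sketch of that arithmetic is given below. The page and record sizes are assumptions picked for illustration, and pages_needed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Ceiling division for positive integers. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

#define SKETCH_PAGE_SIZE	4096UL	/* assumed page size */
#define SKETCH_ENTRY_SIZE	16UL	/* assumed size of one record */
#define SKETCH_ENTRIES_PER_PAGE	(SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)

/* Number of pages needed to hold count records. */
static unsigned long pages_needed(unsigned long count)
{
	return DIV_ROUND_UP(count, SKETCH_ENTRIES_PER_PAGE);
}
```

With these assumed sizes one page holds 256 records, so 256 call sites fit in one page and 257 need two.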
ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. Its trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it cannot be used. The dynamically allocated
ftrace_ops may, however, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-02 23:23:31 -04:00
/* Do nothing if arch does not support this */
void __weak arch_ftrace_update_trampoline ( struct ftrace_ops * ops )
{
}
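The cost described in the commit message above can be made concrete with a toy model: a list callback checks every registered ops' filter for every traced function, while a direct call skips the walk entirely. The sketch below is a simplified userspace illustration. The single-address filter, the hit counter, and all names are hypothetical and do not reflect how ftrace_ops hashes work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct ftrace_ops. */
struct ops {
	void (*func)(unsigned long ip, struct ops *self);
	unsigned long filter_ip;	/* 0 means "trace every function" */
	unsigned long hits;
	struct ops *next;
};

/* Example callback: count how often this ops was invoked. */
static void count_hit(unsigned long ip, struct ops *self)
{
	(void)ip;
	self->hits++;
}

/* Does this ops want to see the function at ip? */
static int ops_test(const struct ops *op, unsigned long ip)
{
	return !op->filter_ip || op->filter_ip == ip;
}

/*
 * List callback: every registered ops is tested for every traced function.
 * Returns the number of filter checks performed for this one call.
 */
static unsigned long list_dispatch(struct ops *head, unsigned long ip)
{
	unsigned long checks = 0;

	for (struct ops *op = head; op; op = op->next) {
		checks++;
		if (ops_test(op, ip))
			op->func(ip, op);
	}
	return checks;
}

/* Direct call: the site is known to belong to exactly one ops. */
static void direct_dispatch(struct ops *only, unsigned long ip)
{
	only->func(ip, only);
}
```

In this model a probe attached to one address still costs one filter check on every other traced function, which is the overhead a per-ops trampoline avoids.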
static void ftrace_update_trampoline ( struct ftrace_ops * ops )
{
2020-05-12 15:19:13 +03:00
unsigned long trampoline = ops - > trampoline ;
2014-07-02 23:23:31 -04:00
arch_ftrace_update_trampoline ( ops ) ;
2020-05-12 15:19:13 +03:00
if ( ops - > trampoline & & ops - > trampoline ! = trampoline & &
2020-05-12 15:19:14 +03:00
( ops - > flags & FTRACE_OPS_FL_ALLOC_TRAMP ) ) {
/* Add to kallsyms before the perf events */
2020-05-12 15:19:13 +03:00
ftrace_add_trampoline_to_kallsyms ( ops ) ;
2020-05-12 15:19:14 +03:00
perf_event_ksymbol ( PERF_RECORD_KSYMBOL_TYPE_OOL ,
ops - > trampoline , ops - > trampoline_size , false ,
FTRACE_TRAMPOLINE_SYM ) ;
2020-05-12 15:19:15 +03:00
/*
* Record the perf text poke event after the ksymbol register
* event .
*/
perf_event_text_poke ( ( void * ) ops - > trampoline , NULL , 0 ,
( void * ) ops - > trampoline ,
ops - > trampoline_size ) ;
2020-05-12 15:19:14 +03:00
}
2014-07-02 23:23:31 -04:00
}
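The update step above can be sketched in user space as follows. This is a minimal model, not the kernel code: `toy_ops`, `toy_arch_update_trampoline()`, `toy_next_trampoline` and the `announced` counter are hypothetical stand-ins (the counter replaces the kallsyms and perf notifications). It only illustrates the "snapshot the old value, let the arch hook run, announce only a new allocated trampoline" pattern.

```c
#include <assert.h>

#define TOY_FL_ALLOC_TRAMP (1UL << 0)	/* hypothetical flag: trampoline was allocated */

struct toy_ops {
	unsigned long trampoline;	/* address of the current trampoline, 0 if none */
	unsigned long flags;
	int announced;			/* stands in for the kallsyms/perf notifications */
};

/* Hypothetical arch hook input: the address the "arch" will install, 0 for none. */
static unsigned long toy_next_trampoline;

static void toy_arch_update_trampoline(struct toy_ops *ops)
{
	if (toy_next_trampoline) {
		ops->trampoline = toy_next_trampoline;
		ops->flags |= TOY_FL_ALLOC_TRAMP;
	}
}

static void toy_update_trampoline(struct toy_ops *ops)
{
	unsigned long old;

	assert(ops);
	old = ops->trampoline;		/* snapshot before the arch hook runs */

	toy_arch_update_trampoline(ops);

	/* Announce only a trampoline that exists, changed, and was allocated. */
	if (ops->trampoline && ops->trampoline != old &&
	    (ops->flags & TOY_FL_ALLOC_TRAMP))
		ops->announced++;
}
```

Calling `toy_update_trampoline()` twice with the same `toy_next_trampoline` announces once; the second call sees an unchanged address and stays silent.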
2017-04-05 13:12:55 -04:00
void ftrace_init_trace_array ( struct trace_array * tr )
{
INIT_LIST_HEAD ( & tr - > func_probes ) ;
2017-06-23 15:26:26 -04:00
INIT_LIST_HEAD ( & tr - > mod_trace ) ;
INIT_LIST_HEAD ( & tr - > mod_notrace ) ;
2017-04-05 13:12:55 -04:00
}
2008-05-12 21:20:42 +02:00
# else
2008-10-28 20:17:38 +01:00
2018-11-15 12:32:38 -05:00
struct ftrace_ops global_ops = {
2011-05-03 21:55:54 -04:00
. func = ftrace_stub ,
2020-11-05 21:32:45 -05:00
. flags = FTRACE_OPS_FL_INITIALIZED |
2015-07-24 10:38:12 -04:00
FTRACE_OPS_FL_PID ,
2011-05-03 21:55:54 -04:00
} ;
2008-10-28 20:17:38 +01:00
static int __init ftrace_nodyn_init ( void )
{
ftrace_enabled = 1 ;
return 0 ;
}
2012-10-05 12:13:07 -04:00
core_initcall ( ftrace_nodyn_init ) ;
2008-10-28 20:17:38 +01:00
2015-01-20 12:13:40 -05:00
static inline int ftrace_init_dyn_tracefs ( struct dentry * d_tracer ) { return 0 ; }
2014-08-05 17:19:38 -04:00
static inline void ftrace_startup_all ( int command ) { }
2013-11-25 20:59:46 -05:00
2014-07-02 23:23:31 -04:00
static void ftrace_update_trampoline ( struct ftrace_ops * ops )
{
}
2008-05-12 21:20:42 +02:00
# endif /* CONFIG_DYNAMIC_FTRACE */
2014-01-10 17:01:58 -05:00
__init void ftrace_init_global_array_ops ( struct trace_array * tr )
{
tr - > ops = & global_ops ;
tr - > ops - > private = tr ;
2017-04-05 13:12:55 -04:00
ftrace_init_trace_array ( tr ) ;
2014-01-10 17:01:58 -05:00
}
void ftrace_init_array_ops ( struct trace_array * tr , ftrace_func_t func )
{
/* If we filter on pids, update to use the pid function */
if ( tr - > flags & TRACE_ARRAY_FL_GLOBAL ) {
if ( WARN_ON ( tr - > ops - > func ! = ftrace_stub ) )
printk ( " ftrace ops had %pS for function \n " ,
tr - > ops - > func ) ;
}
tr - > ops - > func = func ;
tr - > ops - > private = tr ;
}
void ftrace_reset_array_ops ( struct trace_array * tr )
{
tr - > ops - > func = ftrace_stub ;
}
2019-02-24 01:50:20 +09:00
static nokprobe_inline void
2011-08-08 16:57:47 -04:00
__ftrace_ops_list_func ( unsigned long ip , unsigned long parent_ip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * ignored , struct ftrace_regs * fregs )
2011-05-04 09:27:52 -04:00
{
2020-10-28 17:42:17 -04:00
struct pt_regs * regs = ftrace_get_regs ( fregs ) ;
2011-05-05 21:14:55 -04:00
struct ftrace_ops * op ;
2012-11-02 17:47:21 -04:00
int bit ;
2011-05-04 09:27:52 -04:00
2021-10-27 11:14:44 +08:00
/*
* The ftrace_test_and_set_recursion ( ) will disable preemption ,
* which is required since some of the ops may be dynamically
* allocated , they must be freed after a synchronize_rcu ( ) .
*/
tracing: Have all levels of checks prevent recursion
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.
The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.
Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.
Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.
Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.
If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.
Since both the loop function and the callbacks do recursion protection, it
seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.
This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.
i.e.
traced_function_1: [ more than one callback tracing it ]
call loop_func
loop_func:
trace_recursion set internal bit
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
traced_function_2: [ only traced by above callback ]
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
[ wash, rinse, repeat, BOOM! out of shampoo! ]
Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
called for all functions.
Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.
[*] There is one layer of recursion that is allowed, and that is to allow
for the transition between interrupt context (normal -> softirq ->
irq -> NMI), because a trace may occur before the context update is
visible to the trace recursion logic.
Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: 王贇 <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-18 15:44:12 -04:00
bit = trace_test_and_set_recursion ( ip , parent_ip , TRACE_LIST_START ) ;
2012-11-02 17:47:21 -04:00
if ( bit < 0 )
return ;
ftrace: Add internal recursive checks
Witold reported a reboot caused by the selftests of the dynamic function
tracer. He sent me a config and I used ktest to do a config_bisect on it
(as my config did not cause the crash). It pointed out that the problem
config was CONFIG_PROVE_RCU.
What happened was that if multiple callbacks are attached to the
function tracer, we iterate a list of callbacks. Because the list is
managed by synchronize_sched() and preempt_disable, the access to the
pointers uses rcu_dereference_raw().
When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
debugging functions, which happen to be traced. The tracing of the debug
function would then call rcu_dereference_raw() which would then call the
debug function and then... well you get the idea.
I first wrote two different patches to solve this bug.
1) add a __rcu_dereference_raw() that would not do any checks.
2) add notrace to the offending debug functions.
Both of these patches worked.
Talking with Paul McKenney on IRC, he suggested to add recursion
detection instead. This seemed to be a better solution, so I decided to
implement it. As the task_struct already has a trace_recursion to detect
recursion in the ring buffer, and that has a very small number it
allows, I decided to use that same variable to add flags that can detect
the recursion inside the infrastructure of the function tracer.
I plan to change it so that the task struct bit can be checked in
mcount, but as that requires changes to all archs, I will hold that off
to the next merge window.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-25 14:27:43 -04:00
2012-11-02 17:03:03 -04:00
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
2019-04-11 11:46:13 -04:00
/* Stub functions don't need to be called nor tested */
if ( op - > flags & FTRACE_OPS_FL_STUB )
continue ;
2015-11-30 17:23:39 -05:00
/*
* Check the following for each ops before calling their func :
* if RCU flag is set , then rcu_is_watching ( ) must be true
* Otherwise test if the ip matches the ops filter
*
* If any of the above fails then the op - > func ( ) is not executed .
*/
if ( ( ! ( op - > flags & FTRACE_OPS_FL_RCU ) | | rcu_is_watching ( ) ) & &
ftrace_ops_test ( op , ip , regs ) ) {
2014-06-25 11:54:03 -04:00
if ( FTRACE_WARN_ON ( ! op - > func ) ) {
pr_warn ( " op=%p %pS \n " , op , op ) ;
2014-01-10 17:01:58 -05:00
goto out ;
}
2020-10-28 17:42:17 -04:00
op - > func ( ip , parent_ip , op , fregs ) ;
2014-01-10 17:01:58 -05:00
}
2012-11-02 17:03:03 -04:00
} while_for_each_ftrace_op ( op ) ;
2014-01-10 17:01:58 -05:00
out :
2012-11-02 17:47:21 -04:00
trace_clear_recursion ( bit ) ;
2011-05-04 09:27:52 -04:00
}
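The list walk and its recursion guard can be sketched as a small user-space model. Everything prefixed `toy_` is hypothetical: one bit per context in a plain global replaces the per-task recursion word, a simple `filter_ip` comparison replaces the hash test, and the extra "transition" layer described in the commit message is deliberately left out. It shows only that a callback re-entering the list function in the same context is cut off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context levels: normal, softirq, irq and NMI. */
enum toy_ctx { TOY_CTX_NORMAL, TOY_CTX_SOFTIRQ, TOY_CTX_IRQ, TOY_CTX_NMI };

/* One bit per context; a stand-in for the per-task recursion word. */
static unsigned long toy_recursion;

/* Returns the bit taken, or -1 if this context is already inside. */
static int toy_test_and_set_recursion(enum toy_ctx ctx)
{
	unsigned long mask = 1UL << ctx;

	if (toy_recursion & mask)
		return -1;
	toy_recursion |= mask;
	return (int)ctx;
}

static void toy_clear_recursion(int bit)
{
	assert(bit >= 0);
	toy_recursion &= ~(1UL << bit);
}

struct toy_ops {
	void (*func)(unsigned long ip, struct toy_ops *op);
	unsigned long filter_ip;	/* 0 means "match every ip" */
	bool stub;			/* stub entries are skipped entirely */
	int hits;
	struct toy_ops *next;
};

static struct toy_ops *toy_ops_list;

static void toy_list_func(unsigned long ip, enum toy_ctx ctx)
{
	struct toy_ops *op;
	int bit = toy_test_and_set_recursion(ctx);

	if (bit < 0)
		return;			/* same-context recursion: bail out */

	for (op = toy_ops_list; op; op = op->next) {
		if (op->stub)
			continue;
		if ((op->filter_ip == 0 || op->filter_ip == ip) && op->func)
			op->func(ip, op);
	}

	toy_clear_recursion(bit);
}

/* A well-behaved callback. */
static void toy_count(unsigned long ip, struct toy_ops *op)
{
	(void)ip;
	op->hits++;
}

/* A callback that hits a traced function and re-enters the list walk. */
static void toy_reenter(unsigned long ip, struct toy_ops *op)
{
	op->hits++;
	toy_list_func(ip, TOY_CTX_NORMAL);
}
```

With `toy_reenter` on the list, a single call to `toy_list_func()` runs each matching callback once; the nested call returns immediately because the normal-context bit is already set.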
2011-08-08 16:57:47 -04:00
/*
* Some archs only support passing ip and parent_ip . Even though
* the list function ignores the op parameter , we do not want any
* C side effects , where a function is called without the caller
* sending a third parameter .
2011-08-09 12:50:46 -04:00
* Archs are to support both the regs and ftrace_ops at the same time .
* If they support ftrace_ops , it is assumed they support regs .
* If call backs want to use regs , they must either check for regs
2012-09-28 17:15:17 +09:00
* being NULL , or CONFIG_DYNAMIC_FTRACE_WITH_REGS .
* Note , CONFIG_DYNAMIC_FTRACE_WITH_REGS expects a full regs to be saved .
2011-08-09 12:50:46 -04:00
* An architecture can pass partial regs with ftrace_ops and still
2015-11-30 18:23:36 +08:00
* set the ARCH_SUPPORTS_FTRACE_OPS .
2020-06-17 16:56:16 -04:00
*
* In vmlinux . lds . h , ftrace_ops_list_func ( ) is defined to be
* arch_ftrace_ops_list_func .
2011-08-08 16:57:47 -04:00
*/
# if ARCH_SUPPORTS_FTRACE_OPS
2020-06-17 16:56:16 -04:00
void arch_ftrace_ops_list_func ( unsigned long ip , unsigned long parent_ip ,
struct ftrace_ops * op , struct ftrace_regs * fregs )
2011-08-08 16:57:47 -04:00
{
2020-10-28 17:42:17 -04:00
__ftrace_ops_list_func ( ip , parent_ip , NULL , fregs ) ;
2011-08-08 16:57:47 -04:00
}
# else
2020-06-17 16:56:16 -04:00
void arch_ftrace_ops_list_func ( unsigned long ip , unsigned long parent_ip )
2011-08-08 16:57:47 -04:00
{
2011-08-09 12:50:46 -04:00
__ftrace_ops_list_func ( ip , parent_ip , NULL , NULL ) ;
2011-08-08 16:57:47 -04:00
}
# endif
2020-06-17 16:56:16 -04:00
NOKPROBE_SYMBOL ( arch_ftrace_ops_list_func ) ;
2011-08-08 16:57:47 -04:00
2014-07-22 20:16:57 -04:00
/*
* If there ' s only one function registered but it does not support
2022-10-25 15:39:23 +00:00
* recursion or needs RCU protection , then this function will be called
* by the mcount trampoline .
2014-07-22 20:16:57 -04:00
*/
2015-12-01 13:28:16 -05:00
static void ftrace_ops_assist_func ( unsigned long ip , unsigned long parent_ip ,
2020-10-28 17:42:17 -04:00
struct ftrace_ops * op , struct ftrace_regs * fregs )
2014-07-22 20:16:57 -04:00
{
int bit ;
2021-10-18 15:44:12 -04:00
bit = trace_test_and_set_recursion ( ip , parent_ip , TRACE_LIST_START ) ;
2014-07-22 20:16:57 -04:00
if ( bit < 0 )
return ;
2020-09-29 12:40:31 -04:00
if ( ! ( op - > flags & FTRACE_OPS_FL_RCU ) | | rcu_is_watching ( ) )
2020-10-28 17:42:17 -04:00
op - > func ( ip , parent_ip , op , fregs ) ;
2015-12-01 13:28:16 -05:00
2014-07-22 20:16:57 -04:00
trace_clear_recursion ( bit ) ;
}
2019-02-24 01:50:20 +09:00
NOKPROBE_SYMBOL ( ftrace_ops_assist_func ) ;
2014-07-22 20:16:57 -04:00
2014-07-22 20:41:42 -04:00
/**
* ftrace_ops_get_func - get the function a trampoline should call
* @ ops : the ops to get the function for
*
* Normally the mcount trampoline will call the ops - > func , but there
* are times that it should not . For example , if the ops does not
* have its own recursion protection , then it should call the
2017-02-22 08:29:26 +08:00
* ftrace_ops_assist_func ( ) instead .
2014-07-22 20:41:42 -04:00
*
2024-02-22 21:48:33 -08:00
* Returns : the function that the trampoline should call for @ ops .
2014-07-22 20:41:42 -04:00
*/
ftrace_func_t ftrace_ops_get_func ( struct ftrace_ops * ops )
{
/*
2020-11-05 21:32:45 -05:00
* If the function does not handle recursion or needs to be RCU safe ,
* then we need to call the assist handler .
2014-07-22 20:41:42 -04:00
*/
2020-11-05 21:32:45 -05:00
if ( ops - > flags & ( FTRACE_OPS_FL_RECURSION |
FTRACE_OPS_FL_RCU ) )
2015-12-01 13:28:16 -05:00
return ftrace_ops_assist_func ;
2014-07-22 20:41:42 -04:00
return ops - > func ;
}
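The selection made by ftrace_ops_get_func() can be modelled outside the kernel roughly as below. The `TOY_FL_*` flags, `toy_rcu_watching` and the single `toy_in_assist` guard are hypothetical simplifications (the real assist function uses the per-context recursion bits); the sketch only shows that an ops asking for recursion or RCU help is routed through the wrapper, while any other ops is called directly.

```c
#include <assert.h>

#define TOY_FL_RECURSION (1UL << 0)	/* hypothetical: ops wants recursion protection */
#define TOY_FL_RCU       (1UL << 1)	/* hypothetical: ops needs RCU to be watching */

struct toy_ops;
typedef void (*toy_func_t)(unsigned long ip, struct toy_ops *op);

struct toy_ops {
	toy_func_t func;
	unsigned long flags;
	int calls;
};

static int toy_rcu_watching = 1;	/* stand-in for rcu_is_watching() */
static int toy_in_assist;		/* single-level recursion guard */

/* Wrapper that adds the checks the callback does not do itself. */
static void toy_assist_func(unsigned long ip, struct toy_ops *op)
{
	if (toy_in_assist)
		return;
	toy_in_assist = 1;

	if (!(op->flags & TOY_FL_RCU) || toy_rcu_watching)
		op->func(ip, op);

	toy_in_assist = 0;
}

/* Pick what the trampoline should call for this ops. */
static toy_func_t toy_ops_get_func(struct toy_ops *ops)
{
	assert(ops && ops->func);
	if (ops->flags & (TOY_FL_RECURSION | TOY_FL_RCU))
		return toy_assist_func;
	return ops->func;
}

static void toy_callback(unsigned long ip, struct toy_ops *op)
{
	(void)ip;
	op->calls++;
}
```

An ops with no flags gets `toy_callback` back unchanged, so the direct path pays nothing for checks it does not need.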
2016-04-22 18:11:33 -04:00
static void
ftrace_filter_pid_sched_switch_probe ( void * data , bool preempt ,
2022-01-20 16:25:19 +00:00
struct task_struct * prev ,
2022-05-11 18:28:36 +00:00
struct task_struct * next ,
unsigned int prev_state )
2008-12-04 00:26:40 -05:00
{
2016-04-22 18:11:33 -04:00
struct trace_array * tr = data ;
struct trace_pid_list * pid_list ;
2020-03-19 23:19:06 -04:00
struct trace_pid_list * no_pid_list ;
2008-12-04 00:26:40 -05:00
2016-04-22 18:11:33 -04:00
pid_list = rcu_dereference_sched ( tr - > function_pids ) ;
2020-03-19 23:19:06 -04:00
no_pid_list = rcu_dereference_sched ( tr - > function_no_pids ) ;
2008-12-04 00:26:41 -05:00
2020-03-19 23:19:06 -04:00
if ( trace_ignore_this_task ( pid_list , no_pid_list , next ) )
2020-03-19 23:40:40 -04:00
this_cpu_write ( tr - > array_buffer . data - > ftrace_ignore_pid ,
FTRACE_PID_IGNORE ) ;
else
this_cpu_write ( tr - > array_buffer . data - > ftrace_ignore_pid ,
next - > pid ) ;
2008-12-04 00:26:40 -05:00
}
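The decision made at each context switch can be sketched as follows. This is a model under stated assumptions: `toy_pid_list` is a flat array rather than the kernel's pid list, `TOY_PID_IGNORE` is a hypothetical sentinel, and the rule in `toy_ignore_this_task()` (ignore a task that is missing from a present allow list, or present in a deny list) is an assumption about what trace_ignore_this_task() does, since that helper is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PID_IGNORE (-1)	/* hypothetical sentinel: do not trace this task */

struct toy_pid_list {
	const int *pids;
	size_t n;
};

static bool toy_pid_in_list(const struct toy_pid_list *list, int pid)
{
	size_t i;

	for (i = 0; i < list->n; i++)
		if (list->pids[i] == pid)
			return true;
	return false;
}

/* Assumed rule: a NULL list means "no filter of that kind is active". */
static bool toy_ignore_this_task(const struct toy_pid_list *pid_list,
				 const struct toy_pid_list *no_pid_list,
				 int pid)
{
	return (pid_list && !toy_pid_in_list(pid_list, pid)) ||
	       (no_pid_list && toy_pid_in_list(no_pid_list, pid));
}

/* Value that would be cached per CPU for the incoming task. */
static int toy_sched_switch(const struct toy_pid_list *pid_list,
			    const struct toy_pid_list *no_pid_list,
			    int next_pid)
{
	assert(next_pid >= 0);
	if (toy_ignore_this_task(pid_list, no_pid_list, next_pid))
		return TOY_PID_IGNORE;
	return next_pid;
}
```

Caching the result at switch time means the hot tracing path only compares one per-CPU value instead of searching the lists on every traced function.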
2017-04-17 11:44:28 +09:00
static void
ftrace_pid_follow_sched_process_fork ( void * data ,
struct task_struct * self ,
struct task_struct * task )
{
struct trace_pid_list * pid_list ;
struct trace_array * tr = data ;
pid_list = rcu_dereference_sched ( tr - > function_pids ) ;
trace_filter_add_remove_task ( pid_list , self , task ) ;
2020-03-19 23:19:06 -04:00
pid_list = rcu_dereference_sched ( tr - > function_no_pids ) ;
trace_filter_add_remove_task ( pid_list , self , task ) ;
2017-04-17 11:44:28 +09:00
}
static void
ftrace_pid_follow_sched_process_exit ( void * data , struct task_struct * task )
{
struct trace_pid_list * pid_list ;
struct trace_array * tr = data ;
pid_list = rcu_dereference_sched ( tr - > function_pids ) ;
trace_filter_add_remove_task ( pid_list , NULL , task ) ;
2020-03-19 23:19:06 -04:00
pid_list = rcu_dereference_sched ( tr - > function_no_pids ) ;
trace_filter_add_remove_task ( pid_list , NULL , task ) ;
2017-04-17 11:44:28 +09:00
}
void ftrace_pid_follow_fork ( struct trace_array * tr , bool enable )
{
if ( enable ) {
register_trace_sched_process_fork ( ftrace_pid_follow_sched_process_fork ,
tr ) ;
2020-08-04 20:00:02 -04:00
register_trace_sched_process_free ( ftrace_pid_follow_sched_process_exit ,
2017-04-17 11:44:28 +09:00
tr ) ;
} else {
unregister_trace_sched_process_fork ( ftrace_pid_follow_sched_process_fork ,
tr ) ;
2020-08-04 20:00:02 -04:00
unregister_trace_sched_process_free ( ftrace_pid_follow_sched_process_exit ,
2017-04-17 11:44:28 +09:00
tr ) ;
}
}
2020-03-19 23:19:06 -04:00
static void clear_ftrace_pids ( struct trace_array * tr , int type )
2008-12-04 00:26:41 -05:00
{
2016-04-22 18:11:33 -04:00
struct trace_pid_list * pid_list ;
2020-03-19 23:19:06 -04:00
struct trace_pid_list * no_pid_list ;
2016-04-22 18:11:33 -04:00
int cpu ;
2008-12-04 00:26:41 -05:00
2016-04-22 18:11:33 -04:00
pid_list = rcu_dereference_protected ( tr - > function_pids ,
lockdep_is_held ( & ftrace_lock ) ) ;
2020-03-19 23:19:06 -04:00
no_pid_list = rcu_dereference_protected ( tr - > function_no_pids ,
lockdep_is_held ( & ftrace_lock ) ) ;
/* Make sure there's something to do */
2020-03-25 19:51:19 -04:00
if ( ! pid_type_enabled ( type , pid_list , no_pid_list ) )
2016-04-22 18:11:33 -04:00
return ;
2009-02-03 20:39:04 +01:00
2020-03-19 23:19:06 -04:00
/* See if the pids still need to be checked after this */
2020-03-25 19:51:19 -04:00
if ( ! still_need_pid_events ( type , pid_list , no_pid_list ) ) {
2020-03-19 23:19:06 -04:00
unregister_trace_sched_switch ( ftrace_filter_pid_sched_switch_probe , tr ) ;
for_each_possible_cpu ( cpu )
per_cpu_ptr ( tr - > array_buffer . data , cpu ) - > ftrace_ignore_pid = FTRACE_PID_TRACE ;
}
2008-12-04 00:26:41 -05:00
2020-03-19 23:19:06 -04:00
if ( type & TRACE_PIDS )
rcu_assign_pointer ( tr - > function_pids , NULL ) ;
2008-12-04 00:26:40 -05:00
2020-03-19 23:19:06 -04:00
if ( type & TRACE_NO_PIDS )
rcu_assign_pointer ( tr - > function_no_pids , NULL ) ;
2008-12-04 00:26:40 -05:00
2016-04-22 18:11:33 -04:00
/* Wait till all users are no longer using pid filtering */
2018-11-06 18:44:52 -08:00
synchronize_rcu ( ) ;
2008-12-04 00:26:41 -05:00
2020-03-19 23:19:06 -04:00
if ( ( type & TRACE_PIDS ) & & pid_list )
2021-09-23 21:03:49 -04:00
trace_pid_list_free ( pid_list ) ;
2020-03-19 23:19:06 -04:00
if ( ( type & TRACE_NO_PIDS ) & & no_pid_list )
2021-09-23 21:03:49 -04:00
trace_pid_list_free ( no_pid_list ) ;
2008-12-04 00:26:41 -05:00
}
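/*
 * Illustrative userspace sketch, not part of this file: the two predicates
 * that clear_ftrace_pids() relies on are defined outside this section. The
 * model below is one plausible reading of them, inferred from how they are
 * called above. The model_ names and the bit values of the two type flags
 * are assumptions made for the example.
 */

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit values; the real TRACE_PIDS / TRACE_NO_PIDS are defined elsewhere. */
#define MODEL_TRACE_PIDS    0x1
#define MODEL_TRACE_NO_PIDS 0x2

/* True if at least one list selected by @type currently exists. */
static bool model_pid_type_enabled(int type, const void *pid_list,
                                   const void *no_pid_list)
{
    return ((type & MODEL_TRACE_PIDS) && pid_list) ||
           ((type & MODEL_TRACE_NO_PIDS) && no_pid_list);
}

/*
 * True if, after clearing the lists selected by @type, a list that was
 * not selected still exists, so the sched_switch probe must stay.
 */
static bool model_still_need_pid_events(int type, const void *pid_list,
                                        const void *no_pid_list)
{
    return (!(type & MODEL_TRACE_PIDS) && pid_list) ||
           (!(type & MODEL_TRACE_NO_PIDS) && no_pid_list);
}
```

/*
 * Under this reading, clearing only one type while the other list is still
 * populated leaves the probe registered; clearing both types always drops it.
 */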
2017-04-17 11:44:27 +09:00
void ftrace_clear_pids ( struct trace_array * tr )
{
mutex_lock ( & ftrace_lock ) ;
2020-03-19 23:19:06 -04:00
clear_ftrace_pids ( tr , TRACE_PIDS | TRACE_NO_PIDS ) ;
2017-04-17 11:44:27 +09:00
mutex_unlock ( & ftrace_lock ) ;
}
2020-03-19 23:19:06 -04:00
static void ftrace_pid_reset ( struct trace_array * tr , int type )
2008-11-26 00:16:23 -05:00
{
2009-10-13 16:33:52 -04:00
mutex_lock ( & ftrace_lock ) ;
2020-03-19 23:19:06 -04:00
clear_ftrace_pids ( tr , type ) ;
2008-12-04 00:26:40 -05:00
2009-10-13 16:33:52 -04:00
ftrace_update_pid_func ( ) ;
2014-08-05 17:19:38 -04:00
ftrace_startup_all ( 0 ) ;
2009-10-13 16:33:52 -04:00
mutex_unlock ( & ftrace_lock ) ;
}
2016-04-22 18:11:33 -04:00
/* Greater than any max PID */
# define FTRACE_NO_PIDS (void *)(PID_MAX_LIMIT + 1)
2008-11-26 00:16:23 -05:00
2009-10-13 16:33:52 -04:00
static void * fpid_start ( struct seq_file * m , loff_t * pos )
2016-04-22 18:11:33 -04:00
__acquires ( RCU )
2009-10-13 16:33:52 -04:00
{
2016-04-22 18:11:33 -04:00
struct trace_pid_list * pid_list ;
struct trace_array * tr = m - > private ;
2009-10-13 16:33:52 -04:00
mutex_lock ( & ftrace_lock ) ;
2016-04-22 18:11:33 -04:00
rcu_read_lock_sched ( ) ;
pid_list = rcu_dereference_sched ( tr - > function_pids ) ;
2009-10-13 16:33:52 -04:00
2016-04-22 18:11:33 -04:00
if ( ! pid_list )
return ! ( * pos ) ? FTRACE_NO_PIDS : NULL ;
2009-10-13 16:33:52 -04:00
2016-04-22 18:11:33 -04:00
return trace_pid_start ( pid_list , pos ) ;
2009-10-13 16:33:52 -04:00
}
static void * fpid_next ( struct seq_file * m , void * v , loff_t * pos )
{
2016-04-22 18:11:33 -04:00
struct trace_array * tr = m - > private ;
struct trace_pid_list * pid_list = rcu_dereference_sched ( tr - > function_pids ) ;
2020-01-24 10:02:56 +03:00
if ( v = = FTRACE_NO_PIDS ) {
( * pos ) + + ;
2009-10-13 16:33:52 -04:00
return NULL ;
2020-01-24 10:02:56 +03:00
}
2016-04-22 18:11:33 -04:00
return trace_pid_next ( pid_list , v , pos ) ;
2009-10-13 16:33:52 -04:00
}
static void fpid_stop ( struct seq_file * m , void * p )
2016-04-22 18:11:33 -04:00
__releases ( RCU )
2009-10-13 16:33:52 -04:00
{
2016-04-22 18:11:33 -04:00
rcu_read_unlock_sched ( ) ;
2009-10-13 16:33:52 -04:00
mutex_unlock ( & ftrace_lock ) ;
}
static int fpid_show ( struct seq_file * m , void * v )
{
2016-04-22 18:11:33 -04:00
if ( v = = FTRACE_NO_PIDS ) {
2014-11-08 21:42:10 +01:00
seq_puts ( m , "no pid\n" ) ;
2009-10-13 16:33:52 -04:00
return 0 ;
}
2016-04-22 18:11:33 -04:00
return trace_pid_show ( m , v ) ;
2009-10-13 16:33:52 -04:00
}
static const struct seq_operations ftrace_pid_sops = {
. start = fpid_start ,
. next = fpid_next ,
. stop = fpid_stop ,
. show = fpid_show ,
} ;
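/*
 * Illustrative userspace sketch, not part of this file: the start/next/show
 * callbacks above use a sentinel (FTRACE_NO_PIDS) so that an unset filter
 * still produces exactly one "no pid" record. The model below mimics that
 * protocol over a plain sorted array. Every model_ name, the array-backed
 * list and the pid + 1 encoding are assumptions made for the example; the
 * real iteration helpers live outside this section.
 */

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PID_MAX 32768
/* Sentinel larger than any encoded pid (encoded values are pid + 1). */
#define MODEL_NO_PIDS ((void *)(uintptr_t)(MODEL_PID_MAX + 1))

struct model_pid_list {
    const int *pids;   /* sorted, each value in [0, MODEL_PID_MAX) */
    size_t n;
};

/* Encode pid + 1 so that pid 0 is distinguishable from the NULL end marker. */
static void *model_at(const struct model_pid_list *l, long pos)
{
    if (pos < 0 || (size_t)pos >= l->n)
        return NULL;
    return (void *)(uintptr_t)(l->pids[pos] + 1);
}

static void *model_start(const struct model_pid_list *l, long *pos)
{
    if (!l)     /* no filter set: emit the sentinel once, at position 0 */
        return *pos == 0 ? MODEL_NO_PIDS : NULL;
    return model_at(l, *pos);
}

static void *model_next(const struct model_pid_list *l, void *v, long *pos)
{
    (*pos)++;   /* always advance, including past the sentinel */
    if (v == MODEL_NO_PIDS)
        return NULL;
    return model_at(l, *pos);
}

static int model_show(char *out, size_t cap, void *v)
{
    if (v == MODEL_NO_PIDS)
        return snprintf(out, cap, "no pid\n");
    return snprintf(out, cap, "%lu\n", (unsigned long)((uintptr_t)v - 1));
}

/* Drive the iterator the way a sequential reader would; returns bytes written. */
static size_t model_read_all(const struct model_pid_list *l, char *out, size_t cap)
{
    long pos = 0;
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (void *v = model_start(l, &pos); v; v = model_next(l, v, &pos)) {
        int n = model_show(out + used, cap - used, v);
        if (n < 0 || (size_t)n >= cap - used)
            break;      /* output did not fit; stop early */
        used += (size_t)n;
    }
    return used;
}
```

/*
 * Advancing the position even for the sentinel matters: a later start call
 * at a non-zero position then returns NULL instead of repeating "no pid".
 */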
2020-03-19 23:19:06 -04:00
static void * fnpid_start ( struct seq_file * m , loff_t * pos )
__acquires ( RCU )
{
struct trace_pid_list * pid_list ;
struct trace_array * tr = m - > private ;
mutex_lock ( & ftrace_lock ) ;
rcu_read_lock_sched ( ) ;
pid_list = rcu_dereference_sched ( tr - > function_no_pids ) ;
if ( ! pid_list )
return ! ( * pos ) ? FTRACE_NO_PIDS : NULL ;
return trace_pid_start ( pid_list , pos ) ;
}
static void * fnpid_next ( struct seq_file * m , void * v , loff_t * pos )
2009-10-13 16:33:52 -04:00
{
2020-03-19 23:19:06 -04:00
struct trace_array * tr = m - > private ;
struct trace_pid_list * pid_list = rcu_dereference_sched ( tr - > function_no_pids ) ;
if ( v = = FTRACE_NO_PIDS ) {
( * pos ) + + ;
return NULL ;
}
return trace_pid_next ( pid_list , v , pos ) ;
}
static const struct seq_operations ftrace_no_pid_sops = {
. start = fnpid_start ,
. next = fnpid_next ,
. stop = fpid_stop ,
. show = fpid_show ,
} ;
static int pid_open ( struct inode * inode , struct file * file , int type )
{
const struct seq_operations * seq_ops ;
2016-04-22 18:11:33 -04:00
struct trace_array * tr = inode - > i_private ;
struct seq_file * m ;
2009-10-13 16:33:52 -04:00
int ret = 0 ;
tracing: Add tracing_check_open_get_tr()
Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, the open should return -ENODEV. tracing_disabled is set when
tracing is found to be broken. Originally this was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it is also set if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.
As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).
Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-11 17:39:57 -04:00
ret = tracing_check_open_get_tr ( tr ) ;
if ( ret )
return ret ;
2016-04-22 18:11:33 -04:00
2009-10-13 16:33:52 -04:00
if ( ( file - > f_mode & FMODE_WRITE ) & &
( file - > f_flags & O_TRUNC ) )
2020-03-19 23:19:06 -04:00
ftrace_pid_reset ( tr , type ) ;
switch ( type ) {
case TRACE_PIDS :
seq_ops = & ftrace_pid_sops ;
break ;
case TRACE_NO_PIDS :
seq_ops = & ftrace_no_pid_sops ;
break ;
2020-05-29 22:12:14 +08:00
default :
trace_array_put ( tr ) ;
WARN_ON_ONCE ( 1 ) ;
return - EINVAL ;
2020-03-19 23:19:06 -04:00
}
2009-10-13 16:33:52 -04:00
2020-03-19 23:19:06 -04:00
ret = seq_open ( file , seq_ops ) ;
2016-04-22 18:11:33 -04:00
if ( ret < 0 ) {
trace_array_put ( tr ) ;
} else {
m = file - > private_data ;
/* copy tr over to seq ops */
m - > private = tr ;
}
2009-10-13 16:33:52 -04:00
return ret ;
}
2020-03-19 23:19:06 -04:00
static int
ftrace_pid_open ( struct inode * inode , struct file * file )
{
return pid_open ( inode , file , TRACE_PIDS ) ;
}
static int
ftrace_no_pid_open ( struct inode * inode , struct file * file )
{
return pid_open ( inode , file , TRACE_NO_PIDS ) ;
}
2016-04-22 18:11:33 -04:00
static void ignore_task_cpu ( void * data )
{
struct trace_array * tr = data ;
struct trace_pid_list * pid_list ;
2020-03-19 23:19:06 -04:00
struct trace_pid_list * no_pid_list ;
2016-04-22 18:11:33 -04:00
/*
* This function is called by on_each_cpu() while
* ftrace_lock is held.
*/
pid_list = rcu_dereference_protected ( tr - > function_pids ,
mutex_is_locked ( & ftrace_lock ) ) ;
2020-03-19 23:19:06 -04:00
no_pid_list = rcu_dereference_protected ( tr - > function_no_pids ,
mutex_is_locked ( & ftrace_lock ) ) ;
2016-04-22 18:11:33 -04:00
2020-03-19 23:19:06 -04:00
if ( trace_ignore_this_task ( pid_list , no_pid_list , current ) )
2020-03-19 23:40:40 -04:00
this_cpu_write ( tr - > array_buffer . data - > ftrace_ignore_pid ,
FTRACE_PID_IGNORE ) ;
else
this_cpu_write ( tr - > array_buffer . data - > ftrace_ignore_pid ,
current - > pid ) ;
2016-04-22 18:11:33 -04:00
}
2008-11-26 00:16:23 -05:00
static ssize_t
2020-03-19 23:19:06 -04:00
pid_write ( struct file * filp , const char __user * ubuf ,
size_t cnt , loff_t * ppos , int type )
2008-11-26 00:16:23 -05:00
{
2016-04-22 18:11:33 -04:00
struct seq_file * m = filp - > private_data ;
struct trace_array * tr = m - > private ;
2020-03-19 23:19:06 -04:00
struct trace_pid_list * filtered_pids ;
struct trace_pid_list * other_pids ;
2016-04-22 18:11:33 -04:00
struct trace_pid_list * pid_list ;
ssize_t ret ;
2008-11-26 00:16:23 -05:00
2016-04-22 18:11:33 -04:00
if ( ! cnt )
return 0 ;
mutex_lock ( & ftrace_lock ) ;
2020-03-19 23:19:06 -04:00
switch ( type ) {
case TRACE_PIDS :
filtered_pids = rcu_dereference_protected ( tr - > function_pids ,
2016-04-22 18:11:33 -04:00
lockdep_is_held ( & ftrace_lock ) ) ;
2020-03-19 23:19:06 -04:00
other_pids = rcu_dereference_protected ( tr - > function_no_pids ,
lockdep_is_held ( & ftrace_lock ) ) ;
break ;
case TRACE_NO_PIDS :
filtered_pids = rcu_dereference_protected ( tr - > function_no_pids ,
lockdep_is_held ( & ftrace_lock ) ) ;
other_pids = rcu_dereference_protected ( tr - > function_pids ,
2016-04-22 18:11:33 -04:00
lockdep_is_held ( & ftrace_lock ) ) ;
2020-03-19 23:19:06 -04:00
break ;
2020-05-29 22:12:14 +08:00
default :
ret = - EINVAL ;
WARN_ON_ONCE ( 1 ) ;
goto out ;
2020-03-19 23:19:06 -04:00
}
2016-04-22 18:11:33 -04:00
ret = trace_pid_write ( filtered_pids , & pid_list , ubuf , cnt ) ;
if ( ret < 0 )
goto out ;
2008-11-26 00:16:23 -05:00
2020-03-19 23:19:06 -04:00
switch ( type ) {
case TRACE_PIDS :
rcu_assign_pointer ( tr - > function_pids , pid_list ) ;
break ;
case TRACE_NO_PIDS :
rcu_assign_pointer ( tr - > function_no_pids , pid_list ) ;
break ;
}
2008-11-26 00:16:23 -05:00
2016-04-22 18:11:33 -04:00
if ( filtered_pids ) {
2018-11-06 18:44:52 -08:00
synchronize_rcu ( ) ;
2021-09-23 21:03:49 -04:00
trace_pid_list_free ( filtered_pids ) ;
2020-03-19 23:19:06 -04:00
} else if ( pid_list & & ! other_pids ) {
2016-04-22 18:11:33 -04:00
/* Register a probe to set whether to ignore the tracing of a task */
register_trace_sched_switch ( ftrace_filter_pid_sched_switch_probe , tr ) ;
}
2008-11-26 00:16:23 -05:00
2009-10-13 16:33:52 -04:00
/*
2016-04-22 18:11:33 -04:00
* Ignoring of pids is done at task switch . But we have to
* check for those tasks that are currently running .
* Always do this in case a pid was appended or removed .
2009-10-13 16:33:52 -04:00
*/
2016-04-22 18:11:33 -04:00
on_each_cpu ( ignore_task_cpu , tr , 1 ) ;
2009-10-13 16:33:52 -04:00
2016-04-22 18:11:33 -04:00
ftrace_update_pid_func ( ) ;
ftrace_startup_all ( 0 ) ;
out :
mutex_unlock ( & ftrace_lock ) ;
2008-11-26 00:16:23 -05:00
2016-04-22 18:11:33 -04:00
if ( ret > 0 )
* ppos + = ret ;
2008-11-26 00:16:23 -05:00
2016-04-22 18:11:33 -04:00
return ret ;
2009-10-13 16:33:52 -04:00
}
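/*
 * Illustrative userspace sketch, not part of this file: after the new list
 * is published, pid_write() either frees the list it replaced or registers
 * the sched_switch probe, never both. The model below isolates that branch.
 * The enum and function names are hypothetical; only the branch structure
 * is taken from the code above.
 */

```c
#include <stdbool.h>

enum model_write_action {
    MODEL_NO_CHANGE,        /* probe state stays as it was */
    MODEL_FREE_OLD_LIST,    /* wait for readers, then free the replaced list */
    MODEL_REGISTER_PROBE    /* first pid filter on this instance */
};

/*
 * @had_old:   a list of the written type existed before the write
 * @has_new:   the write produced a non-empty list
 * @other_has: the list of the opposite type exists
 */
static enum model_write_action
model_after_pid_write(bool had_old, bool has_new, bool other_has)
{
    if (had_old)
        return MODEL_FREE_OLD_LIST;     /* probe was registered earlier */
    if (has_new && !other_has)
        return MODEL_REGISTER_PROBE;
    return MODEL_NO_CHANGE;             /* other list already owns the probe */
}
```

/*
 * The probe is only registered when neither list existed before the write,
 * which keeps a single registration per trace_array in this model.
 */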
2008-11-26 00:16:23 -05:00
2020-03-19 23:19:06 -04:00
static ssize_t
ftrace_pid_write ( struct file * filp , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
return pid_write ( filp , ubuf , cnt , ppos , TRACE_PIDS ) ;
}
static ssize_t
ftrace_no_pid_write ( struct file * filp , const char __user * ubuf ,
size_t cnt , loff_t * ppos )
{
return pid_write ( filp , ubuf , cnt , ppos , TRACE_NO_PIDS ) ;
}
2009-10-13 16:33:52 -04:00
static int
ftrace_pid_release ( struct inode * inode , struct file * file )
{
2016-04-22 18:11:33 -04:00
struct trace_array * tr = inode - > i_private ;
2008-11-26 00:16:23 -05:00
2016-04-22 18:11:33 -04:00
trace_array_put ( tr ) ;
return seq_release ( inode , file ) ;
2008-11-26 00:16:23 -05:00
}
2009-03-05 21:44:55 -05:00
static const struct file_operations ftrace_pid_fops = {
2009-10-13 16:33:52 -04:00
. open = ftrace_pid_open ,
. write = ftrace_pid_write ,
. read = seq_read ,
2013-12-21 17:39:40 -05:00
. llseek = tracing_lseek ,
2009-10-13 16:33:52 -04:00
. release = ftrace_pid_release ,
2008-11-26 00:16:23 -05:00
} ;
2020-03-19 23:19:06 -04:00
static const struct file_operations ftrace_no_pid_fops = {
. open = ftrace_no_pid_open ,
. write = ftrace_no_pid_write ,
. read = seq_read ,
. llseek = tracing_lseek ,
. release = ftrace_pid_release ,
} ;
2016-04-22 18:11:33 -04:00
void ftrace_init_tracefs ( struct trace_array * tr , struct dentry * d_tracer )
2008-11-26 00:16:23 -05:00
{
2021-08-18 11:24:51 -04:00
trace_create_file ( "set_ftrace_pid" , TRACE_MODE_WRITE , d_tracer ,
2016-04-22 18:11:33 -04:00
tr , & ftrace_pid_fops ) ;
2021-08-18 11:24:51 -04:00
trace_create_file ( "set_ftrace_notrace_pid" , TRACE_MODE_WRITE ,
d_tracer , tr , & ftrace_no_pid_fops ) ;
2008-11-26 00:16:23 -05:00
}
2016-07-05 10:04:34 -04:00
void __init ftrace_init_tracefs_toplevel ( struct trace_array * tr ,
struct dentry * d_tracer )
{
/* Only the top level directory has the dyn_tracefs and profile */
WARN_ON ( ! ( tr - > flags & TRACE_ARRAY_FL_GLOBAL ) ) ;
ftrace_init_dyn_tracefs ( d_tracer ) ;
ftrace_profile_tracefs ( d_tracer ) ;
}
2008-07-10 20:58:15 -04:00
/**
2008-10-23 09:33:02 -04:00
* ftrace_kill - kill ftrace
2008-07-10 20:58:15 -04:00
*
* This function should be used by panic code. It stops ftrace,
* but in a not so nice way: it only sets the disable flags and
* points the trace function at ftrace_stub.
*/
2008-10-23 09:33:02 -04:00
void ftrace_kill ( void )
2008-07-10 20:58:15 -04:00
{
ftrace_disabled = 1 ;
ftrace_enabled = 0 ;
2018-02-02 10:14:49 +08:00
ftrace_trace_function = ftrace_stub ;
2008-07-10 20:58:15 -04:00
}
2011-09-29 21:26:16 -04:00
/**
2021-10-29 09:52:23 -04:00
* ftrace_is_dead - Test if ftrace is dead or not .
*
2024-02-22 21:48:33 -08:00
* Returns : 1 if ftrace is " dead " , zero otherwise .
2011-09-29 21:26:16 -04:00
*/
int ftrace_is_dead ( void )
{
return ftrace_disabled ;
}
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. The owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
# ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/*
* When registering ftrace_ops with IPMODIFY, it is necessary to make sure
* it doesn't conflict with any direct ftrace_ops. If there is an existing
* direct ftrace_ops on a kernel function being patched, call
* FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER on it to enable sharing.
*
* @ ops : ftrace_ops being registered .
*
* Returns :
* 0 on success ;
* Negative on failure .
*/
static int prepare_direct_functions_for_ipmodify ( struct ftrace_ops * ops )
{
struct ftrace_func_entry * entry ;
struct ftrace_hash * hash ;
struct ftrace_ops * op ;
int size , i , ret ;
lockdep_assert_held_once ( & direct_mutex ) ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_IPMODIFY ) )
return 0 ;
hash = ops - > func_hash - > filter_hash ;
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & hash - > buckets [ i ] , hlist ) {
unsigned long ip = entry - > ip ;
bool found_op = false ;
mutex_lock ( & ftrace_lock ) ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( ! ( op - > flags & FTRACE_OPS_FL_DIRECT ) )
continue ;
if ( ops_references_ip ( op , ip ) ) {
found_op = true ;
break ;
}
} while_for_each_ftrace_op ( op ) ;
mutex_unlock ( & ftrace_lock ) ;
if ( found_op ) {
if ( ! op - > ops_func )
return - EBUSY ;
ret = op - > ops_func ( op , FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER ) ;
if ( ret )
return ret ;
}
}
}
return 0 ;
}
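/*
 * Illustrative userspace sketch, not part of this file: the function above
 * walks every address the IPMODIFY ops wants to patch, looks for a DIRECT
 * ops attached to the same address, and asks it to enable sharing. The
 * model below reproduces that control flow with flat arrays instead of
 * hash buckets and without any locking. All model_ names, the flag value
 * and the callback type are assumptions made for the example.
 */

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FL_DIRECT 0x1u

enum model_cmd { MODEL_ENABLE_SHARE_PEER };

struct model_ops {
    unsigned int flags;
    const unsigned long *ips;   /* addresses this ops is attached to */
    size_t nr_ips;
    int (*ops_func)(struct model_ops *op, enum model_cmd cmd);
    int share_enabled;
};

static bool model_references_ip(const struct model_ops *op, unsigned long ip)
{
    for (size_t i = 0; i < op->nr_ips; i++)
        if (op->ips[i] == ip)
            return true;
    return false;
}

/* Returns 0 on success, a negative errno if some DIRECT ops cannot share. */
static int model_prepare(const unsigned long *ips, size_t nr_ips,
                         struct model_ops **list, size_t nr_ops)
{
    for (size_t i = 0; i < nr_ips; i++) {
        struct model_ops *found = NULL;

        /* First DIRECT ops that references this address, if any. */
        for (size_t j = 0; j < nr_ops; j++) {
            if (!(list[j]->flags & MODEL_FL_DIRECT))
                continue;
            if (model_references_ip(list[j], ips[i])) {
                found = list[j];
                break;
            }
        }
        if (!found)
            continue;
        if (!found->ops_func)
            return -EBUSY;      /* DIRECT ops has no way to negotiate */

        int ret = found->ops_func(found, MODEL_ENABLE_SHARE_PEER);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: a DIRECT ops that always agrees to share. */
static int model_accept_share(struct model_ops *op, enum model_cmd cmd)
{
    (void)cmd;
    op->share_enabled = 1;
    return 0;
}
```

/*
 * A DIRECT ops without a callback makes the whole preparation fail with
 * -EBUSY, while ops that are not DIRECT are skipped entirely.
 */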
/*
* Similar to prepare_direct_functions_for_ipmodify , clean up after ops
* with IPMODIFY is unregistered . The cleanup is optional for most DIRECT
* ops .
*/
static void cleanup_direct_functions_after_ipmodify ( struct ftrace_ops * ops )
{
struct ftrace_func_entry * entry ;
struct ftrace_hash * hash ;
struct ftrace_ops * op ;
int size , i ;
if ( ! ( ops - > flags & FTRACE_OPS_FL_IPMODIFY ) )
return ;
mutex_lock ( & direct_mutex ) ;
hash = ops - > func_hash - > filter_hash ;
size = 1 < < hash - > size_bits ;
for ( i = 0 ; i < size ; i + + ) {
hlist_for_each_entry ( entry , & hash - > buckets [ i ] , hlist ) {
unsigned long ip = entry - > ip ;
bool found_op = false ;
mutex_lock ( & ftrace_lock ) ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( ! ( op - > flags & FTRACE_OPS_FL_DIRECT ) )
continue ;
if ( ops_references_ip ( op , ip ) ) {
found_op = true ;
break ;
}
} while_for_each_ftrace_op ( op ) ;
mutex_unlock ( & ftrace_lock ) ;
/* The cleanup is optional, ignore any errors */
if ( found_op & & op - > ops_func )
op - > ops_func ( op , FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER ) ;
}
}
mutex_unlock ( & direct_mutex ) ;
}
# define lock_direct_mutex() mutex_lock(&direct_mutex)
# define unlock_direct_mutex() mutex_unlock(&direct_mutex)
# else /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
static int prepare_direct_functions_for_ipmodify ( struct ftrace_ops * ops )
{
return 0 ;
}
static void cleanup_direct_functions_after_ipmodify ( struct ftrace_ops * ops )
{
}
# define lock_direct_mutex() do { } while (0)
# define unlock_direct_mutex() do { } while (0)
# endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
/*
* Similar to register_ftrace_function, except we don't lock direct_mutex.
*/
static int register_ftrace_function_nolock ( struct ftrace_ops * ops )
{
int ret ;
ftrace_ops_init ( ops ) ;
mutex_lock ( & ftrace_lock ) ;
ret = ftrace_startup ( ops , 0 ) ;
mutex_unlock ( & ftrace_lock ) ;
return ret ;
}
2008-05-12 21:20:42 +02:00
/**
2008-05-12 21:20:42 +02:00
* register_ftrace_function - register a function for profiling
2022-03-07 08:43:03 +08:00
* @ ops : ops structure that holds the function for profiling .
2008-05-12 21:20:42 +02:00
*
2008-05-12 21:20:42 +02:00
* Register a function to be called by all functions in the
* kernel .
*
* Note : @ ops - > func and all the functions it calls must be labeled
* with " notrace " , otherwise it will go into a
* recursive loop .
2008-05-12 21:20:42 +02:00
*/
2008-05-12 21:20:42 +02:00
int register_ftrace_function ( struct ftrace_ops * ops )
2008-05-12 21:20:42 +02:00
{
2021-07-21 13:09:15 +01:00
int ret ;
2008-05-12 21:20:48 +02:00
2022-07-19 17:21:24 -07:00
lock_direct_mutex ( ) ;
ret = prepare_direct_functions_for_ipmodify ( ops ) ;
if ( ret < 0 )
goto out_unlock ;
2011-05-04 09:27:52 -04:00
2022-07-19 17:21:24 -07:00
ret = register_ftrace_function_nolock ( ops ) ;
2012-03-29 19:11:40 +02:00
2022-07-19 17:21:24 -07:00
out_unlock :
unlock_direct_mutex ( ) ;
2008-05-12 21:20:43 +02:00
return ret ;
2008-05-12 21:20:42 +02:00
}
2011-05-05 21:14:55 -04:00
EXPORT_SYMBOL_GPL ( register_ftrace_function ) ;
2008-05-12 21:20:42 +02:00
/**
2009-01-12 23:35:50 +01:00
* unregister_ftrace_function - unregister a function for profiling .
2022-03-07 08:43:03 +08:00
* @ ops : ops structure that holds the function to unregister
2008-05-12 21:20:42 +02:00
*
* Unregister a function that was added to be called by ftrace profiling .
*/
int unregister_ftrace_function ( struct ftrace_ops * ops )
{
int ret ;
2009-02-14 01:42:44 -05:00
mutex_lock ( & ftrace_lock ) ;
2013-11-25 20:59:46 -05:00
ret = ftrace_shutdown ( ops , 0 ) ;
2009-02-14 01:42:44 -05:00
mutex_unlock ( & ftrace_lock ) ;
2008-05-12 21:20:43 +02:00
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
cleanup_direct_functions_after_ipmodify ( ops ) ;
2008-05-12 21:20:43 +02:00
return ret ;
}
2011-05-05 21:14:55 -04:00
EXPORT_SYMBOL_GPL ( unregister_ftrace_function ) ;
2008-05-12 21:20:43 +02:00
2022-05-10 14:26:13 +02:00
static int symbols_cmp ( const void * a , const void * b )
{
const char * * str_a = ( const char * * ) a ;
const char * * str_b = ( const char * * ) b ;
return strcmp ( * str_a , * str_b ) ;
}
struct kallsyms_data {
unsigned long * addrs ;
const char * * syms ;
size_t cnt ;
size_t found ;
} ;
2022-10-25 15:41:42 +02:00
/* This function gets called for all kernel and module symbols
* and returns 1 in case we resolved all the requested symbols ,
* 0 otherwise .
*/
2023-03-08 15:38:46 +08:00
static int kallsyms_callback ( void * data , const char * name , unsigned long addr )
2022-05-10 14:26:13 +02:00
{
struct kallsyms_data * args = data ;
2022-06-15 13:21:16 +02:00
const char * * sym ;
int idx ;
2022-05-10 14:26:13 +02:00
2022-06-15 13:21:16 +02:00
sym = bsearch ( & name , args - > syms , args - > cnt , sizeof ( * args - > syms ) , symbols_cmp ) ;
if ( ! sym )
return 0 ;
idx = sym - args - > syms ;
if ( args - > addrs [ idx ] )
2022-05-10 14:26:13 +02:00
return 0 ;
2022-09-26 17:33:36 +02:00
if ( ! ftrace_location ( addr ) )
2022-05-10 14:26:13 +02:00
return 0 ;
2022-06-15 13:21:16 +02:00
args - > addrs [ idx ] = addr ;
args - > found + + ;
2022-05-10 14:26:13 +02:00
return args - > found = = args - > cnt ? 1 : 0 ;
}
/**
* ftrace_lookup_symbols - Lookup addresses for array of symbols
*
* @ sorted_syms : array of pointers to the symbol names to resolve ,
* must be alphabetically sorted
* @ cnt : number of symbols / addresses in @ sorted_syms / @ addrs arrays
* @ addrs : array for storing resulting addresses
*
* This function looks up addresses for the array of symbols provided in
* @ sorted_syms array ( must be alphabetically sorted ) and stores them in
* @ addrs array , which needs to be big enough to store at least @ cnt
* addresses .
*
2024-02-22 21:48:33 -08:00
* Returns : 0 if all provided symbols are found , - ESRCH otherwise .
2022-05-10 14:26:13 +02:00
*/
int ftrace_lookup_symbols ( const char * * sorted_syms , size_t cnt , unsigned long * addrs )
{
struct kallsyms_data args ;
2022-10-25 15:41:42 +02:00
int found_all ;
2022-05-10 14:26:13 +02:00
2022-06-15 13:21:16 +02:00
memset ( addrs , 0 , sizeof ( * addrs ) * cnt ) ;
2022-05-10 14:26:13 +02:00
args . addrs = addrs ;
args . syms = sorted_syms ;
args . cnt = cnt ;
args . found = 0 ;
2022-10-25 15:41:42 +02:00
found_all = kallsyms_on_each_symbol ( kallsyms_callback , & args ) ;
if ( found_all )
return 0 ;
2023-01-16 11:10:07 +01:00
found_all = module_kallsyms_on_each_symbol ( NULL , kallsyms_callback , & args ) ;
2022-10-25 15:41:42 +02:00
return found_all ? 0 : - ESRCH ;
2022-05-10 14:26:13 +02:00
}
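The lookup above relies on the requested names being sorted, so that each symbol the iterator visits can be matched with one binary search. The following is a minimal user-space sketch of that callback pattern, not the kernel code itself: the sketch_ names are hypothetical, the ftrace_location() filter is deliberately left out, and the caller is assumed to feed it (name, address) pairs by hand in place of kallsyms_on_each_symbol().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same comparator shape as symbols_cmp(): each argument points at a char pointer. */
static int sketch_symbols_cmp(const void *a, const void *b)
{
	const char * const *str_a = a;
	const char * const *str_b = b;

	return strcmp(*str_a, *str_b);
}

/* Hypothetical stand-in for struct kallsyms_data. */
struct sketch_lookup {
	unsigned long *addrs;	/* zero-initialised, one slot per symbol */
	const char **syms;	/* must be sorted in strcmp() order */
	size_t cnt;
	size_t found;
};

/* Returns 1 once every requested symbol has an address, 0 otherwise. */
static int sketch_callback(void *data, const char *name, unsigned long addr)
{
	struct sketch_lookup *args = data;
	const char **sym;
	size_t idx;

	sym = bsearch(&name, args->syms, args->cnt, sizeof(*args->syms),
		      sketch_symbols_cmp);
	if (!sym)
		return 0;	/* not one of the requested names */

	idx = sym - args->syms;
	if (args->addrs[idx])
		return 0;	/* keep the first address seen for this name */

	args->addrs[idx] = addr;
	args->found++;
	return args->found == args->cnt;
}
```

Because a nonzero return is what stops the iteration, the walk can end as soon as the last missing symbol is resolved instead of visiting every remaining symbol.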
2022-05-26 16:57:20 -07:00
2022-04-07 15:46:12 +08:00
# ifdef CONFIG_SYSCTL
2022-04-21 11:38:43 -07:00
# ifdef CONFIG_DYNAMIC_FTRACE
2022-04-15 14:29:57 -07:00
static void ftrace_startup_sysctl ( void )
{
int command ;
if ( unlikely ( ftrace_disabled ) )
return ;
/* Force update next time */
saved_ftrace_func = NULL ;
/* ftrace_start_up is true if we want ftrace running */
if ( ftrace_start_up ) {
command = FTRACE_UPDATE_CALLS ;
if ( ftrace_graph_active )
command | = FTRACE_START_FUNC_RET ;
ftrace_startup_enable ( command ) ;
}
}
static void ftrace_shutdown_sysctl ( void )
{
int command ;
if ( unlikely ( ftrace_disabled ) )
return ;
/* ftrace_start_up is true if ftrace is running */
if ( ftrace_start_up ) {
command = FTRACE_DISABLE_CALLS ;
if ( ftrace_graph_active )
command | = FTRACE_STOP_FUNC_RET ;
ftrace_run_update_code ( command ) ;
}
}
2022-04-21 11:38:43 -07:00
# else
# define ftrace_startup_sysctl() do { } while (0)
# define ftrace_shutdown_sysctl() do { } while (0)
# endif /* CONFIG_DYNAMIC_FTRACE */
2022-04-15 14:29:57 -07:00
2019-10-16 13:33:13 +02:00
static bool is_permanent_ops_registered ( void )
{
struct ftrace_ops * op ;
do_for_each_ftrace_op ( op , ftrace_ops_list ) {
if ( op - > flags & FTRACE_OPS_FL_PERMANENT )
return true ;
} while_for_each_ftrace_op ( op ) ;
return false ;
}
2022-02-23 19:11:53 +08:00
static int
2008-05-12 21:20:43 +02:00
ftrace_enable_sysctl ( struct ctl_table * table , int write ,
2020-09-07 11:32:07 +02:00
void * buffer , size_t * lenp , loff_t * ppos )
2008-05-12 21:20:43 +02:00
{
2011-04-21 23:16:46 -04:00
int ret = - ENODEV ;
2008-05-12 21:20:48 +02:00
2009-02-14 01:42:44 -05:00
mutex_lock ( & ftrace_lock ) ;
2008-05-12 21:20:43 +02:00
2011-04-21 23:16:46 -04:00
if ( unlikely ( ftrace_disabled ) )
goto out ;
ret = proc_dointvec ( table , write , buffer , lenp , ppos ) ;
2008-05-12 21:20:43 +02:00
2009-06-26 16:55:51 +08:00
if ( ret | | ! write | | ( last_ftrace_enabled = = ! ! ftrace_enabled ) )
2008-05-12 21:20:43 +02:00
goto out ;
if ( ftrace_enabled ) {
/* we are starting ftrace again */
2017-06-07 16:12:51 +08:00
if ( rcu_dereference_protected ( ftrace_ops_list ,
lockdep_is_held ( & ftrace_lock ) ) ! = & ftrace_list_end )
2013-03-26 17:53:03 +01:00
update_ftrace_function ( ) ;
2008-05-12 21:20:43 +02:00
2015-03-06 19:55:13 -05:00
ftrace_startup_sysctl ( ) ;
2008-05-12 21:20:43 +02:00
} else {
2019-10-16 13:33:13 +02:00
if ( is_permanent_ops_registered ( ) ) {
ftrace_enabled = true ;
ret = - EBUSY ;
goto out ;
}
2008-05-12 21:20:43 +02:00
/* stopping ftrace calls (just send to ftrace_stub) */
ftrace_trace_function = ftrace_stub ;
ftrace_shutdown_sysctl ( ) ;
}
2019-10-16 13:33:13 +02:00
last_ftrace_enabled = ! ! ftrace_enabled ;
2008-05-12 21:20:43 +02:00
out :
2009-02-14 01:42:44 -05:00
mutex_unlock ( & ftrace_lock ) ;
2008-05-12 21:20:42 +02:00
return ret ;
2008-05-12 21:20:42 +02:00
}
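The decision logic of the handler above can be modelled in user space as a small state machine: a write that leaves the normalised value unchanged is a no-op, and a write of 0 is refused with -EBUSY while a permanent ops is registered. This is only a sketch under those assumptions. struct sketch_state and sketch_enable_write() are hypothetical names, locking and proc_dointvec() are omitted, and the transitions counter stands in for the real startup and shutdown calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the sysctl handler consults. */
struct sketch_state {
	int enabled;		/* value written by the user, like ftrace_enabled */
	int last_enabled;	/* like last_ftrace_enabled, always 0 or 1 */
	bool permanent_ops;	/* what is_permanent_ops_registered() would report */
	int transitions;	/* counts start/stop actions that actually ran */
};

static int sketch_enable_write(struct sketch_state *s, int new_value)
{
	s->enabled = new_value;

	/* Nothing to do when the normalised value did not change. */
	if (s->last_enabled == !!s->enabled)
		return 0;

	if (!s->enabled && s->permanent_ops) {
		/* Refuse to disable: restore the flag and report busy. */
		s->enabled = 1;
		return -EBUSY;
	}

	/* Stand-in for ftrace_startup_sysctl() / ftrace_shutdown_sysctl(). */
	s->transitions++;
	s->last_enabled = !!s->enabled;
	return 0;
}
```

Comparing against the normalised `!!` value means that writing 5 when the previous value was 1 is treated as no change, which mirrors the `last_ftrace_enabled == !!ftrace_enabled` test in the handler.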
2022-02-23 19:11:53 +08:00
static struct ctl_table ftrace_sysctls [ ] = {
{
. procname = " ftrace_enabled " ,
. data = & ftrace_enabled ,
. maxlen = sizeof ( int ) ,
. mode = 0644 ,
. proc_handler = ftrace_enable_sysctl ,
} ,
{ }
} ;
static int __init ftrace_sysctl_init ( void )
{
register_sysctl_init ( " kernel " , ftrace_sysctls ) ;
return 0 ;
}
late_initcall ( ftrace_sysctl_init ) ;
# endif