IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Passing bp_type argument to __reserve_bp_slot() and __release_bp_slot()
functions, so we can pass another bp_type than the one defined in
bp->attr.bp_type. This will be handy in following change that fixes
breakpoint slot counts during its modification.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-3-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pass bp_type directly as a find_slot_idx() argument,
so we don't need to have whole event to get the
breakpoint slot type. It will be used in following
changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-2-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the kprobe which was optimized by jump can not change
the execution path, the kprobe for error-injection must not
be optimized. To prohibit it, set a dummy post-handler as
officially stated in Documentation/kprobes.txt.
Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There's two problems when installing cgroup events on CPUs: firstly
list_update_cgroup_event() only tries to set cpuctx->cgrp for the
first event, if that mismatches on @cgrp we'll not try again for later
additions.
Secondly, when we install a cgroup event into an active context, only
issue an event reprogram when the event matches the current cgroup
context. This avoids a pointless event reprogramming.
Signed-off-by: leilei.lin <leilei.lin@alibaba-inc.com>
[ Improved the changelog and comments. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: brendan.d.gregg@gmail.com
Cc: eranian@gmail.com
Cc: linux-kernel@vger.kernel.org
Cc: yang_oliver@hotmail.com
Link: http://lkml.kernel.org/r/20180306093637.28247-1-linxiulei@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The event schedule order (as per perf_event_sched_in()) is:
- cpu pinned
- task pinned
- cpu flexible
- task flexible
But perf_rotate_context() will unschedule cpu-flexible even if it
doesn't need a rotation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similar to how first programming cpu=-1 and then cpu=# is wrong, so is
rotating both. It was especially wrong when we were still programming
the PMU in this same order, because in that scenario we might never
actually end up running cpu=# events at all.
Cure this by using the active_list to pick the rotation event; since
at programming we already select the left-most event.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The last argument is, and always must be, the same.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When an event group contains more events than can be scheduled on the
hardware, iterating the full event group for ctx_sched_out is a waste
of time.
Keep track of the events that got programmed on the hardware, such
that we can iterate this smaller list in order to schedule them out.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all the grouping is done with RB trees, we no longer need
group_entry and can replace the whole thing with sibling_list.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Scheduling in events with cpu=-1 before events with cpu=# changes
semantics and is undesirable in that it would priorize these events.
Given that groups->index is across all groups we actually have an
inter-group ordering, meaning we can merge-sort two groups, which is
just what we need to preserve semantics.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change event groups into RB trees sorted by CPU and then by a 64bit
index, so that multiplexing hrtimer interrupt handler would be able
skipping to the current CPU's list and ignore groups allocated for the
other CPUs.
New API for manipulating event groups in the trees is implemented as well
as adoption on the API in the current implementation.
pinned_group_sched_in() and flexible_group_sched_in() API are
introduced to consolidate code enabling the whole group from pinned
and flexible groups appropriately.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/372f9c8b-0cfe-4240-e44d-83d863d40813@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark reported his arm64 perf fuzzer runs sometimes splat like:
armv8pmu_read_counter+0x1e8/0x2d8
armpmu_event_update+0x8c/0x188
armpmu_read+0xc/0x18
perf_output_read+0x550/0x11e8
perf_event_read_event+0x1d0/0x248
perf_event_exit_task+0x468/0xbb8
do_exit+0x690/0x1310
do_group_exit+0xd0/0x2b0
get_signal+0x2e8/0x17a8
do_signal+0x144/0x4f8
do_notify_resume+0x148/0x1e8
work_pending+0x8/0x14
which asserts that we only call pmu::read() on ACTIVE events.
The above callchain does:
perf_event_exit_task()
perf_event_exit_task_context()
task_ctx_sched_out() // INACTIVE
perf_event_exit_event()
perf_event_set_state(EXIT) // EXIT
sync_child_event()
perf_event_read_event()
perf_output_read()
perf_output_read_group()
leader->pmu->read()
Which results in doing a pmu::read() on an !ACTIVE event.
I _think_ this is 'new' since we added attr.inherit_stat, which added
the perf_event_read_event() to the exit path, without that
perf_event_read_output() would only trigger from samples and for
@event to trigger a sample, it's leader _must_ be ACTIVE too.
Still, adding this check makes it consistent with the @sub case for
the siblings.
Reported-and-Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:
- Fix a Skylake Uncore event format declaration
- Prevent perf pipe mode from crahsing which was caused by a missing
buffer allocation
- Make the perf top popup message which tells the user that it uses
fallback mode on older kernels a debug message.
- Make perf context rescheduling work correcctly
- Robustify the jump error drawing in perf browser mode so it does
not try to create references to NULL initialized offset entries
- Make trigger_on() robust so it does not enable the trigger before
everything is set up correctly to handle it
- Make perf auxtrace respect the --no-itrace option so it does not
try to queue AUX data for decoding.
- Prevent having different number of field separators in CVS output
lines when a counter is not supported.
- Make the perf kallsyms man page usage behave like it does for all
other perf commands.
- Synchronize the kernel headers"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
Pull locking fix from Thomas Gleixner:
"rt_mutex_futex_unlock() grew a new irq-off call site, but the function
assumes that its always called from irq enabled context.
Use (un)lock_irqsafe() to handle the new call site correctly"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
Pull RCU updates from Paul E. McKenney:
- Miscellaneous fixes, perhaps most notably removing obsolete
code whose only purpose in life was to gather information for
the now-removed RCU debugfs facility. Other notable changes
include removing NO_HZ_FULL_ALL in favor of the nohz_full kernel
boot parameter, minor optimizations for expedited grace periods,
some added tracing, creating an RCU-specific workqueue using Tejun's
new WQ_MEM_RECLAIM flag, and several cleanups to code and comments.
- SRCU cleanups and optimizations.
- Torture-test updates, perhaps most notably the adding of ARMv8
support, but also including numerous cleanups and usability fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The replace_filter_string() frees the current string and then copies a given
string. But in the two locations that it was used, the allocation happened
right after the filter was allocated (nothing to replace). There's no need
for this to be a helper function. Embedding the allocation in the two places
where it was called will make changing the code in the future easier.
Also make the variable consistent (always use "filter_string" as the name,
as it was used in one instance as "filter_str")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
replace_system_preds() creates a filter list to free even when it doesn't
really need to have it. Only save filters that require synchronize_sched()
in the filter list to free. This will allow the code to be updated a bit
easier in the future.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The __alloc_filter() function does nothing more that allocate the filter.
There's no reason to have it as a helper function.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The filter code does open code string appending to produce an error message.
Instead it can be simplified by using trace_seq function helpers.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There's no reason to BUG if there's a bug in the filtering code. Simply do a
warning and return.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Synthetic events can be done within the recording of other events. Notify
the ring buffer via ring_buffer_nest_start() and ring_buffer_nest_end() that
this is intended and not to block it due to its recursion protection.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ring-buffer code has recusion protection in case tracing ends up tracing
itself, the ring-buffer will detect that it was called at the same context
(normal, softirq, interrupt or NMI), and not continue to record the event.
With the histogram synthetic events, they are called while tracing another
event at the same context. The recusion protection triggers because it
detects tracing at the same context and stops it.
Add ring_buffer_nest_start() and ring_buffer_nest_end() that will notify the
ring buffer that a trace is about to happen within another trace and that it
is intended, and not to trigger the recursion blocking.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The default clock if timestamps are used in a histogram is "global".
If timestamps aren't used, the clock is irrelevant.
Use the "clock=" param only if you want to override the default
"global" clock for a histogram with timestamps.
Link: http://lkml.kernel.org/r/427bed1389c5d22aa40c3e0683e30cc3d151e260.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Allow tracing code outside of trace.c to access tracing_set_clock().
Some applications may require a particular clock in order to function
properly, such as latency calculations.
Also, add an accessor returning the current clock string.
Link: http://lkml.kernel.org/r/6d1c53e9ee2163f54e1849f5376573f54f0e6009.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With the addition of variables and actions, it's become necessary to
provide more detailed error information to users about syntax errors.
Add a 'last error' facility accessible via the erroring event's 'hist'
file. Reading the hist file after an error will display more detailed
information about what went wrong, if information is available. This
extended error information will be available until the next hist
trigger command for that event.
# echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
ERROR: Couldn't yyy: zzz
Last command: xxx
Also add specific error messages for variable and action errors.
Link: http://lkml.kernel.org/r/64e9c422fc8aeafcc2f7a3b4328c0cffe7969129.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add support for alias=$somevar where alias can be used as
onmatch.xxx($alias).
Aliases are a way of creating a new name for an existing variable, for
flexibly in making naming more clear in certain cases. For example in
the below the user perhaps feels that using $new_lat in the synthetic
event invocation is opaque or doesn't fit well stylistically with
previous triggers, so creates an alias of $new_lat named $latency and
uses that in the call instead:
# echo 'hist:keys=next_pid:new_lat=common_timestamp.usecs' >
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo 'hist:keys=pid:latency=$new_lat:
onmatch(sched.sched_switch).wake2($latency,pid)' >
/sys/kernel/debug/tracing/events/synthetic/wake1/trigger
Link: http://lkml.kernel.org/r/ef20a65d921af3a873a6f1e8c71407c926d5586f.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The existing code only allows for one space before and after the 'if'
specifying the filter for a hist trigger. Add code to make that more
permissive as far as whitespace goes. Specifically, we want to allow
spaces in the trigger itself now that we have additional syntax
(onmatch/onmax) where spaces are more natural e.g. spaces after commas
in param lists.
Link: http://lkml.kernel.org/r/1053090c3c308d4f431accdeb59dff4b511d4554.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add an 'onmax(var).save(field,...)' hist trigger action which is
invoked whenever an event exceeds the current maximum.
The end result is that the trace event fields or variables specified
as the onmax.save() params will be saved if 'var' exceeds the current
maximum for that hist trigger entry. This allows context from the
event that exhibited the new maximum to be saved for later reference.
When the histogram is displayed, additional fields displaying the
saved values will be printed.
As an example the below defines a couple of hist triggers, one for
sched_wakeup and another for sched_switch, keyed on pid. Whenever a
sched_wakeup occurs, the timestamp is saved in the entry corresponding
to the current pid, and when the scheduler switches back to that pid,
the timestamp difference is calculated. If the resulting latency
exceeds the current maximum latency, the specified save() values are
saved:
# echo 'hist:keys=pid:ts0=common_timestamp.usecs \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# echo 'hist:keys=next_pid:\
wakeup_lat=common_timestamp.usecs-$ts0:\
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
When the histogram is displayed, the max value and the saved values
corresponding to the max are displayed following the rest of the
fields:
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
{ next_pid: 3728 } hitcount: 199 \
max: 123 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/3
{ next_pid: 3730 } hitcount: 1321 \
max: 15 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/1
{ next_pid: 3729 } hitcount: 1973\
max: 25 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/0
Totals:
Hits: 3493
Entries: 3
Dropped: 0
Link: http://lkml.kernel.org/r/006907f71b1e839bb059337ec3c496f84fcb71de.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add an 'onmatch(matching.event).<synthetic_event_name>(param list)'
hist trigger action which is invoked with the set of variables or
event fields named in the 'param list'. The result is the generation
of a synthetic event that consists of the values contained in those
variables and/or fields at the time the invoking event was hit.
As an example the below defines a simple synthetic event using a
variable defined on the sched_wakeup_new event, and shows the event
definition with unresolved fields, since the sched_wakeup_new event
with the testpid variable hasn't been defined yet:
# echo 'wakeup_new_test pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events
# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_new_test pid_t pid; int prio
The following hist trigger both defines a testpid variable and
specifies an onmatch() trace action that uses that variable along with
a non-variable field to generate a wakeup_new_test synthetic event
whenever a sched_wakeup_new event occurs, which because of the 'if
comm == "cyclictest"' filter only happens when the executable is
cyclictest:
# echo 'hist:testpid=pid:keys=$testpid:\
onmatch(sched.sched_wakeup_new).wakeup_new_test($testpid, prio) \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
Creating and displaying a histogram based on those events is now just
a matter of using the fields and new synthetic event in the
tracing/events/synthetic directory, as usual:
# echo 'hist:keys=pid,prio:sort=pid,prio' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
Link: http://lkml.kernel.org/r/8c2a574bcb7530c876629c901ecd23911b14afe8.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Users should be able to directly specify event fields in hist trigger
'actions' rather than being forced to explicitly create a variable for
that purpose.
Add support allowing fields to be used directly in actions, which
essentially does just that - creates 'invisible' variables for each
bare field specified in an action. If a bare field refers to a field
on another (matching) event, it even creates a special histogram for
the purpose (since variables can't be defined on an existing histogram
after histogram creation).
Here's a simple example that demonstrates both. Basically the
onmatch() action creates a list of variables corresponding to the
parameters of the synthetic event to be generated, and then uses those
values to generate the event. So for the wakeup_latency synthetic
event 'call' below the first param, $wakeup_lat, is a variable defined
explicitly on sched_switch, where 'next_pid' is just a normal field on
sched_switch, and prio is a normal field on sched_waking.
Since the mechanism works on variables, those two normal fields just
have 'invisible' variables created internally for them. In the case of
'prio', which is on another event, we actually need to create an
additional hist trigger and define the invisible variable on that, since
once a hist trigger is defined, variables can't be added to it later.
echo 'wakeup_latency u64 lat; pid_t pid; int prio' >>
/sys/kernel/debug/tracing/synthetic_events
echo 'hist:keys=pid:ts0=common_timestamp.usecs >>
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger
echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)
>> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
Link: http://lkml.kernel.org/r/8e8dcdac1ea180ed7a3689e1caeeccede9dc42b3.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Synthetic events are user-defined events generated from hist trigger
variables saved from one or more other events.
To define a synthetic event, the user writes a simple specification
consisting of the name of the new event along with one or more
variables and their type(s), to the tracing/synthetic_events file.
For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio:
# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events
Reading the tracing/synthetic_events file lists all the
currently-defined synthetic events, in this case the event we defined
above:
# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_latency u64 lat; pid_t pid; int prio
At this point, the synthetic event is ready to use, and a histogram
can be defined using it:
# echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
The new event is created under the tracing/events/synthetic/ directory
and looks and behaves just like any other event:
# ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
enable filter format hist id trigger
Although a histogram can be defined for it, nothing will happen until
an action tracing that event via the trace_synth() function occurs.
The trace_synth() function is very similar to all the other trace_*
invocations spread throughout the kernel, except in this case the
trace_ function and its corresponding tracepoint isn't statically
generated but defined by the user at run-time.
How this can be automatically hooked up via a hist trigger 'action' is
discussed in a subsequent patch.
Link: http://lkml.kernel.org/r/c68df2284b7d172669daf9be29db62ad49bbc559.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[fix noderef.cocci warnings, sizeof pointer for kcalloc of event->fields]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add a hook for executing extra actions whenever a histogram entry is
added or updated.
The default 'action' when a hist entry is added to a histogram is to
update the set of values associated with it. Some applications may
want to perform additional actions at that point, such as generate
another event, or compare and save a maximum.
Add a simple framework for doing that; specific actions will be
implemented on top of it in later patches.
Link: http://lkml.kernel.org/r/9482ba6a3eaf5ca6e60954314beacd0e25c05b24.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add the necessary infrastructure to allow the variables defined on one
event to be referenced in another. This allows variables set by a
previous event to be referenced and used in expressions combining the
variable values saved by that previous event and the event fields of
the current event. For example, here's how a latency can be
calculated and saved into yet another variable named 'wakeup_lat':
# echo 'hist:keys=pid,prio:ts0=common_timestamp ...
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...
In the first event, the event's timetamp is saved into the variable
ts0. In the next line, ts0 is subtracted from the second event's
timestamp to produce the latency.
Further users of variable references will be described in subsequent
patches, such as for instance how the 'wakeup_lat' variable above can
be displayed in a latency histogram.
Link: http://lkml.kernel.org/r/b1d3e6975374e34d501ff417c20189c3f9b2c7b8.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some accessor functions, such as for variable references, require
access to a corrsponding tracing_map_elt.
Add a tracing_map_elt param to the function signature and update the
accessor functions accordingly.
Link: http://lkml.kernel.org/r/e0f292b068e9e4948da1d5af21b5ae0efa9b5717.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Up until now, hist triggers only needed per-element support for saving
'comm' data, which was saved directly as a private data pointer.
In anticipation of the need to save other data besides 'comm', add a
new hist_elt_data struct for the purpose, and switch the current
'comm'-related code over to that.
Link: http://lkml.kernel.org/r/4502c338c965ddf5fc19fb1ec4764391e001ed4b.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add support for simple addition, subtraction, and unary expressions
(-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in
order to support a minimal set of useful inter-event calculations.
These operations are needed for calculating latencies between events
(timestamp1-timestamp0) and for combined latencies (latencies over 3
or more events).
In the process, factor out some common code from key and value
parsing.
Link: http://lkml.kernel.org/r/9a9308ead4fe32a433d9c7e95921fb798394f6b2.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[kbuild test robot fix, add static to parse_atom()]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Replaced '//' comments with normal /* */ comments ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Named triggers must also have the same set of variables in order to be
considered compatible - update the trigger match test to account for
that.
The reason for this requirement is that named triggers with variables
are meant to allow one or more events to set the same variable.
Link: http://lkml.kernel.org/r/a17eae6328a99917f9d5c66129c9fcd355279ee9.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add support for saving the value of a current event's event field by
assigning it to a variable that can be read by a subsequent event.
The basic syntax for saving a variable is to simply prefix a unique
variable name not corresponding to any keyword along with an '=' sign
to any event field.
Both keys and values can be saved and retrieved in this way:
# echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ...
# echo 'hist:timer_pid=common_pid:key=$timer_pid ...'
If a variable isn't a key variable or prefixed with 'vals=', the
associated event field will be saved in a variable but won't be summed
as a value:
# echo 'hist:keys=next_pid:ts1=common_timestamp:...
Multiple variables can be assigned at the same time:
# echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=common_timestamp,b=field1 ...
Multiple (or single) variables can also be assigned at the same time
using separate assignments:
# echo 'hist:keys=pid:vals=$ts0:ts0=common_timestamp:b=field1:c=field2 ...
Variables set as above can be used by being referenced from another
event, as described in a subsequent patch.
Link: http://lkml.kernel.org/r/fc93c4944d9719dbcb1d0067be627d44e98e2adc.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Appending .usecs onto a common_timestamp field will cause the
timestamp value to be in microseconds instead of the default
nanoseconds. A typical latency histogram using usecs would look like
this:
# echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs ...
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 ...
This also adds an external trace_clock_in_ns() to trace.c for the
timestamp conversion.
Link: http://lkml.kernel.org/r/4e813705a170b3e13e97dc3135047362fb1a39f3.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to allow information to be passed between trace events, add
support for per-element variables to tracing_map. This provides a
means for histograms to associate a value or values with an entry when
it's saved or updated, and retrieved by a subsequent event occurrences.
Variables can be set using tracing_map_set_var() and read using
tracing_map_read_var(). tracing_map_var_set() returns true or false
depending on whether or not the variable has been set or not, which is
important for event-matching applications.
tracing_map_read_var_once() reads the variable and resets it to the
'unset' state, implementing read-once variables, which are also
important for event-matching uses.
Link: http://lkml.kernel.org/r/7fa001108252556f0c6dd9d63145eabfe3370d1a.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add support for a timestamp event field. This is actually a 'pseudo-'
event field in that it behaves like it's part of the event record, but
is really part of the corresponding ring buffer event.
To make use of the timestamp field, users can specify
"common_timestamp" as a field name for any histogram. Note that this
doesn't make much sense on its own either as either a key or value,
but needs to be supported even so, since follow-on patches will add
support for making use of this field in time deltas. The
common_timestamp 'field' is not a bona fide event field - so you won't
find it in the event description - but rather it's a synthetic field
that can be used like a real field.
Note that the use of this field requires the ring buffer be put into
'absolute timestamp' mode, which saves the complete timestamp for each
event rather than an offset. This mode will be enabled if and only if
a histogram makes use of the "common_timestamp" field.
Link: http://lkml.kernel.org/r/97afbd646ed146e26271f3458b4b33e16d7817c2.1516069914.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
[kasan use-after-free fix]
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>