linux/kernel/trace
Steven Rostedt (Google) ffe3986fec ring-buffer: Only update pages_touched when a new page is touched
The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.

It was observed that over time, the amount read from the splice was
constantly decreasing the longer the trace was running. That is, if one
asked for 60%, it would read over 60% when it first starts tracing, but
then it would be woken up at under 60% and would slowly decrease the
amount of data read after being woken up, where the amount becomes much
less than the buffer percent.

This was due to an accounting of the pages_touched incrementation. This
value is incremented whenever a writer transfers to a new sub-buffer. But
the place where it was incremented was incorrect. If a writer overflowed
the current sub-buffer it would go to the next one. If it gets preempted
by an interrupt at that time, and the interrupt performs a trace, it too
will end up going to the next sub-buffer. But only one should increment
the counter. Unfortunately, that was not the case.

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
will only increment the counter once for when the writer moves to a new
sub-buffer, and not when there's a race and is incremented for when a
writer and its preempting writer both move to the same new sub-buffer.

Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-04-11 17:49:57 -04:00
..
rv
blktrace.c
bpf_trace.c bpf: support deferring bpf_link dealloc to after RCU grace period 2024-03-28 18:47:45 -07:00
bpf_trace.h
error_report-traces.c
fgraph.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
fprobe.c fprobe: Fix to allocate entry_data_size buffer with rethook instances 2024-03-01 09:18:24 +09:00
ftrace_internal.h tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
ftrace.c ftrace: Fix most kernel-doc warnings 2024-03-18 10:33:05 -04:00
Kconfig tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry 2024-04-11 17:45:18 -04:00
kprobe_event_gen_test.c
Makefile tracing/probes: Move finding func-proto API and getting func-param API to trace_btf 2023-08-23 09:39:45 +09:00
pid_list.c
pid_list.h
power-traces.c
preemptirq_delay_test.c
rethook.c rethook: Use __rcu pointer for rethook::handler 2023-12-01 14:53:56 +09:00
ring_buffer_benchmark.c ring-buffer: Read and write to ring buffers with custom sub buffer size 2023-12-20 07:54:56 -05:00
ring_buffer.c ring-buffer: Only update pages_touched when a new page is touched 2024-04-11 17:49:57 -04:00
rpm-traces.c
synth_event_gen_test.c tracing / synthetic: Disable events after testing in synth_event_gen_test_init() 2023-12-21 10:04:45 -05:00
trace_benchmark.c tracing: Use div64_u64() instead of do_div() 2024-03-18 10:33:06 -04:00
trace_benchmark.h
trace_boot.c tracing: Allow creating instances with specified system events 2023-12-18 23:14:16 -05:00
trace_branch.c
trace_btf.c tracing/probes: Fix to search structure fields correctly 2024-02-17 21:25:42 +09:00
trace_btf.h tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_clock.c
trace_dynevent.c
trace_dynevent.h
trace_entries.h tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_eprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_event_perf.c
trace_events_filter_test.h
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-01 23:44:44 -04:00
trace_events_hist.c tracing histograms: Simplify parse_actions() function 2024-01-08 13:24:56 -05:00
trace_events_inject.c tracing: Have event inject files inc the trace array ref count 2023-09-07 16:38:54 -04:00
trace_events_synth.c tracing/synthetic: Fix trace_string() return value 2024-02-15 11:40:01 -05:00
trace_events_trigger.c tracing: Decrement the snapshot if the snapshot trigger fails to register 2024-03-18 10:33:05 -04:00
trace_events_user.c tracing/user_events: Introduce multi-format events 2024-03-18 10:13:03 -04:00
trace_events.c tracing: hide unused ftrace_event_id_fops 2024-04-11 17:46:55 -04:00
trace_export.c tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_fprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_functions_graph.c function_graph: Support recording and printing the return value of function 2023-06-20 18:38:37 -04:00
trace_functions.c
trace_hwlat.c tracing: Remove extra space at the end of hwlat_detector/mode 2023-09-01 21:00:00 -04:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_kdb.c
trace_kprobe_selftest.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
trace_kprobe_selftest.h
trace_kprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c tracing/timerlat: Move hrtimer_init to timerlat_fd open() 2024-02-01 11:50:13 -05:00
trace_output.c tracing: Remove precision vsnprintf() check from print event 2024-03-06 13:26:26 -05:00
trace_output.h
trace_preemptirq.c
trace_printk.c
trace_probe_kernel.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe_tmpl.h tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_probe.c tracing: probes: Fix to zero initialize a local variable 2024-03-25 16:24:31 +09:00
trace_probe.h tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace_recursion_record.c
trace_sched_switch.c tracing: Move saved_cmdline code into trace_sched_switch.c 2024-03-17 07:58:53 -04:00
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_selftest_dynamic.c
trace_selftest.c tracing: Support to dump instance traces by ftrace_dump_on_oops 2024-03-18 10:33:06 -04:00
trace_seq.c trace_seq: Increase the buffer size to almost two pages 2023-12-18 23:14:16 -05:00
trace_stack.c
trace_stat.c
trace_stat.h
trace_synth.h
trace_syscalls.c bpf: Change syscall_nr type to int in struct syscall_tp_t 2023-10-13 12:39:36 -07:00
trace_uprobe.c tracing/probes: Support $argN in return probe (kprobe and fprobe) 2024-03-07 00:27:34 +09:00
trace.c tracing: Support to dump instance traces by ftrace_dump_on_oops 2024-03-18 10:33:06 -04:00
trace.h tracing: Add snapshot refcount 2024-03-18 10:12:47 -04:00
tracing_map.c tracing: Ensure visibility when inserting an element into tracing_map 2024-01-22 17:15:40 -05:00
tracing_map.h tracing: Remove unused extern declaration tracing_map_set_field_descr() 2023-07-23 11:08:14 -04:00