Commit Graph

31935 Commits

Author SHA1 Message Date
Steven Rostedt (VMware)
34423f250a tracing: Fix uninitialized buffer var on early exit to trace_vbprintk()
If we exit due to a bad input to trace_printk() (highly unlikely), then the
buffer variable will not be initialized when we unnest the ring buffer.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-22 06:44:50 -05:00
Dan Carpenter
532f49a6f1 tracing/boot: Fix an IS_ERR() vs NULL bug
The trace_array_get_by_name() function doesn't return error pointers,
it returns NULL on error.

Link: http://lkml.kernel.org/r/20200117053007.5h2juv272pokqhtq@kili.mountain

Fixes: 4f712a4d04 ("tracing/boot: Add instance node support")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 18:41:39 -05:00
Alex Shi
141597204e tracing: Remove unused TRACE_SEQ_BUF_USED
This macro isn't used from commit 3a161d99c4 ("tracing: Create
seq_buf layer in trace_seq"). so no needs to keep it.

Link: http://lkml.kernel.org/r/1579586086-45543-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 18:39:54 -05:00
Alex Shi
b83479482f ring-buffer: Remove abandoned macro RB_MISSED_FLAGS
This macro isn't used since commit d325c40296 ("ring-buffer: Remove
unused function ring_buffer_page_len()"), so better to remove it.

Link: http://lkml.kernel.org/r/1579586080-45300-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 18:38:02 -05:00
Alex Shi
aff4866db5 ftrace: Remove NR_TO_INIT macro
This macro isn't used from commit cb7be3b2fc ("ftrace: remove
daemon"). So no needs to keep it.

Link: http://lkml.kernel.org/r/1579586063-44984-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 17:30:39 -05:00
Alex Shi
9a09cd74e7 ftrace: Remove abandoned macros
These 2 macros aren't used from commit eee8ded131 ("ftrace: Have the
function probes call their own function"), so remove them.

Link: http://lkml.kernel.org/r/1579585807-43316-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 17:28:35 -05:00
Steven Rostedt (VMware)
82d1b8158c tracing: Allow trace_printk() to nest in other tracing code
trace_printk() is used to debug the kernel which includes the tracing
infrastructure. But because it writes to the ring buffer, and so does much
of the tracing infrastructure, the ring buffer's recursive detection will
drop writes to the ring buffer that is in the same context as the current
write is happening (it allows interrupts to write when normal context is
writing, but wont let normal context write while normal context is writing).

This can cause confusion and think that the code is where the trace_printk()
exists is not hit. To solve this, up the recursive nesting of the ring
buffer when trace_printk() is called before it writes to the buffer itself.

Note, this does make it dangerous to use trace_printk() in the ring buffer
code itself, because this basically disables the recursion protection of
trace_printk() buffer writes. But as trace_printk() is only used for
debugging, and if this does occur, the developer will see the cause real
quick (recursive blowing up of the stack). Thus the developer can deal with
that. But having trace_printk() silently ignored is a much bigger problem,
and disabling recursive protection is a small price to pay to fix it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-16 08:20:18 -05:00
Steven Rostedt (VMware)
cfc585a401 ring-buffer: Fix kernel doc for rb_update_event()
rb_update_event has changed without the kernel-doc update.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14 16:27:51 -05:00
Fabian Frederick
59e7cffe5c ring-bufer: kernel-doc warning fixes
Also fixes a couple of typos

Link: http://lkml.kernel.org/r/1401992525-10417-1-git-send-email-fabf@skynet.be

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
[ Found this deep in the abyss of my INBOX ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14 16:23:34 -05:00
Masami Hiramatsu
3b42a4c83a tracing: trigger: Replace unneeded RCU-list traversals
With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
when I ran ftracetest trigger testcases.

-----
  # dmesg -c > /dev/null
  # ./ftracetest test.d/trigger
  ...
  # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
  kernel/trace/trace_events_hist.c:6070
  kernel/trace/trace_events_hist.c:1760
  kernel/trace/trace_events_hist.c:5911
  kernel/trace/trace_events_trigger.c:504
  kernel/trace/trace_events_hist.c:1810
  kernel/trace/trace_events_hist.c:3158
  kernel/trace/trace_events_hist.c:3105
  kernel/trace/trace_events_hist.c:5518
  kernel/trace/trace_events_hist.c:5998
  kernel/trace/trace_events_hist.c:6019
  kernel/trace/trace_events_hist.c:6044
  kernel/trace/trace_events_trigger.c:1500
  kernel/trace/trace_events_trigger.c:1540
  kernel/trace/trace_events_trigger.c:539
  kernel/trace/trace_events_trigger.c:584
-----

I investigated those warnings and found that the RCU-list
traversals in event trigger and hist didn't need to use
RCU version because those were called only under event_mutex.

I also checked other RCU-list traversals related to event
trigger list, and found that most of them were called from
event_hist_trigger_func() or hist_unregister_trigger() or
register/unregister functions except for a few cases.

Replace these unneeded RCU-list traversals with normal list
traversal macro and lockdep_assert_held() to check the
event_mutex is held.

Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 15:59:11 -05:00
Masami Hiramatsu
fe1efe9252 tracing/boot: Add function tracer filter options
Add below function-tracer filter options to boot-time tracing.

 - ftrace.[instance.INSTANCE.]ftrace.filters
   This will take an array of tracing function filter rules

 - ftrace.[instance.INSTANCE.]ftrace.notraces
   This will take an array of NON-tracing function filter rules

Link: http://lkml.kernel.org/r/157867244841.17873.10933616628243103561.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:42 -05:00
Masami Hiramatsu
9d15dbbde1 tracing/boot: Add cpu_mask option support
Add ftrace.cpumask option support to boot-time tracing.
This sets cpumask for each instance.

 - ftrace.[instance.INSTANCE.]cpumask = CPUMASK;
   Set the trace cpumask. Note that the CPUMASK should be a string
   which <tracefs>/tracing_cpumask can accepts.

Link: http://lkml.kernel.org/r/157867243625.17873.13613922641273149372.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:42 -05:00
Masami Hiramatsu
4f712a4d04 tracing/boot: Add instance node support
Add instance node support to boot-time tracing. User can set
some options and event nodes under instance node.

 - ftrace.instance.INSTANCE[...]
   Add new INSTANCE instance. Some options and event nodes
   are acceptable for instance node.

Link: http://lkml.kernel.org/r/157867242413.17873.9814204526141500278.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:42 -05:00
Masami Hiramatsu
3fbe2d6e1f tracing/boot: Add synthetic event support
Add synthetic event node support to boot time tracing.
The synthetic event is a kind of event node, but the group
name is "synthetic".

 - ftrace.event.synthetic.EVENT.fields = FIELD[, FIELD2...]
   Defines new synthetic event with FIELDs. Each field should be
   "type varname".

The synthetic node requires "fields" string arraies, which defines
the fields as same as tracing/synth_events interface.

Link: http://lkml.kernel.org/r/157867241236.17873.12411615143321557709.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:42 -05:00
Masami Hiramatsu
4d655281eb tracing/boot Add kprobe event support
Add kprobe event support on event node to boot-time tracing.
If the group name of event is "kprobes", the boot-time tracing
defines new probe event according to "probes" values.

 - ftrace.event.kprobes.EVENT.probes = PROBE[, PROBE2...]
   Defines new kprobe event based on PROBEs. It is able to define
   multiple probes on one event, but those must have same type of
   arguments.

For example,

 ftrace.events.kprobes.myevent {
	probes = "vfs_read $arg1 $arg2";
	enable;
 }

This will add kprobes:myevent on vfs_read with the 1st and the 2nd
arguments.

Link: http://lkml.kernel.org/r/157867240104.17873.9712052065426433111.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:42 -05:00
Masami Hiramatsu
81a59555ff tracing/boot: Add per-event settings
Add per-event settings for boottime tracing. User can set filter,
actions and enable on each event on boot. The event entries are
under ftrace.event.GROUP.EVENT node (note that the option key
includes event's group name and event name.) This supports below
configs.

 - ftrace.event.GROUP.EVENT.enable
   Enables GROUP:EVENT tracing.

 - ftrace.event.GROUP.EVENT.filter = FILTER
   Set FILTER rule to the GROUP:EVENT.

 - ftrace.event.GROUP.EVENT.actions = ACTION[, ACTION2...]
   Set ACTIONs to the GROUP:EVENT.

For example,

  ftrace.event.sched.sched_process_exec {
                filter = "pid < 128"
		enable
  }

this will enable tracing "sched:sched_process_exec" event
with "pid < 128" filter.

Link: http://lkml.kernel.org/r/157867238942.17873.11177628789184546198.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:41 -05:00
Masami Hiramatsu
9c5b9d3d65 tracing/boot: Add boot-time tracing
Setup tracing options via extra boot config in addition to kernel
command line.

This adds following commands support. These are applied to
the global trace instance.

 - ftrace.options = OPT1[,OPT2...]
   Enable given ftrace options.

 - ftrace.trace_clock = CLOCK
   Set given CLOCK to ftrace's trace_clock.

 - ftrace.buffer_size = SIZE
   Configure ftrace buffer size to SIZE. You can use "KB" or "MB"
   for that SIZE.

 - ftrace.events = EVENT[, EVENT2...]
   Enable given events on boot. You can use a wild card in EVENT.

 - ftrace.tracer = TRACER
   Set TRACER to current tracer on boot. (e.g. function)

Note that this is NOT replacing the kernel parameters, because
this boot config based setting is later than that. If you want to
trace earlier boot events, you still need kernel parameters.

Link: http://lkml.kernel.org/r/157867237723.17873.17494943526320587488.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:41 -05:00
Masami Hiramatsu
48ac9488a5 tracing: Add NULL trace-array check in print_synth_event()
Add NULL trace-array check in print_synth_event(), because
if we enable tp_printk option, iter->tr can be NULL.

Link: http://lkml.kernel.org/r/157867236536.17873.12529350542460184019.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:41 -05:00
Masami Hiramatsu
b05e89ae7c tracing: Accept different type for synthetic event fields
Make the synthetic event accepts a different type field to record.
However, the size and signed flag must be same.

Link: http://lkml.kernel.org/r/157867235358.17873.61732996461602171.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:41 -05:00
Masami Hiramatsu
d8d4c6d0e7 tracing: kprobes: Register to dynevent earlier stage
Register kprobe event to dynevent in subsys_initcall level.
This will allow kernel to register new kprobe events in
fs_initcall level via trace_run_command.

Link: http://lkml.kernel.org/r/157867234213.17873.18039000024374948737.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:41 -05:00
Masami Hiramatsu
8cfcf15503 tracing: kprobes: Output kprobe event to printk buffer
Since kprobe-events use event_trigger_unlock_commit_regs() directly,
that events doesn't show up in printk buffer if "tp_printk" is set.

Use trace_event_buffer_commit() in kprobe events so that it can
invoke output_printk() as same as other trace events.

Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Adjusted data var declaration placement in __kretprobe_trace_func() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:40 -05:00
Masami Hiramatsu
d8d0c245a7 tracing: Apply soft-disabled and filter to tracepoints printk
Apply soft-disabled and the filter rule of the trace events to
the printk output of tracepoints (a.k.a. tp_printk kernel parameter)
as same as trace buffer output.

Link: http://lkml.kernel.org/r/157867231876.17873.15825819592284704068.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:40 -05:00
Steven Rostedt (VMware)
1329249437 tracing: Make struct ring_buffer less ambiguous
As there's two struct ring_buffers in the kernel, it causes some confusion.
The other one being the perf ring buffer. It was agreed upon that as neither
of the ring buffers are generic enough to be used globally, they should be
renamed as:

   perf's ring_buffer -> perf_buffer
   ftrace's ring_buffer -> trace_buffer

This implements the changes to the ring buffer that ftrace uses.

Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Steven Rostedt (VMware)
1c5eb4481e tracing: Rename trace_buffer to array_buffer
As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Steven Rostedt (VMware)
56de4e8f91 perf: Make struct ring_buffer less ambiguous
eBPF requires needing to know the size of the perf ring buffer structure.
But it unfortunately has the same name as the generic ring buffer used by
tracing and oprofile. To make it less ambiguous, rename the perf ring buffer
structure to "perf_buffer".

As other parts of the ring buffer code has "perf_" as the prefix, it only
makes sense to give the ring buffer the "perf_" prefix as well.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
Acked-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Linus Torvalds
606e9ad200 clone3-tls-v5.5-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXhhtDQAKCRCRxhvAZXjc
 orQ3AQD7H2ovZbPIpWbwOnRIExBF4O8gPDfFc/J/RweZx40v/AD/QwfFnq0TpmUc
 UfS4zzLxJ4K+L4RYWId5v8MFHGIu8QQ=
 =LmmJ
 -----END PGP SIGNATURE-----

Merge tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "This contains a series of patches to fix CLONE_SETTLS when used with
  clone3().

  The clone3() syscall passes the tls argument through struct clone_args
  instead of a register. This means, all architectures that do not
  implement copy_thread_tls() but still support CLONE_SETTLS via
  copy_thread() expecting the tls to be located in a register argument
  based on clone() are currently unfortunately broken. Their tls value
  will be garbage.

  The patch series fixes this on all architectures that currently define
  __ARCH_WANT_SYS_CLONE3. It also adds a compile-time check to ensure
  that any architecture that enables clone3() in the future is forced to
  also implement copy_thread_tls().

  My ultimate goal is to get rid of the copy_thread()/copy_thread_tls()
  split and just have copy_thread_tls() at some point in the not too
  distant future (Maybe even renaming copy_thread_tls() back to simply
  copy_thread() once the old function is ripped from all arches). This
  is dependent now on all arches supporting clone3().

  While all relevant arches do that now there are still four missing:
  ia64, m68k, sh and sparc. They have the system call reserved, but not
  implemented. Once they all implement clone3() we can get rid of
  ARCH_WANT_SYS_CLONE3 and HAVE_COPY_THREAD_TLS.

  This series also includes a minor fix for the arm64 uapi headers which
  caused __NR_clone3 to be missing from the exported user headers.

  Unfortunately the series came in a little late especially given that
  it touches a range of architectures. Due to the holidays not all arch
  maintainers responded in time probably due to their backlog. Will and
  Arnd have thankfully acked the arm specific changes.

  Given that the changes are straightforward and rather minimal combined
  with the fact the that clone3() with CLONE_SETTLS is broken I decided
  to send them post rc3 nonetheless"

* tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  um: Implement copy_thread_tls
  clone3: ensure copy_thread_tls is implemented
  xtensa: Implement copy_thread_tls
  riscv: Implement copy_thread_tls
  parisc: Implement copy_thread_tls
  arm: Implement copy_thread_tls
  arm64: Implement copy_thread_tls
  arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers
2020-01-11 15:33:48 -08:00
Linus Torvalds
a5f48c7878 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Missing netns pointer init in arp_tables, from Florian Westphal.

 2) Fix normal tcp SACK being treated as D-SACK, from Pengcheng Yang.

 3) Fix divide by zero in sch_cake, from Wen Yang.

 4) Len passed to skb_put_padto() is wrong in qrtr code, from Carl
    Huang.

 5) cmd->obj.chunk is leaked in sctp code error paths, from Xin Long.

 6) cgroup bpf programs can be released out of order, fix from Roman
    Gushchin.

 7) Make sure stmmac debugfs entry name is changed when device name
    changes, from Jiping Ma.

 8) Fix memory leak in vlan_dev_set_egress_priority(), from Eric
    Dumazet.

 9) SKB leak in lan78xx usb driver, also from Eric Dumazet.

10) Ridiculous TCA_FQ_QUANTUM values configured can cause loops in fq
    packet scheduler, reject them. From Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  tipc: fix wrong connect() return code
  tipc: fix link overflow issue at socket shutdown
  netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present
  netfilter: conntrack: dccp, sctp: handle null timeout argument
  atm: eni: fix uninitialized variable warning
  macvlan: do not assume mac_header is set in macvlan_broadcast()
  net: sch_prio: When ungrafting, replace with FIFO
  mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO
  MAINTAINERS: Remove myself as co-maintainer for qcom-ethqos
  gtp: fix bad unlock balance in gtp_encap_enable_socket
  pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
  tipc: remove meaningless assignment in Makefile
  tipc: do not add socket.o to tipc-y twice
  net: stmmac: dwmac-sun8i: Allow all RGMII modes
  net: stmmac: dwmac-sunxi: Allow all RGMII modes
  net: usb: lan78xx: fix possible skb leak
  net: stmmac: Fixed link does not need MDIO Bus
  vlan: vlan_changelink() should propagate errors
  vlan: fix memory leak in vlan_dev_set_egress_priority
  stmmac: debugfs entry name is not be changed when udev rename device name.
  ...
2020-01-09 10:34:07 -08:00
Amanieu d'Antras
dd499f7a7e
clone3: ensure copy_thread_tls is implemented
copy_thread implementations handle CLONE_SETTLS by reading the TLS
value from the registers containing the syscall arguments for
clone. This doesn't work with clone3 since the TLS value is passed
in clone_args instead.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: <stable@vger.kernel.org> # 5.3.x
Link: https://lore.kernel.org/r/20200102172413.654385-8-amanieu@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-07 13:31:27 +01:00
Linus Torvalds
ae6088216c Various tracing fixes:
- kbuild found missing define of MCOUNT_INSN_SIZE for various build configs
  - Initialize variable to zero as gcc thinks it is used undefined
     (it really isn't but the code is subtle enough that this doesn't hurt)
  - Convert from do_div() to div64_ull() to prevent potential divide by zero
  - Unregister a trace point on error path in sched_wakeup tracer
  - Use signed offset for archs that can have stext not be first
  - A simple indentation fix (whitespace error)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXhOj6xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qukzAQCMNfkAbMFA+C1uORMhr/jWhi4eshWN
 4jZ2u5X8zGuuXQD+PaQU4n8d0K4uCPF+lFD16DfFxXvCOXHfN3/zXmxGvw8=
 =djaW
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - kbuild found missing define of MCOUNT_INSN_SIZE for various build
     configs

   - Initialize variable to zero as gcc thinks it is used undefined (it
     really isn't but the code is subtle enough that this doesn't hurt)

   - Convert from do_div() to div64_ull() to prevent potential divide by
     zero

   - Unregister a trace point on error path in sched_wakeup tracer

   - Use signed offset for archs that can have stext not be first

   - A simple indentation fix (whitespace error)"

* tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix indentation issue
  kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
  tracing: Change offset type to s32 in preempt/irq tracepoints
  ftrace: Avoid potential division by zero in function profiler
  tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
  tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
  tracing: Initialize val to zero in parse_entry of inject code
2020-01-06 15:38:38 -08:00
Daniel Borkmann
6d4f151acf bpf: Fix passing modified ctx to ld/abs/ind instruction
Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b5 ("bpf/verifier: rework value tracking").

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
2020-01-06 14:19:47 -08:00
Roman Gushchin
e10360f815 bpf: cgroup: prevent out-of-order release of cgroup bpf
Before commit 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.

But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to an use-after-free
bug.

To reproduce the issue the following script can be used:

  #!/bin/bash

  CGROOT=/sys/fs/cgroup

  mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
  sleep 1

  ./test_cgrp2_attach ${CGROOT}/A egress &
  A_PID=$!
  ./test_cgrp2_attach ${CGROOT}/B egress &
  B_PID=$!

  echo $$ > ${CGROOT}/A/C/cgroup.procs
  iperf -s &
  S_PID=$!
  iperf -c localhost -t 100 &
  C_PID=$!

  sleep 1

  echo $$ > ${CGROOT}/B/cgroup.procs
  echo ${S_PID} > ${CGROOT}/B/cgroup.procs
  echo ${C_PID} > ${CGROOT}/B/cgroup.procs

  sleep 1

  rmdir ${CGROOT}/A/C
  rmdir ${CGROOT}/A

  sleep 1

  kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}

On the unpatched kernel the following stacktrace can be obtained:

[   33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[   33.620677] #PF: supervisor read access in kernel mode
[   33.621293] #PF: error_code(0x0000) - not-present page
[   33.622754] Oops: 0000 [#1] SMP NOPTI
[   33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[   33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[   33.635809] Call Trace:
[   33.636118]  ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[   33.636728]  ? __switch_to_asm+0x40/0x70
[   33.637196]  ip_finish_output+0x68/0xa0
[   33.637654]  ip_output+0x76/0xf0
[   33.638046]  ? __ip_finish_output+0x1c0/0x1c0
[   33.638576]  __ip_queue_xmit+0x157/0x410
[   33.639049]  __tcp_transmit_skb+0x535/0xaf0
[   33.639557]  tcp_write_xmit+0x378/0x1190
[   33.640049]  ? _copy_from_iter_full+0x8d/0x260
[   33.640592]  tcp_sendmsg_locked+0x2a2/0xdc0
[   33.641098]  ? sock_has_perm+0x10/0xa0
[   33.641574]  tcp_sendmsg+0x28/0x40
[   33.641985]  sock_sendmsg+0x57/0x60
[   33.642411]  sock_write_iter+0x97/0x100
[   33.642876]  new_sync_write+0x1b6/0x1d0
[   33.643339]  vfs_write+0xb6/0x1a0
[   33.643752]  ksys_write+0xa7/0xe0
[   33.644156]  do_syscall_64+0x5b/0x1b0
[   33.644605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.

This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.

Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.

Fixes: 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-06 14:00:30 -08:00
Shakeel Butt
84029fd04c memcg: account security cred as well to kmemcg
The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage on our production and on further
inspection, we found a buggy application leaking processes.  Though that
buggy application was contained within its memcg but we observe much
more system memory overhead, couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/20191205223721.40034-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Colin Ian King
72879ee0c5 tracing: Fix indentation issue
There is a declaration that is indented one level too deeply, remove
the extraneous tab.

Link: http://lkml.kernel.org/r/20191221154825.33073-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-03 15:20:46 -05:00
Linus Torvalds
d9c82fd8c8 for-linus-2020-01-03
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXg9C5wAKCRCRxhvAZXjc
 oiZXAPsGFXyDCWlKnShBpKufdFh6XugADlyZK0Si2ISWQoJJsgD/Ri1g3zg6V7YC
 HBG0sz8+vSk/Ys55yDQz+K1d1MTkdQ4=
 =8uQe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "Here are two fixes:

   - Panic earlier when global init exits to generate useable coredumps.

     Currently, when global init and all threads in its thread-group
     have exited we panic via:

       do_exit()
       -> exit_notify()
          -> forget_original_parent()
             -> find_child_reaper()

     This makes it hard to extract a useable coredump for global init
     from a kernel crashdump because by the time we panic exit_mm() will
     have already released global init's mm. We now panic slightly
     earlier. This has been a problem in certain environments such as
     Android.

   - Fix a race in assigning and reading taskstats for thread-groups
     with more than one thread.

     This patch has been waiting for quite a while since people
     disagreed on what the correct fix was at first"

* tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  exit: panic before exit_mm() on global init exit
  taskstats: fix data-race
2020-01-03 11:17:14 -08:00
Kaitao Cheng
50f9ad607e kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
In the function, if register_trace_sched_migrate_task() returns error,
sched_switch/sched_wakeup_new/sched_wakeup won't unregister. That is
why fail_deprobe_sched_switch was added.

Link: http://lkml.kernel.org/r/20191231133530.2794-1-pilgrimtao@gmail.com

Cc: stable@vger.kernel.org
Fixes: 478142c39c ("tracing: do not grab lock in wakeup latency function tracing")
Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-03 11:43:03 -05:00
Wen Yang
e31f7939c1 ftrace: Avoid potential division by zero in function profiler
The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

Cc: stable@vger.kernel.org
Fixes: e330b3bcd8 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc5 ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 22:14:57 -05:00
Steven Rostedt (VMware)
b8299d362d tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
On some archs with some configurations, MCOUNT_INSN_SIZE is not defined, and
this makes the stack tracer fail to compile. Just define it to zero in this
case.

Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

Cc: stable@vger.kernel.org
Fixes: 4df297129f ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 22:04:07 -05:00
Steven Rostedt (VMware)
d2ccbccb54 tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
In order to handle direct calls along side of function graph tracer, a check
is made to see if the address being traced by the function graph tracer is a
direct call or not. To get the address used by direct callers, the return
address is subtracted by MCOUNT_INSN_SIZE.

For some archs with certain configurations, MCOUNT_INSN_SIZE is undefined
here. But these should not be using direct calls anyway. Just define
MCOUNT_INSN_SIZE to zero in this case.

Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ff205766db ("ftrace: Fix function_graph tracer interaction with BPF trampoline")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 21:56:44 -05:00
Linus Torvalds
bf6dd9a58e Fixes for seccomp_notify_ioctl uapi sanity
- Fix samples and selftests to zero passed-in buffer (Sargun Dhillon)
 - Enforce zeroed buffer checking (Sargun Dhillon)
 - Verify buffer sanity check in selftest (Sargun Dhillon)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4OX5wWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJtJZD/4iLG7mOUQNXdcPidjcIMO/tjST
 UzW+9Cb3buePgmCHO9v1TKGL29fVwP5TkuxdrBYDGrJ4rEYANSDX0aNmpHsO8/8M
 2/B/Lo/f9cxFgoKI4QLY2XZ1YR+zkH980mtIG7ZcpYjsNl5AwmT27m2lo6iE7J+x
 7rsaTRPFmUfgbblB6Z5gNwwATudrWJgq066lY2fg3GADP81s6lGQB+ul8rtu84ME
 mTvtb3w6piJb3E+DeYY8p4ykyiewDuYqZWDY+dvWi3kRDjNWX+yFJaPW0YNhM+yh
 HaMXnbuh6gDyCbeUHorC9ypQhJJKzEWCUW8e60BND+fOFCdKMa1AdCtlXWHjrXDQ
 x9hUgQ3UhEedYtQeYtYuoltf0W8Ft4wAapxKJJRegYPQ0RPOgcfdAg4UquusCaLo
 fWK2Hy4XFrxOwISqsFUczUVkBcXl+w0GGH59pSyTImgoQPlTpbVP6f7Axbl+qpKo
 pqOe4bO8curLGlZpdBN6syR5Ik0bizQK0kDZeo+wPmEClp/1zJWMJ4MTP4T80rxY
 74DiQyfNH2iHfsOkdfHCsJC3jM8nmdKk5wMqtrAiIoT8/vdTBgumHrnmkORWFf8c
 R/NHCCLVs9q9sKV0s+VUR3OM2RjqpG1Wo/EBjTlbDQnibC5qdha8X2uVJWIHiF61
 ZgwZ9BoKV/+mKSqTAQ==
 =WgBI
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
 "Fixes for seccomp_notify_ioctl uapi sanity from Sargun Dhillon.

  The bulk of this is fixing the surrounding samples and selftests so
  that seccomp can correctly validate the seccomp_notify_ioctl buffer as
  being initially zeroed.

  Summary:

   - Fix samples and selftests to zero passed-in buffer

   - Enforce zeroed buffer checking

   - Verify buffer sanity check in selftest"

* tag 'seccomp-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: Catch garbage on SECCOMP_IOCTL_NOTIF_RECV
  seccomp: Check that seccomp_notif is zeroed out by the user
  selftests/seccomp: Zero out seccomp_notif
  samples/seccomp: Zero out members based on seccomp_notif_sizes
2020-01-02 16:42:10 -08:00
Steven Rostedt (VMware)
02f4e01ce7 tracing: Initialize val to zero in parse_entry of inject code
gcc produces a variable may be uninitialized warning for "val" in
parse_entry(). This is really a false positive, but the code is subtle
enough to just initialize val to zero and it's not a fast path to worry
about it.

Marked for stable to remove the warning in the stable trees as well.

Cc: stable@vger.kernel.org
Fixes: 6c3edaf9fd ("tracing: Introduce trace event injection")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 19:04:57 -05:00
Sargun Dhillon
2882d53c9c seccomp: Check that seccomp_notif is zeroed out by the user
This patch is a small change in enforcement of the uapi for
SECCOMP_IOCTL_NOTIF_RECV ioctl. Specifically, the datastructure which
is passed (seccomp_notif) must be zeroed out. Previously any of its
members could be set to nonsense values, and we would ignore it.

This ensures all fields are set to their zero value.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Acked-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20191229062451.9467-2-sargun@sargun.me
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-02 13:03:45 -08:00
Vladis Dronov
a33121e548 ptp: fix the race between the release of ptp_clock and cdev
In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[   48.010809] general protection fault: 0000 [#1] SMP
[   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[   48.019470] ...                                              ^^^ a slub poison
[   48.023854] Call Trace:
[   48.024050]  __fput+0x21f/0x240
[   48.024288]  task_work_run+0x79/0x90
[   48.024555]  do_exit+0x2af/0xab0
[   48.024799]  ? vfs_write+0x16a/0x190
[   48.025082]  do_group_exit+0x35/0x90
[   48.025387]  __x64_sys_exit_group+0xf/0x10
[   48.025737]  do_syscall_64+0x3d/0x130
[   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   48.026479] RIP: 0033:0x7f53b12082f6
[   48.026792] ...
[   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[   48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{   ...
    if (file->f_op->release)
        file->f_op->release(inode, file); <<< cdev is kfree'd here
    if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
             !(mode & FMODE_PATH))) {
        cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
  posix_clock_release()
    kref_put(&clk->kref, delete_clock) <<< the last reference
      delete_clock()
        delete_ptp_clock()
          kfree(ptp) <<< cdev is embedded in ptp
  cdev_put
    module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa24 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7f ("chardev: add
helper function to register char devs with a struct device").

Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30 20:19:27 -08:00
Daniel Borkmann
f54c7898ed bpf: Fix precision tracking for unbounded scalars
Anatoly has been fuzzing with kBdysch harness and reported a hang in one
of the outcomes. Upon closer analysis, it turns out that precise scalar
value tracking is missing a few precision markings for unknown scalars:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 0
  1: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (35) if r0 >= 0xf72e goto pc+0
  --> only follow fallthrough
  2: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  2: (35) if r0 >= 0x80fe0000 goto pc+0
  --> only follow fallthrough
  3: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  3: (14) w0 -= -536870912
  4: R0_w=invP536870912 R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (0f) r1 += r0
  5: R0_w=invP536870912 R1_w=inv(id=0) R10=fp0
  5: (55) if r1 != 0x104c1500 goto pc+0
  --> push other branch for later analysis
  R0_w=invP536870912 R1_w=inv273421568 R10=fp0
  6: R0_w=invP536870912 R1_w=inv273421568 R10=fp0
  6: (b7) r0 = 0
  7: R0=invP0 R1=inv273421568 R10=fp0
  7: (76) if w1 s>= 0xffffff00 goto pc+3
  --> only follow goto
  11: R0=invP0 R1=inv273421568 R10=fp0
  11: (95) exit
  6: R0_w=invP536870912 R1_w=inv(id=0) R10=fp0
  6: (b7) r0 = 0
  propagating r0
  7: safe
  processed 11 insns [...]

In the analysis of the second path coming after the successful exit above,
the path is being pruned at line 7. Pruning analysis found that both r0 are
precise P0 and both R1 are non-precise scalars and given prior path with
R1 as non-precise scalar succeeded, this one is therefore safe as well.

However, problem is that given condition at insn 7 in the first run, we only
followed goto and didn't push the other branch for later analysis, we've
never walked the few insns in there and therefore dead-code sanitation
rewrites it as goto pc-1, causing the hang depending on the skb address
hitting these conditions. The issue is that R1 should have been marked as
precise as well such that pruning enforces range check and conluded that new
R1 is not in range of old R1. In insn 4, we mark R1 (skb) as unknown scalar
via __mark_reg_unbounded() but not mark_reg_unbounded() and therefore
regs->precise remains as false.

Back in b5dc0163d8 ("bpf: precise scalar_value tracking"), this was not
the case since marking out of __mark_reg_unbounded() had this covered as well.
Once in both are set as precise in 4 as they should have been, we conclude
that given R1 was in prior fall-through path 0x104c1500 and now is completely
unknown, the check at insn 7 concludes that we need to continue walking.
Analysis after the fix:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 0
  1: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (35) if r0 >= 0xf72e goto pc+0
  2: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  2: (35) if r0 >= 0x80fe0000 goto pc+0
  3: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  3: (14) w0 -= -536870912
  4: R0_w=invP536870912 R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (0f) r1 += r0
  5: R0_w=invP536870912 R1_w=invP(id=0) R10=fp0
  5: (55) if r1 != 0x104c1500 goto pc+0
  R0_w=invP536870912 R1_w=invP273421568 R10=fp0
  6: R0_w=invP536870912 R1_w=invP273421568 R10=fp0
  6: (b7) r0 = 0
  7: R0=invP0 R1=invP273421568 R10=fp0
  7: (76) if w1 s>= 0xffffff00 goto pc+3
  11: R0=invP0 R1=invP273421568 R10=fp0
  11: (95) exit
  6: R0_w=invP536870912 R1_w=invP(id=0) R10=fp0
  6: (b7) r0 = 0
  7: R0_w=invP0 R1_w=invP(id=0) R10=fp0
  7: (76) if w1 s>= 0xffffff00 goto pc+3
  R0_w=invP0 R1_w=invP(id=0) R10=fp0
  8: R0_w=invP0 R1_w=invP(id=0) R10=fp0
  8: (a5) if r0 < 0x2007002a goto pc+0
  9: R0_w=invP0 R1_w=invP(id=0) R10=fp0
  9: (57) r0 &= -16316416
  10: R0_w=invP0 R1_w=invP(id=0) R10=fp0
  10: (a6) if w0 < 0x1201 goto pc+0
  11: R0_w=invP0 R1_w=invP(id=0) R10=fp0
  11: (95) exit
  11: R0=invP0 R1=invP(id=0) R10=fp0
  11: (95) exit
  processed 16 insns [...]

Fixes: 6754172c20 ("bpf: fix precision tracking in presence of bpf2bpf calls")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191222223740.25297-1-daniel@iogearbox.net
2019-12-22 17:21:10 -08:00
Linus Torvalds
78bac77b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

 7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

 8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
    RCU grace period. From Eric Dumazet.

 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

16) Reject invalid MTU values in stmmac, from Jose Abreu.

17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

20) Disable hardware GRO when XDP is attached to qede, frm Manish
    Chopra.

21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
  sfc: Include XDP packet headroom in buffer step size.
  sfc: fix channel allocation with brute force
  net: dst: Force 4-byte alignment of dst_metrics
  selftests: pmtu: fix init mtu value in description
  hv_netvsc: Fix unwanted rx_table reset
  net: phy: ensure that phy IDs are correctly typed
  mod_devicetable: fix PHY module format
  qede: Disable hardware gro when xdp prog is installed
  net: ena: fix issues in setting interrupt moderation params in ethtool
  net: ena: fix default tx interrupt moderation interval
  net/smc: unregister ib devices in reboot_event
  net: stmmac: platform: Fix MDIO init for platforms without PHY
  llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
  net: hisilicon: Fix a BUG trigered by wrong bytes_compl
  net: dsa: ksz: use common define for tag len
  s390/qeth: don't return -ENOTSUPP to userspace
  s390/qeth: fix promiscuous mode after reset
  s390/qeth: handle error due to unsupported transport mode
  cxgb4: fix refcount init for TC-MQPRIO offload
  tc-testing: initial tdc selftests for cls_u32
  ...
2019-12-22 09:54:33 -08:00
Linus Torvalds
b8e382a185 Various tracing fixes:
- Fix memory leak on error path of process_system_preds()
  - Lock inversion fix with updating tgid recording option
  - Fix histogram compare function on big endian machines
  - Fix histogram trigger function on big endian machines
  - Make trace_printk() irq sync on init for kprobe selftest correctness
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXf6MRxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlw6AQCny2YeASymmOjDqh9/G53UdhO539Y2
 oL/2nQ8B9T9KWgD6AmmohhbX+TS9l5Nwy2/bKmRgADZ7u+2XLM2f2mYR2Ag=
 =D7hI
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix memory leak on error path of process_system_preds()

 - Lock inversion fix with updating tgid recording option

 - Fix histogram compare function on big endian machines

 - Fix histogram trigger function on big endian machines

 - Make trace_printk() irq sync on init for kprobe selftest correctness

* tag 'trace-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix endianness bug in histogram trigger
  samples/trace_printk: Wait for IRQ work to finish
  tracing: Fix lock inversion in trace_event_enable_tgid_record()
  tracing: Have the histogram compare functions convert to u64 first
  tracing: Avoid memory leak in process_system_preds()
2019-12-21 15:16:56 -08:00
Sven Schnelle
fe6e096a5b tracing: Fix endianness bug in histogram trigger
At least on PA-RISC and s390 synthetic histogram triggers are failing
selftests because trace_event_raw_event_synth() always writes a 64 bit
values, but the reader expects a field->size sized value. On little endian
machines this doesn't hurt, but on big endian this makes the reader always
read zero values.

Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com

Cc: stable@vger.kernel.org
Fixes: 4b147936fa ("tracing: Add support for 'synthetic' events")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-21 16:08:59 -05:00
Prateek Sood
3a53acf1d9 tracing: Fix lock inversion in trace_event_enable_tgid_record()
Task T2                             Task T3
trace_options_core_write()            subsystem_open()

 mutex_lock(trace_types_lock)           mutex_lock(event_mutex)

 set_tracer_flag()

   trace_event_enable_tgid_record()       mutex_lock(trace_types_lock)

    mutex_lock(event_mutex)

This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.

Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com

Cc: stable@vger.kernel.org
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-21 16:05:13 -05:00
Linus Torvalds
fd7a6d2b8f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: a (rare) PSI crash fix, a CPU affinity related balancing
  fix, and a toning down of active migration attempts"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cfs: fix spurious active migration
  sched/fair: Fix find_idlest_group() to handle CPU affinity
  psi: Fix a division error in psi poll()
  sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
2019-12-21 10:52:10 -08:00
Linus Torvalds
c4ff10efe8 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: a BTS fix, a PT NMI handling fix, a PMU sysfs fix and an
  SRCU annotation"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Add SRCU annotation for pmus list walk
  perf/x86/intel: Fix PT PMI handling
  perf/x86/intel/bts: Fix the use of page_private()
  perf/x86: Fix potential out-of-bounds access
2019-12-21 10:51:00 -08:00
chenqiwu
43cf75d964
exit: panic before exit_mm() on global init exit
Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-> exit_notify()
   -> forget_original_parent()
      -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-12-21 16:48:01 +01:00