Commit Graph

24733 Commits

Author SHA1 Message Date
Thomas Gleixner
96fe3b072f posix-timers: Rename do_schedule_next_timer
That function is a misnomer. Rename it with a proper prefix to
posixtimer_rearm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.811362578@linutronix.de
2017-06-04 15:40:25 +02:00
Thomas Gleixner
3080294589 posix-timers: Add timer_rearm() callback
Add a timer_rearm() callback which is used to make the rescheduling of
posix interval timers independent of the underlying clock implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.732632167@linutronix.de
2017-06-04 15:40:25 +02:00
Thomas Gleixner
d97bb75ddd posix-timers: Store k_clock pointer in k_itimer
Having the k_clock pointer in the k_itimer struct avoids the lookup in
several code pathes and makes the next steps of unification of the hrtimer
and alarmtimer based posix timers simpler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.641222072@linutronix.de
2017-06-04 15:40:25 +02:00
Thomas Gleixner
80105cd0e6 posix-timers: Move interval out of the union
Preparatory patch to unify the alarm timer and hrtimer based posix interval
timer handling.

The interval is used as a criteria for rearming decisions so moving it out
of the clock specific data structures allows later unification.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.563922908@linutronix.de
2017-06-04 15:40:24 +02:00
Thomas Gleixner
af888d677a posix-timers: Unify overrun/requeue_pending handling
hrtimer based posix-timers and posix-cpu-timers handle the update of the
rearming and overflow related status fields differently.

Move that update to the common rearming code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.484936964@linutronix.de
2017-06-04 15:40:24 +02:00
Thomas Gleixner
bab0aae9dc posix-timers: Move posix-timer internals to core
None of these declarations is required outside of kernel/time. Move them to
an internal header.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170530211656.394803853@linutronix.de
2017-06-04 15:40:23 +02:00
Thomas Gleixner
6631fa12c1 posix-timers: Avoid gazillions of forward declarations
Move it below the actual implementations as there are new callbacks coming
which would require even more forward declarations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.238209952@linutronix.de
2017-06-04 15:40:23 +02:00
Thomas Gleixner
3a06c7ac24 posix-clocks: Remove interval timer facility and mmap/fasync callbacks
The only user of this facility is ptp_clock, which does not implement any of
those functions.

Remove them to prevent accidental users. Especially the interval timer
interfaces are now more or less impossible to implement because the
necessary infrastructure has been confined to the core code. Aside of that
it's really complex to make these callbacks implemented according to spec
as the alarm timer implementation demonstrates. If at all then a nanosleep
callback might be a reasonable extension. For now keep just what ptp_clock
needs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.145036286@linutronix.de
2017-06-04 15:40:22 +02:00
Thomas Gleixner
a81129e5a1 posix-timers: Remove unused export of posix_timer_event()
Since the removal of the mmtimer driver the export is not longer needed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211656.052744418@linutronix.de
2017-06-04 15:40:22 +02:00
Thomas Gleixner
18c700c4e3 alarmtimer: Remove pointless config conditional
Having a IF_ENABLED(CONFIG_POSIX_TIMERS) inside of a
#ifdef CONFIG_POSIX_TIMERS section is pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20170530211655.975218056@linutronix.de
2017-06-04 15:40:22 +02:00
Thomas Gleixner
978267b643 Merge branch 'timers/urgent' into WIP.timers
Pick up urgent fixes to avoid conflicts.
2017-06-04 15:21:52 +02:00
Thomas Gleixner
ff86bf0c65 alarmtimer: Rate limit periodic intervals
The alarmtimer code has another source of potentially rearming itself too
fast. Interval timers with a very samll interval have a similar CPU hog
effect as the previously fixed overflow issue.

The reason is that alarmtimers do not implement the normal protection
against this kind of problem which the other posix timer use:

  timer expires -> queue signal -> deliver signal -> rearm timer

This scheme brings the rearming under scheduler control and prevents
permanently firing timers which hog the CPU.

Bringing this scheme to the alarm timer code is a major overhaul because it
lacks all the necessary mechanisms completely.

So for a quick fix limit the interval to one jiffie. This is not
problematic in practice as alarmtimers are usually backed by an RTC for
suspend which have 1 second resolution. It could be therefor argued that
the resolution of this clock should be set to 1 second in general, but
that's outside the scope of this fix.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170530211655.896767100@linutronix.de
2017-06-04 15:21:18 +02:00
Thomas Gleixner
f4781e76f9 alarmtimer: Prevent overflow of relative timers
Andrey reported a alartimer related RCU stall while fuzzing the kernel with
syzkaller.

The reason for this is an overflow in ktime_add() which brings the
resulting time into negative space and causes immediate expiry of the
timer. The following rearm with a small interval does not bring the timer
back into positive space due to the same issue.

This results in a permanent firing alarmtimer which hogs the CPU.

Use ktime_add_safe() instead which detects the overflow and clamps the
result to KTIME_SEC_MAX.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170530211655.802921648@linutronix.de
2017-06-04 15:21:18 +02:00
Christoph Hellwig
31ea70e030 posix-timers: Move the do_schedule_next_timer declaration
Having it in asm-generic/siginfo.h doesn't make any sense as it is in no way
architecture specific.  Move it to posix-timers.h instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-4-hch@lst.de
2017-06-04 15:11:46 +02:00
Linus Torvalds
035f1456f9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
 "Kconfig dependency fix for livepatching infrastructure from Miroslav
  Benes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
2017-06-02 08:59:17 -07:00
Linus Torvalds
39b8ab31bc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixlet from Thomas Gleixner:
 "Silence dmesg spam by making the posix cpu timer printks depend on
  print_fatal_signals"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Make signal printks conditional
2017-05-27 09:14:24 -07:00
Linus Torvalds
805f286907 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
 "A fix for a state leak which was introduced in the recent rework of
  futex/rtmutex interaction"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
2017-05-27 08:59:37 -07:00
Linus Torvalds
d024baa58a Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kthread fix from Thomas Gleixner:
 "A single fix which prevents a use after free when kthread fork fails"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kthread: Fix use-after-free if kthread fork fails
2017-05-27 08:52:27 -07:00
Linus Torvalds
77d6465695 There's been a few memory issues found with ftrace.
One was simply a memory leak where not all was being freed that should
 have been in releasing a file pointer on set_graph_function.
 
 Then Thomas found that the ftrace trampolines were marked for read/write
 as well as execute. To shrink the possible attack surface, he added
 calls to set them to ro. Which also uncovered some other issues with
 freeing module allocated memory that had its permissions changed.
 
 Kprobes had a similar issue which is fixed and a selftest was added
 to trigger that issue again.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZKOiVFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 vBoH/jxVozuAEVCv+Nbj6fhRxe4emjo0lZZb32EbEaSV/nUQGqHIZFdDQtbt+ld+
 sn06/BSMBI+L4BqLj1BCAW0e/zIn/4birIg53SX5jQwc3AlhUG7HS2d+RJZZCrp9
 Zofq9L6xZ4Hl2XjkPXqwEgtrwxQtkIPLlJqeYDJ6BVrlPfOPEwB7bfR7B684wiYT
 6h2Qo7f/ZQzgJ1sK8N2IjHEnAgE08KCYcj4IB4WHJk6SqQz3bv1Y00WBg2UQihVT
 TPPSVhYLnrSw53fxyALqZbHo2DvnQf1TnNadWxvSIpbvgm/T5GG60FDtvHgNfbwz
 yKuKAog+P9xBLkoAcfvODLY9O5s=
 =75TZ
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "There's been a few memory issues found with ftrace.

  One was simply a memory leak where not all was being freed that should
  have been in releasing a file pointer on set_graph_function.

  Then Thomas found that the ftrace trampolines were marked for
  read/write as well as execute. To shrink the possible attack surface,
  he added calls to set them to ro. Which also uncovered some other
  issues with freeing module allocated memory that had its permissions
  changed.

  Kprobes had a similar issue which is fixed and a selftest was added to
  trigger that issue again"

* tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/ftrace: Make sure that ftrace trampolines are not RWX
  x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
  selftests/ftrace: Add a testcase for many kprobe events
  kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
  ftrace: Fix memory leak in ftrace_graph_release()
2017-05-27 08:30:30 -07:00
Thomas Gleixner
b6b3b80fce alarmtimer: Fix posix-timer constification fallout
Some freezer related variables are only used when either CONFIG_POSIX_TIMER
or CONFIG_RTC_CLASS are enabled. Hide them when both are off.

Fixes: d3ba5a9a34 ("posix-timers: Make posix_clocks immutable")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Helwig <hch@lst.de>
2017-05-27 12:23:47 +02:00
Christoph Hellwig
d3ba5a9a34 posix-timers: Make posix_clocks immutable
There are no more modular users providing a posix clock. The register
function is now pointless so the posix clock array can be initialized
statically at compile time and the array including the various k_clock
structs can be marked 'const'.

Inspired by changes in the Grsecurity patch set, but done proper.

[ tglx: Massaged changelog and fixed the POSIX_TIMER=n case ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Link: http://lkml.kernel.org/r/20170526090311.3377-3-hch@lst.de
2017-05-27 09:46:35 +02:00
Masami Hiramatsu
c93f5cf571 kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
Fix kprobes to set(recover) RWX bits correctly on trampoline
buffer before releasing it. Releasing readonly page to
module_memfree() crash the kernel.

Without this fix, if kprobes user register a bunch of kprobes
in function body (since kprobes on function entry usually
use ftrace) and unregister it, kernel hits a BUG and crash.

Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: d0381c81c2 ("kprobes/x86: Set kprobes pages read-only")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26 22:37:00 -04:00
Luis Henriques
f9797c2f20 ftrace: Fix memory leak in ftrace_graph_release()
ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not.  This results in a memory leak that is easily
captured by kmemleak:

unreferenced object 0xffff880038afe000 (size 8192):
  comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8113964d>] __kmalloc+0x12d/0x1a0
    [<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
    [<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
    [<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
    [<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
    [<ffffffff81140df7>] vfs_open+0x47/0x60
    [<ffffffff81150f95>] path_openat+0x2a5/0x1020
    [<ffffffff81152d6a>] do_filp_open+0x8a/0xf0
    [<ffffffff811411df>] do_sys_open+0x12f/0x200
    [<ffffffff811412ce>] SyS_open+0x1e/0x20
    [<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26 22:35:48 -04:00
Miroslav Benes
5720acf4bf livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
If TRIM_UNUSED_KSYMS is enabled, all unneeded exported symbols are made
unexported. Two-pass build of the kernel is done to find out which
symbols are needed based on a configuration. This effectively
complicates things for out-of-tree modules.

Livepatch exports functions to (un)register and enable/disable a live
patch. The only in-tree module which uses these functions is a sample in
samples/livepatch/. If the sample is disabled, the functions are
trimmed and out-of-tree live patches cannot be built.

Note that live patches are intended to be built out-of-tree.

Suggested-by: Michal Marek <mmarek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-05-27 00:27:37 +02:00
Linus Torvalds
6741d51699 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
    Borkmann.

 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
    Caratti.

 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
    driver, from Jesper Brouer.

 4) Defer slave->link updates until bonding is ready to do a full commit
    to the new settings, from Nithin Sujir.

 5) Properly reference count ipv4 FIB metrics to avoid use after free
    situations, from Eric Dumazet and several others including Cong Wang
    and Julian Anastasov.

 6) Fix races in llc_ui_bind(), from Lin Zhang.

 7) Fix regression of ESP UDP encapsulation for TCP packets, from
    Steffen Klassert.

 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.

 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
    Dawson.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add reference counting to metrics
  net: ethernet: ax88796: don't call free_irq without request_irq first
  ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
  sctp: fix ICMP processing if skb is non-linear
  net: llc: add lock_sock in llc_ui_bind to avoid a race condition
  bonding: Don't update slave->link until ready to commit
  test_bpf: Add a couple of tests for BPF_JSGE.
  bpf: add various verifier test cases
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  bpf: properly reset caller saved regs after helper call and ld_abs/ind
  bpf: fix incorrect pruning decision when alignment must be tracked
  arp: fixed -Wuninitialized compiler warning
  tcp: avoid fastopen API to be used on AF_UNSPEC
  net: move somaxconn init from sysctl code
  net: fix potential null pointer dereference
  geneve: fix fill_info when using collect_metadata
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  ...
2017-05-26 13:51:01 -07:00
Daniel Borkmann
a316338cb7 bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:28 -04:00
Daniel Borkmann
a9789ef9af bpf: properly reset caller saved regs after helper call and ld_abs/ind
Currently, after performing helper calls, we clear all caller saved
registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
specification. The way we reset these regs can affect pruning decisions
in later paths, since we only reset register's imm to 0 and type to
NOT_INIT. However, we leave out clearing of other variables such as id,
min_value, max_value, etc, which can later on lead to pruning mismatches
due to stale data.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:27 -04:00
Daniel Borkmann
1ad2f5838d bpf: fix incorrect pruning decision when alignment must be tracked
Currently, when we enforce alignment tracking on direct packet access,
the verifier lets the following program pass despite doing a packet
write with unaligned access:

  0: (61) r2 = *(u32 *)(r1 +76)
  1: (61) r3 = *(u32 *)(r1 +80)
  2: (61) r7 = *(u32 *)(r1 +8)
  3: (bf) r0 = r2
  4: (07) r0 += 14
  5: (25) if r7 > 0x1 goto pc+4
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  6: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  7: (63) *(u32 *)(r0 -4) = r0
  8: (b7) r0 = 0
  9: (95) exit

  from 6 to 8:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
  8: (b7) r0 = 0
  9: (95) exit

  from 5 to 10:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=2 R10=fp
  10: (07) r0 += 1
  11: (05) goto pc-6
  6: safe                           <----- here, wrongly found safe
  processed 15 insns

However, if we enforce a pruning mismatch by adding state into r8
which is then being mismatched in states_equal(), we find that for
the otherwise same program, the verifier detects a misaligned packet
access when actually walking that path:

  0: (61) r2 = *(u32 *)(r1 +76)
  1: (61) r3 = *(u32 *)(r1 +80)
  2: (61) r7 = *(u32 *)(r1 +8)
  3: (b7) r8 = 1
  4: (bf) r0 = r2
  5: (07) r0 += 14
  6: (25) if r7 > 0x1 goto pc+4
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  7: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  8: (63) *(u32 *)(r0 -4) = r0
  9: (b7) r0 = 0
  10: (95) exit

  from 7 to 9:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=0,max_value=1
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  9: (b7) r0 = 0
  10: (95) exit

  from 6 to 11:
   R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
   R3=pkt_end R7=inv,min_value=2
   R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
  11: (07) r0 += 1
  12: (b7) r8 = 0
  13: (05) goto pc-7                <----- mismatch due to r8
  7: (2d) if r0 > r3 goto pc+1
   R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
   R3=pkt_end R7=inv,min_value=2
   R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
  8: (63) *(u32 *)(r0 -4) = r0
  misaligned packet access off 2+15+-4 size 4

The reason why we fail to see it in states_equal() is that the
third test in compare_ptrs_to_packet() ...

  if (old->off <= cur->off &&
      old->off >= old->range && cur->off >= cur->range)
          return true;

... will let the above pass. The situation we run into is that
old->off <= cur->off (14 <= 15), meaning that prior walked paths
went with smaller offset, which was later used in the packet
access after successful packet range check and found to be safe
already.

For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
as in above program to it, results in R0=pkt(id=0,off=14,r=0)
before the packet range test. Now, testing this against R3=pkt_end
with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
for the case when we're within bounds. A write into the packet
at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
aligned (2 is for NET_IP_ALIGN). After processing this with
all fall-through paths, we later on check paths from branches.
When the above skb->mark test is true, then we jump near the
end of the program, perform r0 += 1, and jump back to the
'if r0 > r3 goto out' test we've visited earlier already. This
time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
that part because this time we'll have a larger safe packet
range, and we already found that with off=14 all further insn
were already safe, so it's safe as well with a larger off.
However, the problem is that the subsequent write into the packet
with 2 + 15 -4 is then unaligned, and not caught by the alignment
tracking. Note that min_align, aux_off, and aux_off_align were
all 0 in this example.

Since we cannot tell at this time what kind of packet access was
performed in the prior walk and what minimal requirements it has
(we might do so in the future, but that requires more complexity),
fix it to disable this pruning case for strict alignment for now,
and let the verifier do check such paths instead. With that applied,
the test cases pass and reject the program due to misalignment.

Fixes: d117441674 ("bpf: Track alignment of register values in the verifier.")
Reference: http://patchwork.ozlabs.org/patch/761909/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:27 -04:00
Linus Torvalds
2426125ab4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace fix from Eric Biederman:
 "This fixes a brown paper bag bug. When I fixed the ptrace interaction
  with user namespaces I added a new field ptracer_cred in struct_task
  and I failed to properly initialize it on fork.

  This dangling pointer wound up breaking runing setuid applications run
  from the enlightenment window manager.

  As this is the worst sort of bug. A regression breaking user space for
  no good reason let's get this fixed"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Properly initialize ptracer_cred on fork
2017-05-24 08:28:59 -07:00
Thomas Gleixner
43fe8b8eb8 posix-timers: Make signal printks conditional
A recent commit added extra printks for CPU/RT limits. This can result in
excessive spam in dmesg.

Make the printks conditional on print_fatal_signals.

Fixes: e7ea7c9806 ("rlimits: Print more information when CPU/RT limits are exceeded")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arun Raghavan <arun@arunraghavan.net>
2017-05-23 23:39:57 +02:00
Eric W. Biederman
c70d9d809f ptrace: Properly initialize ptracer_cred on fork
When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <tiwai@suse.de>
Fixes: 64b875f7ac ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-05-23 07:40:44 -05:00
Linus Torvalds
801099bed0 Power management updates for v4.12-rc3
- Fix RTC wakeup from suspend-to-idle broken by the recent rework
    of ACPI wakeup handling (Rafael Wysocki).
 
  - Update intel_pstate driver documentation to reflect the current
    code and explain how it works in more detail (Rafael Wysocki).
 
  - Fix an issue related to CPU idleness detection on systems with
    shared cpufreq policies in the schedutil governor (Juri Lelli).
 
  - Fix a possible build issue in the dbx500 cpufreq driver (Arnd
    Bergmann).
 
  - Fix a function in the power capping framework core to return
    an error code instead of 0 when there's an error (Dan Carpenter).
 
  - Clean up variable definition in the hibernation core (Pushkar
    Jambhlekar).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZIzszAAoJEILEb/54YlRxMG0P/R4VpPMB1l+wxQRmCMwzOupC
 GJ1jTa2mQQpPy57QPjaCDlUPxSaZA97S4MO0eMn4Or6LX3rG7kTUoe1WaYvRhWNk
 Ul2UfoLdVeFJwvQrzOZKB2xnEGA/nD2jlsD/9zYzy9FxMPjiG0F//RZvhZJVChpg
 wycz9Rw1T2x+1URAD5wkS4xLWzQEv5NqH6mc/KAoP/ntxe+7ahs5SnWmF9MLpHj7
 jXM9651BUSYp3QzHCHFObvsVZfbZz7isFIADmwsxzTy7vTPb1oIyo7EQ5QMcsivS
 LlJjrYy9JN0alwND0mistVlAmFVvvldckjR8zHSEiFt8IeMccrFw0inGir2ngghY
 53kMnJ/QoL1A/C539MHoAmfnpqB0QUd56QjXngungC47YpVHi5DaSXU7rln2xy/C
 7o7gbHUKUbStSvDLjRcQ915HANOuXkJk84BMIGUSlT3K/MvGAMKUNxZV7KOOngpb
 WR4G2lxjYTIHKB+YP5AmG2kMF4GlbGnIQts5Ryd5FijIH3/MYJ4W2Kas+GvbnoBb
 7NtDjyBJgjxleTv3fV89Pod+dKdFzrTRl+mr6bsn/WCiMjUHoXcTnOHh3OO/fJ8F
 AW/dywk9+Hx5DyjY04EJyklflfne97T7/NjJ99Zjzh/EC+uePeM+dMd+o66PpYG5
 +FJgyPc5ZaX1f2thAgv+
 =2sNW
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix RTC wakeup from suspend-to-idle broken recently, fix CPU
  idleness detection condition in the schedutil cpufreq governor, fix a
  cpufreq driver build failure, fix an error code path in the power
  capping framework, clean up the hibernate core and update the
  intel_pstate documentation.

  Specifics:

   - Fix RTC wakeup from suspend-to-idle broken by the recent rework of
     ACPI wakeup handling (Rafael Wysocki).

   - Update intel_pstate driver documentation to reflect the current
     code and explain how it works in more detail (Rafael Wysocki).

   - Fix an issue related to CPU idleness detection on systems with
     shared cpufreq policies in the schedutil governor (Juri Lelli).

   - Fix a possible build issue in the dbx500 cpufreq driver (Arnd
     Bergmann).

   - Fix a function in the power capping framework core to return an
     error code instead of 0 when there's an error (Dan Carpenter).

   - Clean up variable definition in the hibernation core (Pushkar
     Jambhlekar)"

* tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: dbx500: add a Kconfig symbol
  PM / hibernate: Declare variables as static
  PowerCap: Fix an error code in powercap_register_zone()
  RTC: rtc-cmos: Fix wakeup from suspend-to-idle
  PM / wakeup: Fix up wakeup_source_report_event()
  cpufreq: intel_pstate: Document the current behavior and user interface
  cpufreq: schedutil: use now as reference when aggregating shared policy requests
2017-05-22 19:24:32 -07:00
Vegard Nossum
4d6501dce0 kthread: Fix use-after-free if kthread fork fails
If a kthread forks (e.g. usermodehelper since commit 1da5c46fa9) but
fails in copy_process() between calling dup_task_struct() and setting
p->set_child_tid, then the value of p->set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().

    kthread()
     - worker_thread()
        - process_one_work()
        |  - call_usermodehelper_exec_work()
        |     - kernel_thread()
        |        - _do_fork()
        |           - copy_process()
        |              - dup_task_struct()
        |                 - arch_dup_task_struct()
        |                    - tsk->set_child_tid = current->set_child_tid // implied
        |              - ...
        |              - goto bad_fork_*
        |              - ...
        |              - free_task(tsk)
        |                 - free_kthread_struct(tsk)
        |                    - kfree(tsk->set_child_tid)
        - ...
        - schedule()
           - __schedule()
              - wq_worker_sleeping()
                 - kthread_data(task)->flags // UAF

The problem started showing up with commit 1da5c46fa9 since it reused
->set_child_tid for the kthread worker data.

A better long-term solution might be to get rid of the ->set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.

Debugged-by: Jamie Iles <jamie.iles@oracle.com>
Fixes: 1da5c46fa9 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-22 22:21:16 +02:00
Peter Zijlstra
04dc1b2fff futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
commit:

  cfafcd117d ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")

The following trace shows the problem:

 ld-linux-x86-64-2161  [019] ....   410.760971: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_LOCK_PI
 ld-linux-x86-64-2161  [019] ...1   410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760978: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_UNLOCK_PI
 ld-linux-x86-64-2165  [011] d..1   410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
 ld-linux-x86-64-2161  [019] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT

Task 2165 does an UNLOCK_PI, assigning the lock to the waiter task 2161
which then returns with -ETIMEDOUT. That wrecks the lock state, because now
the owner isn't aware it acquired the lock and removes the pending robust
list entry.

If 2161 is killed, the robust list will not clear out this futex and the
subsequent acquire on this futex will then (correctly) result in -ESRCH
which is unexpected by glibc, triggers an internal assertion and dies.

Task 2161			Task 2165

rt_mutex_wait_proxy_lock()
   timeout();
   /* T2161 is still queued in  the waiter list */
   return -ETIMEDOUT;

				futex_unlock_pi()
				spin_lock(hb->lock);
				rtmutex_unlock()
				  remove_rtmutex_waiter(T2161);
				   mark_lock_available();
				/* Make the next waiter owner of the user space side */
				futex_uval = 2161;
				spin_unlock(hb->lock);
spin_lock(hb->lock);
rt_mutex_cleanup_proxy_lock()
  if (rtmutex_owner() !== current)
     ...
     return FAIL;
....
return -ETIMEOUT;

This means that rt_mutex_cleanup_proxy_lock() needs to call
try_to_take_rt_mutex() so it can take over the rtmutex correctly which was
assigned by the waker. If the rtmutex is owned by some other task then this
call is harmless and just confirmes that the waiter is not able to acquire
it.

While there, fix what looks like a merge error which resulted in
rt_mutex_cleanup_proxy_lock() having two calls to
fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any.
Both should have one, since both potentially touch the waiter list.

Fixes: 38d589f2fd ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Bug-Spotted-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Link: http://lkml.kernel.org/r/20170519154850.mlomgdsd26drq5j6@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-22 21:57:18 +02:00
Linus Torvalds
86ca984cef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly netfilter bug fixes in here, but we have some bits elsewhere as
  well.

   1) Don't do SNAT replies for non-NATed connections in IPVS, from
      Julian Anastasov.

   2) Don't delete conntrack helpers while they are still in use, from
      Liping Zhang.

   3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
      Bruijn.

   4) Add proper RCU protection to nf_tables_dump_set() because we
      cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
      Liping Zhang.

   5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.

   6) smsc95xx devices can't handle IPV6 checksums fully, so don't
      advertise support for offloading them. From Nisar Sayed.

   7) Fix out-of-bounds access in __ip6_append_data(), from Eric
      Dumazet.

   8) Make atl2_probe() propagate the error code properly on failures,
      from Alexey Khoroshilov.

   9) arp_target[] in bond_check_params() is used uninitialized. This
      got changes from a global static to a local variable, which is how
      this mistake happened. Fix from Jarod Wilson.

  10) Fix fallout from unnecessary NULL check removal in cls_matchall,
      from Jiri Pirko. This is definitely brown paper bag territory..."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  net: sched: cls_matchall: fix null pointer dereference
  vsock: use new wait API for vsock_stream_sendmsg()
  bonding: fix randomly populated arp target array
  net: Make IP alignment calulations clearer.
  bonding: fix accounting of active ports in 3ad
  net: atheros: atl2: don't return zero on failure path in atl2_probe()
  ipv6: fix out of bound writes in __ip6_append_data()
  bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
  smsc95xx: Support only IPv4 TCP/UDP csum offload
  arp: always override existing neigh entries with gratuitous ARP
  arp: postpone addr_type calculation to as late as possible
  arp: decompose is_garp logic into a separate function
  arp: fixed error in a comment
  tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
  netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
  ebtables: arpreply: Add the standard target sanity check
  netfilter: nf_tables: revisit chain/object refcounting from elements
  netfilter: nf_tables: missing sanitization in data from userspace
  netfilter: nf_tables: can't assume lock is acquired when dumping set elems
  netfilter: synproxy: fix conntrackd interaction
  ...
2017-05-22 12:42:02 -07:00
Rafael J. Wysocki
bb47e96417 Merge branches 'pm-sleep' and 'powercap'
* pm-sleep:
  PM / hibernate: Declare variables as static
  RTC: rtc-cmos: Fix wakeup from suspend-to-idle
  PM / wakeup: Fix up wakeup_source_report_event()

* powercap:
  PowerCap: Fix an error code in powercap_register_zone()
2017-05-22 20:32:05 +02:00
Rafael J. Wysocki
079c1812a2 Merge branches 'intel_pstate', 'pm-cpufreq' and 'pm-cpufreq-sched'
* intel_pstate:
  cpufreq: intel_pstate: Document the current behavior and user interface

* pm-cpufreq:
  cpufreq: dbx500: add a Kconfig symbol

* pm-cpufreq-sched:
  cpufreq: schedutil: use now as reference when aggregating shared policy requests
2017-05-22 20:28:22 +02:00
David S. Miller
e4eda884db net: Make IP alignment calulations clearer.
The assignmnet:

	ip_align = strict ? 2 : NET_IP_ALIGN;

in compare_pkt_ptr_alignment() trips up Coverity because we can only
get to this code when strict is true, therefore ip_align will always
be 2 regardless of NET_IP_ALIGN's value.

So just assign directly to '2' and explain the situation in the
comment above.

Reported-by: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-22 12:27:07 -04:00
Linus Torvalds
970c305aa8 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single scheduler fix:

  Prevent idle task from ever being preempted. That makes sure that
  synchronize_rcu_tasks() which is ignoring idle task does not pretend
  that no task is stuck in preempted state. If that happens and idle was
  preempted on a ftrace trampoline the machine crashes due to
  inconsistent state"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Call __schedule() from do_idle() without enabling preemption
2017-05-21 11:52:00 -07:00
Linus Torvalds
e7a3d62749 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of small fixes for the irq subsystem:

   - Cure a data ordering problem with chained interrupts

   - Three small fixlets for the mbigen irq chip"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Fix chained interrupt data ordering
  irqchip/mbigen: Fix the clear register offset calculation
  irqchip/mbigen: Fix potential NULL dereferencing
  irqchip/mbigen: Fix memory mapping code
2017-05-21 11:45:26 -07:00
Linus Torvalds
56f410cf45 This fixes a bug caused by not cleaning up the new instance unique triggers
when deleting an instance. It also creates a selftest that triggers that bug.
 
 Fix the delayed optimization happening after kprobes boot up self tests
 being removed by freeing of init memory.
 
 Comment kprobes on why the delay optimization is not a problem for removal
 of modules, to keep other developers from searching that riddle.
 
 Fix another rcu isn't watching in stack trace tracing.
 
 Naveen N. Rao (4):
       ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
       ftrace/instances: Clear function triggers when removing instances
       selftests/ftrace: Fix bashisms
       selftests/ftrace: Add test to remove instance with active event triggers
 
 Steven Rostedt (1):
       tracing: Move postpone selftests to core from early_initcall
 
 Steven Rostedt (VMware) (3):
       ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
       kprobes: Document how optimized kprobes are removed from module unload
       tracing: Make sure RCU is watching before calling a stack trace
 
 Thomas Gleixner (1):
       tracing/kprobes: Enforce kprobes teardown after testing
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZIQapFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 A6MIAKFLb6mQ4flRBXpWd2tD2B4DQpQ0H7SovseZnlH6Q7grU6POY/qbNl9xXiBA
 3NavxqbIYokH8cxEqGAusL7ASUFPXJj6erMM1uc1WRuAzMpIjvgNacOtW5R+c5S9
 ofR1xtKlBo/854J/IP6M3J0WqrK+B7TsS1WYKohe/tFMBpolbnFloHVfMMZlaL58
 CQhCoAhkjJRsta6dJhbo+HoQy03VGyWsfFHtutBpIwsf81Naq4Stpxp7jdZLWhB8
 Di5QdOji9lDayK6Uk7DDZqHxbjC9z6cCS9nVWIGHkE4AMpR3peYtsyCaAOBjVMLV
 2OuhuREfZgKaYVMjUfdeYCayDAY=
 =1gek
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix a bug caused by not cleaning up the new instance unique triggers
   when deleting an instance. It also creates a selftest that triggers
   that bug.

 - Fix the delayed optimization happening after kprobes boot up self
   tests being removed by freeing of init memory.

 - Comment kprobes on why the delay optimization is not a problem for
   removal of modules, to keep other developers from searching that
   riddle.

 - Fix another case of rcu not watching in stack trace tracing.

* tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Make sure RCU is watching before calling a stack trace
  kprobes: Document how optimized kprobes are removed from module unload
  selftests/ftrace: Add test to remove instance with active event triggers
  selftests/ftrace: Fix bashisms
  ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
  ftrace/instances: Clear function triggers when removing instances
  ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
  tracing/kprobes: Enforce kprobes teardown after testing
  tracing: Move postpone selftests to core from early_initcall
2017-05-20 23:39:03 -07:00
Linus Torvalds
894e21642d Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A small collection of fixes that should go into this cycle.

   - a pull request from Christoph for NVMe, which ended up being
     manually applied to avoid pulling in newer bits in master. Mostly
     fibre channel fixes from James, but also a few fixes from Jon and
     Vijay

   - a pull request from Konrad, with just a single fix for xen-blkback
     from Gustavo.

   - a fuseblk bdi fix from Jan, fixing a regression in this series with
     the dynamic backing devices.

   - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull().

   - a request leak fix for drbd from Lars, fixing a regression in the
     last series with the kref changes. This will go to stable as well"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: release the sq ref on rdma read errors
  nvmet-fc: remove target cpu scheduling flag
  nvme-fc: stop queues on error detection
  nvme-fc: require target or discovery role for fc-nvme targets
  nvme-fc: correct port role bits
  nvme: unmap CMB and remove sysfs file in reset path
  blktrace: fix integer parse
  fuseblk: Fix warning in super_setup_bdi_name()
  block: xen-blkback: add null check to avoid null pointer dereference
  drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
2017-05-20 16:12:30 -07:00
Shaohua Li
5f3394530f blktrace: fix integer parse
sscanf is a very poor way to parse integer. For example, I input
"discard" for act_mask, it gets 0xd and completely messes up. Using
correct API to do integer parse.

This patch also makes attributes accept any base of integer.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-19 09:21:15 -06:00
Steven Rostedt (VMware)
a33d7d94ee tracing: Make sure RCU is watching before calling a stack trace
As stack tracing now requires "rcu watching", force RCU to be watching when
recording a stack trace.

Link: http://lkml.kernel.org/r/20170512172449.879684501@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-18 23:57:56 -04:00
Linus Torvalds
667f867c93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't allow negative TCP reordering values, from Soheil Hassas
    Yeganeh.

 2) Don't overflow while parsing ipv6 header options, from Craig Gallek.

 3) Handle more cleanly the case where an individual route entry during
    a dump will not fit into the allocated netlink SKB, from David
    Ahern.

 4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.

 5) Allow neighbour updates to converge more quickly via gratuitous
    ARPs, from Ihar Hrachyshka.

 6) Fix compile error from CONFIG_INET is disabled, from Eric Dumazet.

 7) Fix use after free in x25 protocol init, from Lin Zhang.

 8) Valid VLAN pvid ranges passed into br_validate(), from Tobias
    Jungel.

 9) NULL out address lists in child sockets in SCTP, this is similar to
    the fix we made for inet connection sockets last week. From Eric
    Dumazet.

10) Fix NULL deref in mlxsw driver, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  mlxsw: spectrum: Avoid possible NULL pointer dereference
  sh_eth: Do not print an error message for probe deferral
  sh_eth: Use platform device for printing before register_netdev()
  mlxsw: spectrum_router: Fix rif counter freeing routine
  mlxsw: spectrum_dpipe: Fix incorrect entry index
  cxgb4: update latest firmware version supported
  qmi_wwan: add another Lenovo EM74xx device ID
  sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  udp: make *udp*_queue_rcv_skb() functions static
  bridge: netlink: check vlan_default_pvid range
  net: ethernet: faraday: To support device tree usage.
  net: x25: fix one potential use-after-free issue
  bpf: adjust verifier heuristics
  ipv6: Check ip6_find_1stfragopt() return value properly.
  selftests/bpf: fix broken build due to types.h
  bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.
  bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
  net: fix compile error in skb_orphan_partial()
  ipv6: Prevent overrun when parsing v6 header options
  neighbour: update neigh timestamps iff update is effective
  ...
2017-05-18 11:40:21 -07:00
Linus Torvalds
16d95c4396 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull pid namespace fixes from Eric Biederman:
 "These are two bugs that turn out to have simple fixes that were
  reported during the merge window. Both of these issues have existed
  for a while and it just happens that they both were reported at almost
  the same time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
  pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
2017-05-18 10:04:42 -07:00
Daniel Borkmann
3c2ce60bdd bpf: adjust verifier heuristics
Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.

This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.

This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 22:55:27 -04:00
Steven Rostedt (VMware)
545a028190 kprobes: Document how optimized kprobes are removed from module unload
Thomas discovered a bug where the kprobe trace tests had a race
condition where the kprobe_optimizer called from a delayed work queue
that does the optimizing and "unoptimizing" of a kprobe, can try to
modify the text after it has been freed by the init code.

The kprobe trace selftest is a special case, and Thomas and myself
investigated to see if there's a chance that this could also be a bug
with module unloading, as the code is not obvious to how it handles
this. After adding lots of printks, I figured it out. Thomas suggested
that this should be commented so that others will not have to go
through this exercise again.

Link: http://lkml.kernel.org/r/20170516145835.3827d3aa@gandalf.local.home

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:55:58 -04:00
Steven Rostedt (VMware)
8a49f3e03c ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
No need to add ugly #ifdefs in the code. Having a standard stub file is much
prettier.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:53:32 -04:00
Naveen N. Rao
a0e6369e4b ftrace/instances: Clear function triggers when removing instances
If instance directories are deleted while there are registered function
triggers:

  # cd /sys/kernel/debug/tracing/instances
  # mkdir test
  # echo "schedule:enable_event:sched:sched_switch" > test/set_ftrace_filter
  # rmdir test
  Unable to handle kernel paging request for data at address 0x00000008
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048
  NUMA
  pSeries
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter fuse binfmt_misc pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath virtio_net virtio_blk virtio_pci crc32c_vpmsum virtio_ring virtio
  CPU: 8 PID: 8694 Comm: rmdir Not tainted 4.11.0-nnr+ #113
  task: c0000000bab52800 task.stack: c0000000baba0000
  NIP: c0000000021edde8 LR: c0000000021f0590 CTR: c000000002119620
  REGS: c0000000baba3870 TRAP: 0300   Not tainted  (4.11.0-nnr+)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
    CR: 22002422  XER: 20000000
  CFAR: 00007fffabb725a8 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0
  GPR00: c00000000220f750 c0000000baba3af0 c000000003157e00 0000000000000000
  GPR04: 0000000000000040 00000000000000eb 0000000000000040 0000000000000000
  GPR08: 0000000000000000 0000000000000113 0000000000000000 c00000000305db98
  GPR12: c000000002119620 c00000000fd42c00 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000bab52e90 0000000000000000
  GPR24: 0000000000000000 00000000000000eb 0000000000000040 c0000000baba3bb0
  GPR28: c00000009cb06eb0 c0000000bab52800 c00000009cb06eb0 c0000000baba3bb0
  NIP [c0000000021edde8] ring_buffer_lock_reserve+0x8/0x4e0
  LR [c0000000021f0590] trace_event_buffer_lock_reserve+0xe0/0x1a0
  Call Trace:
  [c0000000baba3af0] [c0000000021f96c8] trace_event_buffer_commit+0x1b8/0x280 (unreliable)
  [c0000000baba3b60] [c00000000220f750] trace_event_buffer_reserve+0x80/0xd0
  [c0000000baba3b90] [c0000000021196b8] trace_event_raw_event_sched_switch+0x98/0x180
  [c0000000baba3c10] [c0000000029d9980] __schedule+0x6e0/0xab0
  [c0000000baba3ce0] [c000000002122230] do_task_dead+0x70/0xc0
  [c0000000baba3d10] [c0000000020ea9c8] do_exit+0x828/0xd00
  [c0000000baba3dd0] [c0000000020eaf70] do_group_exit+0x60/0x100
  [c0000000baba3e10] [c0000000020eb034] SyS_exit_group+0x24/0x30
  [c0000000baba3e30] [c00000000200bcec] system_call+0x38/0x54
  Instruction dump:
  60000000 60420000 7d244b78 7f63db78 4bffaa09 393efff8 793e0020 39200000
  4bfffecc 60420000 3c4c00f7 3842a020 <81230008> 2f890000 409e02f0 a14d0008
  ---[ end trace b917b8985d0e650b ]---
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000021edde8
  Faulting instruction address: 0xc0000000021edde8

To address this, let's clear all registered function probes before
deleting the ftrace instance.

Link: http://lkml.kernel.org/r/c5f1ca624043690bd94642bb6bffd3f2fc504035.1494956770.git.naveen.n.rao@linux.vnet.ibm.com

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-17 21:52:22 -04:00