29431 Commits

Author SHA1 Message Date
Peter Zijlstra
e6018c0f5c sched/wake_q: Document wake_q_add()
The only guarantee provided by wake_q_add() is that a wakeup will
happen after it, it does _NOT_ guarantee the wakeup will be delayed
until the matching wake_up_q().

If wake_q_add() fails the cmpxchg() a concurrent wakeup is pending and
that can happen at any time after the cmpxchg(). This means we should
not rely on the wakeup happening at wake_q_up(), but should be ready
for wake_q_add() to issue the wakeup.

The delay; if provided (most likely); should only result in more efficient
behaviour.

Reported-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-21 11:15:36 +01:00
Prateek Sood
6dc080eeb2 sched/wait: Fix rcuwait_wake_up() ordering
For some peculiar reason rcuwait_wake_up() has the right barrier in
the comment, but not in the code.

This mistake has been observed to cause a deadlock in the following
situation:

    P1					P2

    percpu_up_read()			percpu_down_write()
      rcu_sync_is_idle() // false
					  rcu_sync_enter()
					  ...
      __percpu_up_read()

[S] ,-  __this_cpu_dec(*sem->read_count)
    |   smp_rmb();
[L] |   task = rcu_dereference(w->task) // NULL
    |
    |				    [S]	    w->task = current
    |					    smp_mb();
    |				    [L]	    readers_active_check() // fail
    `-> <store happens here>

Where the smp_rmb() (obviously) fails to constrain the store.

[ peterz: Added changelog. ]

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8f95c90ceb54 ("sched/wait, RCU: Introduce rcuwait machinery")
Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-21 11:15:36 +01:00
Linus Torvalds
7d0ae236ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix endless loop in nf_tables, from Phil Sutter.

 2) Fix cross namespace ip6_gre tunnel hash list corruption, from
    Olivier Matz.

 3) Don't be too strict in phy_start_aneg() otherwise we might not allow
    restarting auto negotiation. From Heiner Kallweit.

 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.

 5) Memory leak in act_tunnel_key, from Davide Caratti.

 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.

 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.

 8) Missing udplite rehash callbacks, from Alexey Kodanev.

 9) Log dirty pages properly in vhost, from Jason Wang.

10) Use consume_skb() in neigh_probe() as this is a normal free not a
    drop, from Yang Wei. Likewise in macvlan_process_broadcast().

11) Missing device_del() in mdiobus_register() error paths, from Thomas
    Petazzoni.

12) Fix checksum handling of short packets in mlx5, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
  bpf: in __bpf_redirect_no_mac pull mac only if present
  virtio_net: bulk free tx skbs
  net: phy: phy driver features are mandatory
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix cb_ident duplicate in indirect block register
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  net/mlx5e: Fix wrong error code return on FEC query failure
  net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
  tools: bpftool: Cleanup license mess
  bpf: fix inner map masking to prevent oob under speculation
  bpf: pull in pkt_sched.h header for tooling to fix bpftool build
  selftests: forwarding: Add a test case for externally learned FDB entries
  selftests: mlxsw: Test FDB offload indication
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: spectrum_fid: Update dummy FID index
  mlxsw: pci: Return error on PCI reset timeout
  mlxsw: pci: Increase PCI SW reset timeout
  mlxsw: pci: Ring CQ's doorbell before RDQ's
  MAINTAINERS: update email addresses of liquidio driver maintainers
  ...
2019-01-21 12:52:31 +13:00
Daniel Borkmann
9d5564ddcf bpf: fix inner map masking to prevent oob under speculation
During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.

In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.

Before patch (test_verifier's 'map in map access' dump):

  # bpftool prog dump xla id 3
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:4]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking for
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+11
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
    22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
    23: (67) r0 <<= 3                 | map->unpriv_array set.
    24: (0f) r0 += r1                 |
    25: (05) goto pc+1                |
    26: (b7) r0 = 0                   |
    27: (b7) r0 = 0
    28: (95) exit

After patch:

  # bpftool prog dump xla id 1
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:2]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking due to
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+12
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     |
    22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
    23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
    24: (67) r0 <<= 3                 | for map->unpriv_array.
    25: (0f) r0 += r1                 |
    26: (05) goto pc+1                |
    27: (b7) r0 = 0                   |
    28: (b7) r0 = 0
    29: (95) exit

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-18 15:19:56 -08:00
Stephane Eranian
1a51c5da5a perf core: Fix perf_proc_update_handler() bug
The perf_proc_update_handler() handles /proc/sys/kernel/perf_event_max_sample_rate
syctl variable.  When the PMU IRQ handler timing monitoring is disabled, i.e,
when /proc/sys/kernel/perf_cpu_time_max_percent is equal to 0 or 100,
then no modification to sysctl_perf_event_sample_rate is allowed to prevent
possible hang from wrong values.

The problem is that the test to prevent modification is made after the
sysctl variable is modified in perf_proc_update_handler().

You get an error:

  $ echo 10001 >/proc/sys/kernel/perf_event_max_sample_rate
  echo: write error: invalid argument

But the value is still modified causing all sorts of inconsistencies:

  $ cat /proc/sys/kernel/perf_event_max_sample_rate
  10001

This patch fixes the problem by moving the parsing of the value after
the test.

Committer testing:

  # echo 100 > /proc/sys/kernel/perf_cpu_time_max_percent
  # echo 10001 > /proc/sys/kernel/perf_event_max_sample_rate
  -bash: echo: write error: Invalid argument
  # cat /proc/sys/kernel/perf_event_max_sample_rate
  10001
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1547169436-6266-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-18 11:10:38 -03:00
Huacai Chen
12fee4cd5b genirq/irqdesc: Fix double increment in alloc_descs()
The recent rework of alloc_descs() introduced a double increment of the
loop counter. As a consequence only every second affinity mask is
validated.

Remove it.

[ tglx: Massaged changelog ]

Fixes: c410abbbacb9 ("genirq/affinity: Add is_managed to struct irq_affinity_desc")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: Huacai Chen <chenhuacai@gmail.com>
Cc: Dou Liyang <douliyangs@gmail.com>
Link: https://lkml.kernel.org/r/1547694009-16261-1-git-send-email-chenhc@lemote.com
2019-01-18 00:43:09 +01:00
Linus Torvalds
6d060fa390 Merge branch 'stable/for-linus-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "A tiny fix for v5.0-rc2:

  This fixes an issue with GPU cards not working anymore with the DMA
  mapping work Christopher did - as the SWIOTLB is initialized first and
  then free'd (as IOMMU is available) but we forgot to clear our start
  and end entries which are used and BOOM"

* 'stable/for-linus-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: clear io_tlb_start and io_tlb_end in swiotlb_exit
2019-01-18 06:22:08 +12:00
Andreas Ziegler
0722069a53 tracing/uprobes: Fix output for multiple string arguments
When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.

This is best explained in an example:

Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):

$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
    /sys/kernel/debug/tracing/uprobe_events

When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:

  [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"

Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.

The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.

Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.

Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.

Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-17 10:54:08 -05:00
Mathieu Malaterre
c8dc79806e bpf: Annotate implicit fall through in cgroup_dev_func_proto
There is a plan to build the kernel with -Wimplicit-fallthrough
and this place in the code produced a warnings (W=1).

This commit removes the following warning:

  kernel/bpf/cgroup.c:719:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-17 16:52:23 +01:00
Mathieu Malaterre
583c531853 bpf: Make function btf_name_offset_valid static
Initially in commit 69b693f0aefa ("bpf: btf: Introduce BPF Type Format
(BTF)") the function 'btf_name_offset_valid' was introduced as static
function it was later on changed to a non-static one, and then finally
in commit 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for
non type name use case") the function prototype was removed.

Revert back to original implementation and make the function static.
Remove warning triggered with W=1:

  kernel/bpf/btf.c:470:6: warning: no previous prototype for 'btf_name_offset_valid' [-Wmissing-prototypes]

Fixes: 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for non type name use case")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-17 16:47:05 +01:00
Stanislav Fomichev
4af396ae48 bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
When returning BPF_STACK_BUILD_ID_IP from stack_map_get_build_id_offset,
make sure that build_id field is empty. Since we are using percpu
free list, there is a possibility that we might reuse some previous
bpf_stack_build_id with non-zero build_id.

Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-17 16:42:35 +01:00
Stanislav Fomichev
0b698005a9 bpf: don't assume build-id length is always 20 bytes
Build-id length is not fixed to 20, it can be (`man ld` /--build-id):
  * 128-bit (uuid)
  * 160-bit (sha1)
  * any length specified in ld --build-id=0xhexstring

To fix the issue of missing BPF_STACK_BUILD_ID_VALID for shorter build-ids,
assume that build-id is somewhere in the range of 1 .. 20.
Set the remaining bytes to zero.

v2:
* don't introduce new "len = min(BPF_BUILD_ID_SIZE, nhdr->n_descsz)",
  we already know that nhdr->n_descsz <= BPF_BUILD_ID_SIZE if we enter
  this 'if' condition

Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-17 16:42:35 +01:00
Andreas Ziegler
ea6eb5e7d1 tracing: uprobes: Fix typo in pr_fmt string
The subsystem-specific message prefix for uprobes was also
"trace_kprobe: " instead of "trace_uprobe: " as described in
the original commit message.

Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7257634135c24 ("tracing/probe: Show subsystem name in messages")
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-17 09:51:42 -05:00
Christoph Hellwig
227a76b647 swiotlb: clear io_tlb_start and io_tlb_end in swiotlb_exit
Otherwise is_swiotlb_buffer will return false positives when
we first initialize a swiotlb buffer, but then free it because
we have an IOMMU available.

Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Reported-by: Sibren Vasse <sibren@sibrenvasse.nl>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sibren Vasse <sibren@sibrenvasse.nl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2019-01-16 09:59:17 -05:00
Tycho Andersen
a811dc6155 seccomp: fix UAF in user-trap code
On the failure path, we do an fput() of the listener fd if the filter fails
to install (e.g. because of a TSYNC race that's lost, or if the thread is
killed, etc.). fput() doesn't actually release the fd, it just ads it to a
work queue. Then the thread proceeds to free the filter, even though the
listener struct file has a reference to it.

To fix this, on the failure path let's set the private data to null, so we
know in ->release() to ignore the filter.

Reported-by: syzbot+981c26489b2d1c6316ba@syzkaller.appspotmail.com
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-01-15 09:43:12 -08:00
Linus Torvalds
7939f8beec Andrea Righi fixed a NULL pointer dereference in trace_kprobe_create()
It is possible to trigger a NULL pointer dereference by writing an
 incorrectly formatted string to the krpobe_events file.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXD4L1RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsv7AP9nzekt94AQC2t5OQ38ph/nYGBjLc3T
 yLqFMshqUSgyVAEAgFB88fvniwLOMFyAqbfRb0+4mq1SDeThBY7TtJBzSQI=
 =+Pyh
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Andrea Righi fixed a NULL pointer dereference in trace_kprobe_create()

  It is possible to trigger a NULL pointer dereference by writing an
  incorrectly formatted string to the krpobe_events file"

* tag 'trace-v5.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()
2019-01-16 05:28:26 +12:00
Linus Torvalds
e8746440bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur
    Gautier.

 2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei.

 3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano
    Brivio.

 4) Use after free in hns driver, from Yonglong Liu.

 5) icmp6_send() needs to handle the case of NULL skb, from Eric
    Dumazet.

 6) Missing rcu read lock in __inet6_bind() when operating on mapped
    addresses, from David Ahern.

 7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva.

 8) Fix PHY vs r8169 module loading ordering issues, from Heiner
    Kallweit.

 9) Fix bridge vlan memory leak, from Ido Schimmel.

10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe.

11) Infoleak in ipv6_local_error(), flow label isn't completely
    initialized. From Eric Dumazet.

12) Handle mv88e6390 errata, from Andrew Lunn.

13) Making vhost/vsock CID hashing consistent, from Zha Bin.

14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo.

15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  bnxt_en: Fix context memory allocation.
  bnxt_en: Fix ring checking logic on 57500 chips.
  mISDN: hfcsusb: Use struct_size() in kzalloc()
  net: clear skb->tstamp in bridge forwarding path
  net: bpfilter: disallow to remove bpfilter module while being used
  net: bpfilter: restart bpfilter_umh when error occurred
  net: bpfilter: use cleanup callback to release umh_info
  umh: add exit routine for UMH process
  isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
  vhost/vsock: fix vhost vsock cid hashing inconsistent
  net: stmmac: Prevent RX starvation in stmmac_napi_poll()
  net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
  net: stmmac: Check if CBS is supported before configuring
  net: stmmac: dwxgmac2: Only clear interrupts that are active
  net: stmmac: Fix PCI module removal leak
  tools/bpf: fix bpftool map dump with bitfields
  tools/bpf: test btf bitfield with >=256 struct member offset
  bpf: fix bpffs bitfield pretty print
  net: ethernet: mediatek: fix warning in phy_start_aneg
  tcp: change txhash on SYN-data timeout
  ...
2019-01-16 05:13:36 +12:00
Andrea Righi
8b05a3a750 tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()
It is possible to trigger a NULL pointer dereference by writing an
incorrectly formatted string to krpobe_events (trying to create a
kretprobe omitting the symbol).

Example:

 echo "r:event_1 " >> /sys/kernel/debug/tracing/kprobe_events

That triggers this:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 6 PID: 1757 Comm: bash Not tainted 5.0.0-rc1+ #125
 Hardware name: Dell Inc. XPS 13 9370/0F6P3V, BIOS 1.5.1 08/09/2018
 RIP: 0010:kstrtoull+0x2/0x20
 Code: 28 00 00 00 75 17 48 83 c4 18 5b 41 5c 5d c3 b8 ea ff ff ff eb e1 b8 de ff ff ff eb da e8 d6 36 bb ff 66 0f 1f 44 00 00 31 c0 <80> 3f 2b 55 48 89 e5 0f 94 c0 48 01 c7 e8 5c ff ff ff 5d c3 66 2e
 RSP: 0018:ffffb5d482e57cb8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff82b12720
 RDX: ffffb5d482e57cf8 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffb5d482e57d70 R08: ffffa0c05e5a7080 R09: ffffa0c05e003980
 R10: 0000000000000000 R11: 0000000040000000 R12: ffffa0c04fe87b08
 R13: 0000000000000001 R14: 000000000000000b R15: ffffa0c058d749e1
 FS:  00007f137c7f7740(0000) GS:ffffa0c05e580000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000497d46004 CR4: 00000000003606e0
 Call Trace:
  ? trace_kprobe_create+0xb6/0x840
  ? _cond_resched+0x19/0x40
  ? _cond_resched+0x19/0x40
  ? __kmalloc+0x62/0x210
  ? argv_split+0x8f/0x140
  ? trace_kprobe_create+0x840/0x840
  ? trace_kprobe_create+0x840/0x840
  create_or_delete_trace_kprobe+0x11/0x30
  trace_run_command+0x50/0x90
  trace_parse_run_command+0xc1/0x160
  probes_write+0x10/0x20
  __vfs_write+0x3a/0x1b0
  ? apparmor_file_permission+0x1a/0x20
  ? security_file_permission+0x31/0xf0
  ? _cond_resched+0x19/0x40
  vfs_write+0xb1/0x1a0
  ksys_write+0x55/0xc0
  __x64_sys_write+0x1a/0x20
  do_syscall_64+0x5a/0x120
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by doing the proper argument checks in trace_kprobe_create().

Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/lkml/20190111095108.b79a2ee026185cbd62365977@kernel.org
Link: http://lkml.kernel.org/r/20190111060113.GA22841@xps-13
Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-15 11:33:45 -05:00
Thomas Gleixner
93ad0fc088 posix-cpu-timers: Unbreak timer rearming
The recent commit which prevented a division by 0 issue in the alarm timer
code broke posix CPU timers as an unwanted side effect.

The reason is that the common rearm code checks for timer->it_interval
being 0 now. What went unnoticed is that the posix cpu timer setup does not
initialize timer->it_interval as it stores the interval in CPU timer
specific storage. The reason for the separate storage is historical as the
posix CPU timers always had a 64bit nanoseconds representation internally
while timer->it_interval is type ktime_t which used to be a modified
timespec representation on 32bit machines.

Instead of reverting the offending commit and fixing the alarmtimer issue
in the alarmtimer code, store the interval in timer->it_interval at CPU
timer setup time so the common code check works. This also repairs the
existing inconistency of the posix CPU timer code which kept a single shot
timer armed despite of the interval being 0.

The separate storage can be removed in mainline, but that needs to be a
separate commit as the current one has to be backported to stable kernels.

Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug")
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
2019-01-15 16:34:37 +01:00
Srinivas Ramana
bddda606ec genirq: Make sure the initial affinity is not empty
If all CPUs in the irq_default_affinity mask are offline when an interrupt
is initialized then irq_setup_affinity() can set an empty affinity mask for
a newly allocated interrupt.

Fix this by falling back to cpu_online_mask in case the resulting affinity
mask is zero.

Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org
2019-01-15 11:23:27 +01:00
Jonathan Neuschäfer
b7285b4253 kernel/sys.c: Clarify that UNAME26 does not generate unique versions anymore
UNAME26 is a mechanism to report Linux's version as 2.6.x, for
compatibility with old/broken software.  Due to the way it is
implemented, it would have to be updated after 5.0, to keep the
resulting versions unique.  Linus Torvalds argued:

 "Do we actually need this?

  I'd rather let it bitrot, and just let it return random versions. It
  will just start again at 2.4.60, won't it?

  Anybody who uses UNAME26 for a 5.x kernel might as well think it's
  still 4.x. The user space is so old that it can't possibly care about
  differences between 4.x and 5.x, can it?

  The only thing that matters is that it shows "2.4.<largeenough>",
  which it will do regardless"

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-14 10:38:03 +12:00
Taehee Yoo
73ab1cb2de umh: add exit routine for UMH process
A UMH process which is created by the fork_usermode_blob() such as
bpfilter needs to release members of the umh_info when process is
terminated.
But the do_exit() does not release members of the umh_info. hence module
which uses UMH needs own code to detect whether UMH process is
terminated or not.
But this implementation needs extra code for checking the status of
UMH process. it eventually makes the code more complex.

The new PF_UMH flag is added and it is used to identify UMH processes.
The exit_umh() does not release members of the umh_info.
Hence umh_info->cleanup callback should release both members of the
umh_info and the private data.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-11 18:05:40 -08:00
Yonghong Song
17e3ac8125 bpf: fix bpffs bitfield pretty print
Commit 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types
with kind_flag") introduced kind_flag and used bitfield_size
in the btf_member to directly pretty print member values.

The commit contained a bug where the incorrect parameters could be
passed to function btf_bitfield_seq_show(). The bits_offset
parameter in the function expects a value less than 8.
Instead, the member offset in the structure is passed.

The below is btf_bitfield_seq_show() func signature:
  void btf_bitfield_seq_show(void *data, u8 bits_offset,
                             u8 nr_bits, struct seq_file *m)
both bits_offset and nr_bits are u8 type. If the bitfield
member offset is greater than 256, incorrect value will
be printed.

This patch fixed the issue by calculating correct proper
data offset and bits_offset similar to non kind_flag case.

Fixes: 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types with kind_flag")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-11 10:40:54 +01:00
Song Liu
beaf3d1901 bpf: fix panic in stack_map_get_build_id() on i386 and arm32
As Naresh reported, test_stacktrace_build_id() causes panic on i386 and
arm32 systems. This is caused by page_address() returns NULL in certain
cases.

This patch fixes this error by using kmap_atomic/kunmap_atomic instead
of page_address.

Fixes: 615755a77b24 (" bpf: extend stackmap to save binary_build_id+offset instead of address")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-10 16:02:17 +01:00
Linus Torvalds
a88cc8da02 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: do not wake kswapd with zone lock held
  hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization"
  hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race"
  mm: page_mapped: don't assume compound page is huge or THP
  mm/memory.c: initialise mmu_notifier_range correctly
  tools/vm/page_owner: use page_owner_sort in the use example
  kasan: fix krealloc handling for tag-based mode
  kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
  kasan, arm64: use ARCH_SLAB_MINALIGN instead of manual aligning
  mm, memcg: fix reclaim deadlock with writeback
  mm/usercopy.c: no check page span for stack objects
  slab: alien caches must not be initialized if the allocation of the alien cache failed
  fork, memcg: fix cached_stacks case
  zram: idle writeback fixes and cleanup
2019-01-08 18:58:29 -08:00
Shakeel Butt
ba4a45746c fork, memcg: fix cached_stacks case
Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused due to failed memcg charge of
the kernel stack.  However the fix misses the cached_stacks case which
this patch fixes.  So, the same crash can happen if the memcg charge of
a cached stack is failed.

Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
David Herrmann
7b55851367 fork: record start_time late
This changes the fork(2) syscall to record the process start_time after
initializing the basic task structure but still before making the new
process visible to user-space.

Technically, we could record the start_time anytime during fork(2).  But
this might lead to scenarios where a start_time is recorded long before
a process becomes visible to user-space.  For instance, with
userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
for an indefinite amount of time (and will, if this causes network
access, or similar).

By recording the start_time late, it much closer reflects the point in
time where the process becomes live and can be observed by other
processes.

Lastly, this makes it much harder for user-space to predict and control
the start_time they get assigned.  Previously, user-space could fork a
process and stall it in copy_thread_tls() before its pid is allocated,
but after its start_time is recorded.  This can be misused to later-on
cycle through PIDs and resume the stalled fork(2) yielding a process
that has the same pid and start_time as a process that existed before.
This can be used to circumvent security systems that identify processes
by their pid+start_time combination.

Even though user-space was always aware that start_time recording is
flaky (but several projects are known to still rely on start_time-based
identification), changing the start_time to be recorded late will help
mitigate existing attacks and make it much harder for user-space to
control the start_time a process gets assigned.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 09:40:53 -08:00
Linus Torvalds
85e1ffbd42 Kbuild late updates for v4.21
- improve boolinit.cocci and use_after_iter.cocci semantic patches
 
 - fix alignment for kallsyms
 
 - move 'asm goto' compiler test to Kconfig and clean up jump_label
   CONFIG option
 
 - generate asm-generic wrappers automatically if arch does not implement
   mandatory UAPI headers
 
 - remove redundant generic-y defines
 
 - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcMV5GAAoJED2LAQed4NsGs9gQAI/oGg8wJgk9a7+dJCX245W5
 F4ReftnQd4AFptFCi9geJkr+sfViXNgwPLqlJxiXz8Qe8XP7z3LcArDw3FUzwvGn
 bMSBiN9ggwWkOFgF523XesYgUVtcLpkNch/Migzf1Ac0FHk0G9o7gjcdsvAWHkUu
 qFwtNcUB6PElRbhsHsh5qCY1/6HaAXgf/7O7wztnaKRe9myN6f2HzT4wANS9HHde
 1e1r0LcIQeGWfG+3va3fZl6SDxSI/ybl244OcDmDyYl6RA1skSDlHbIBIFgUPoS0
 cLyzoVj+GkfI1fRFEIfou+dj7lpukoAXHsggHo0M+ofqtbMF+VB2T3jvg4txanCP
 TXzDc+04QUguK5yVnBfcnyC64Htrhnbq0eGy43kd1VZWAEGApl+680P8CRsWU3ZV
 kOiFvZQ6RP/Ssw+a42yU3SHr31WD7feuQqHU65osQt4rdyL5wnrfU1vaUvJSkltF
 cyPr9Kz/Ism0kPodhpFkuKxwtlKOw6/uwdCQoQHtxAPkvkcydhYx93x3iE0nxObS
 CRMximiRyE12DOcv/3uv69n0JOPn6AsITcMNp8XryASYrR2/52txhGKGhvo3+Zoq
 5pwc063JsuxJ/5/dcOw/erQar5d1eBRaBJyEWnXroxUjbsLPAznE+UIN8tmvyVly
 SunlxNOXBdYeWN6t6S3H
 =I+r6
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - improve boolinit.cocci and use_after_iter.cocci semantic patches

 - fix alignment for kallsyms

 - move 'asm goto' compiler test to Kconfig and clean up jump_label
   CONFIG option

 - generate asm-generic wrappers automatically if arch does not
   implement mandatory UAPI headers

 - remove redundant generic-y defines

 - misc cleanups

* tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: rename generated .*conf-cfg to *conf-cfg
  kbuild: remove unnecessary stubs for archheader and archscripts
  kbuild: use assignment instead of define ... endef for filechk_* rules
  arch: remove redundant UAPI generic-y defines
  kbuild: generate asm-generic wrappers if mandatory headers are missing
  arch: remove stale comments "UAPI Header export list"
  riscv: remove redundant kernel-space generic-y
  kbuild: change filechk to surround the given command with { }
  kbuild: remove redundant target cleaning on failure
  kbuild: clean up rule_dtc_dt_yaml
  kbuild: remove UIMAGE_IN and UIMAGE_OUT
  jump_label: move 'asm goto' support test to Kconfig
  kallsyms: lower alignment on ARM
  scripts: coccinelle: boolinit: drop warnings on named constants
  scripts: coccinelle: check for redeclaration
  kconfig: remove unused "file" field of yylval union
  nds32: remove redundant kernel-space generic-y
  nios2: remove unneeded HAS_DMA define
2019-01-06 16:33:10 -08:00
Linus Torvalds
e2b745f469 dma-mapping fixes for Linux 4.21-rc1
Fix various regressions introduced in this cycles:
 
  - fix dma-debug tracking for the map_page / map_single consolidatation
  - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid
    link failures
  - fix AMD Gart direct mappings
  - setup the dma address for no kernel mappings using the remap
    allocator
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwyR9ULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPvOA/+L+32p2pm8o6NTgvtRvqsKNrbOm02fORLrhBqAiok
 AcirFDxTfMuUWU2isr7E7WNqwEmUQ1nVUa+I0IJ/IJFfKdTggXcaTX1M19+62KWa
 1LHpZLg1t2rl2yFQHgTrFKr5sz1PwUKZO8UbrYaYYgLgQkWDRzJs4E/tFNju8pMm
 0Usexo/bkI5mreJBImMsFwAnuk0k3NT058XIeD+eNttKjcuz5kEH+bE/999vySW3
 sOj9Peic/EFelOGb4ODxUIPjhiGFMv5dVusSAsFBH26iwQfX/tFSmXhrI5cnDewg
 NlREennfyM+6uTH/DO+BlX7eGCRYbFc1GU5H9q4rRMXhEam6oc2AzVKuElJOVstZ
 XVjP6zTwmuOh/5ff0NG6EPjA/OFcmlBEsmeWu4xSS8KsNILOkpUaPed/uWnA7O+2
 mvU104NA5cHgVMgiGNM/4ilirkEZEFEHYhafH42bQxjMigm7ZHN14NtwM7StLTu6
 QgyfPUcW/LmHj2scgvB1AZ+iQX0z7yJJMGifUxtz+eMCWCC7neOJ7JLvNnS9WI5w
 9RwYaCOcDAZyAmCpbSADWxeG9cfsCDp8wmaGs3YVyhkDU8tCSqbxWJutvyDQnC17
 GtZ0vYLTaJXBCq1L/FC0y8NCCGgvySPXYU7/ZYuOCzS4q2jvjwTWD3dKodvnS+mb
 B0s=
 =H9J6
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "Fix various regressions introduced in this cycles:

   - fix dma-debug tracking for the map_page / map_single
     consolidatation

   - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid
     link failures

   - fix AMD Gart direct mappings

   - setup the dma address for no kernel mappings using the remap
     allocator"

* tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING for remapped allocations
  x86/amd_gart: fix unmapping of non-GART mappings
  dma-mapping: remove a few unused exports
  dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA
  dma-mapping: remove dmam_{declare,release}_coherent_memory
  dma-mapping: implement dmam_alloc_coherent using dmam_alloc_attrs
  dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs
2019-01-06 11:47:26 -08:00
Daniel Borkmann
d3bd7413e0 bpf: fix sanitation of alu op with pointer / scalar type from different paths
While 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer
arithmetic") took care of rejecting alu op on pointer when e.g. pointer
came from two different map values with different map properties such as
value size, Jann reported that a case was not covered yet when a given
alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
different branches where we would incorrectly try to sanitize based
on the pointer's limit. Catch this corner case and reject the program
instead.

Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-05 21:32:38 -08:00
Masahiro Yamada
ad77408635 kbuild: change filechk to surround the given command with { }
filechk_* rules often consist of multiple 'echo' lines. They must be
surrounded with { } or ( ) to work correctly. Otherwise, only the
string from the last 'echo' would be written into the target.

Let's take care of that in the 'filechk' in scripts/Kbuild.include
to clean up filechk_* rules.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-01-06 09:46:51 +09:00
Masahiro Yamada
e9666d10a5 jump_label: move 'asm goto' support test to Kconfig
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:

  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
  # define HAVE_JUMP_LABEL
  #endif

We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.

Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2019-01-06 09:46:51 +09:00
Linus Torvalds
a65981109f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - procfs updates

 - various misc bits

 - lib/ updates

 - epoll updates

 - autofs

 - fatfs

 - a few more MM bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
  mm/page_io.c: fix polled swap page in
  checkpatch: add Co-developed-by to signature tags
  docs: fix Co-Developed-by docs
  drivers/base/platform.c: kmemleak ignore a known leak
  fs: don't open code lru_to_page()
  fs/: remove caller signal_pending branch predictions
  mm/: remove caller signal_pending branch predictions
  arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
  kernel/sched/: remove caller signal_pending branch predictions
  kernel/locking/mutex.c: remove caller signal_pending branch predictions
  mm: select HAVE_MOVE_PMD on x86 for faster mremap
  mm: speed up mremap by 20x on large regions
  mm: treewide: remove unused address argument from pte_alloc functions
  initramfs: cleanup incomplete rootfs
  scripts/gdb: fix lx-version string output
  kernel/kcov.c: mark write_comp_data() as notrace
  kernel/sysctl: add panic_print into sysctl
  panic: add options to print system info when panic happens
  bfs: extra sanity checking and static inode bitmap
  exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
  ...
2019-01-05 09:16:18 -08:00
Christoph Hellwig
8270f3a11c dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING for remapped allocations
We need to return a dma_addr_t even if we don't have a kernel mapping.
Do so by consolidating the phys_to_dma call in a single place and jump
to it from all the branches that return successfully.

Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator")
Reported-by: Liviu Dudau <liviu@dudau.co.uk
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Liviu Dudau <liviu@dudau.co.uk>
2019-01-05 08:28:29 +01:00
Davidlohr Bueso
34ec35ad8f kernel/sched/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Davidlohr Bueso
3bb5f4ac55 kernel/locking/mutex.c: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Anders Roxell
6347244316 kernel/kcov.c: mark write_comp_data() as notrace
Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
traceable either.  ftrace_graph_caller() gets called every time func
write_comp_data() gets called if it isn't marked 'notrace'.  This is the
backtrace from gdb:

 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 #2  0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
 #3  0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
 #4  0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
 #5  0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182

Rework so that write_comp_data() that are called from
__sanitizer_cov_trace_*_cmp*() are marked as 'notrace'.

Commit 903e8ff86753 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace")
missed to mark write_comp_data() as 'notrace'. When that patch was
created gcc-7 was used. In lib/Kconfig.debug
config KCOV_ENABLE_COMPARISONS
	depends on $(cc-option,-fsanitize-coverage=trace-cmp)

That code path isn't hit with gcc-7. However, it were that with gcc-8.

Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Feng Tang
81c9d43f94 kernel/sysctl: add panic_print into sysctl
So that we can also runtime chose to print out the needed system info
for panic, other than setting the kernel cmdline.

Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Feng Tang
d999bd9392 panic: add options to print system info when panic happens
Kernel panic issues are always painful to debug, partially because it's
not easy to get enough information of the context when panic happens.

And we have ramoops and kdump for that, while this commit tries to
provide a easier way to show the system info by adding a cmdline
parameter, referring some idea from sysrq handler.

Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Yi Wang
fb5bf31722 fork: fix some -Wmissing-prototypes warnings
We get a warning when building kernel with W=1:

  kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
  kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

Add the missing declaration in head file to fix this.

Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since bb9d81264 (arch: remove tile port).

Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Tetsuo Handa
304ae42739 kernel/hung_task.c: break RCU locks based on jiffies
check_hung_uninterruptible_tasks() is currently calling rcu_lock_break()
for every 1024 threads.  But check_hung_task() is very slow if printk()
was called, and is very fast otherwise.

If many threads within some 1024 threads called printk(), the RCU grace
period might be extended enough to trigger RCU stall warnings.
Therefore, calling rcu_lock_break() for every some fixed jiffies will be
safer.

Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Liu, Chuansheng
168e06f793 kernel/hung_task.c: force console verbose before panic
Based on commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks
before panic"), we could get the call stack of hung task.

However, if the console loglevel is not high, we still can not see the
useful panic information in practice, and in most cases users don't set
console loglevel to high level.

This patch is to force console verbose before system panic, so that the
real useful information can be seen in the console, instead of being
like the following, which doesn't have hung task information.

  INFO: task init:1 blocked for more than 120 seconds.
        Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Kernel panic - not syncing: hung_task: blocked tasks
  CPU: 2 PID: 479 Comm: khungtaskd Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  Call Trace:
   dump_stack+0x4f/0x65
   panic+0xde/0x231
   watchdog+0x290/0x410
   kthread+0x12c/0x150
   ret_from_fork+0x35/0x40
  reboot: panic mode set: p,w
  Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Cheng Lin
09be178400 proc/sysctl: fix return error for proc_doulongvec_minmax()
If the number of input parameters is less than the total parameters, an
EINVAL error will be returned.

For example, we use proc_doulongvec_minmax to pass up to two parameters
with kern_table:

{
	.procname       = "monitor_signals",
	.data           = &monitor_sigs,
	.maxlen         = 2*sizeof(unsigned long),
	.mode           = 0644,
	.proc_handler   = proc_doulongvec_minmax,
},

Reproduce:

When passing two parameters, it's work normal.  But passing only one
parameter, an error "Invalid argument"(EINVAL) is returned.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  -bash: echo: write error: Invalid argument
  [root@cl150 ~]# echo $?
  1
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

The following is the result after apply this patch.  No error is
returned when the number of input parameters is less than the total
parameters.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# echo $?
  0
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

There are three processing functions dealing with digital parameters,
__do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.

This patch deals with __do_proc_doulongvec_minmax, just as
__do_proc_dointvec does, adding a check for parameters 'left'.  In
__do_proc_douintvec, its code implementation explicitly does not support
multiple inputs.

static int __do_proc_douintvec(...){
         ...
         /*
          * Arrays are not supported, keep this simple. *Do not* add
          * support for them.
          */
         if (vleft != 1) {
                 *lenp = 0;
                 return -EINVAL;
         }
         ...
}

So, just __do_proc_doulongvec_minmax has the problem.  And most use of
proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
parameter.

Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:45 -08:00
Linus Torvalds
594cc251fd make 'user_access_begin()' do 'access_ok()'
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.

But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar.  Which makes it very hard to verify that the access has
actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin().  But
nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses.  We have way too long a history of people
trying to avoid them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 12:56:09 -08:00
Christoph Hellwig
48e638fb68 dma-mapping: remove a few unused exports
Now that the slow path DMA API calls are implemented out of line a few
helpers only used by them don't need to be exported anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-04 09:03:17 +01:00
Christoph Hellwig
4788ba5792 dma-mapping: remove dmam_{declare,release}_coherent_memory
These functions have never been used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-04 09:03:17 +01:00
Christoph Hellwig
d7076f0784 dma-mapping: implement dmam_alloc_coherent using dmam_alloc_attrs
dmam_alloc_coherent is just the default no-flags case of
dmam_alloc_attrs, so take advantage of this similar to the non-managed
version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-04 09:03:16 +01:00
Christoph Hellwig
2e05ea5cdc dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs
And also switch the way we implement the unmap side around to stay
consistent.  This ensures dma-debug works again because it records which
function we used for mapping to ensure it is also used for unmapping,
and also reduces further code duplication.  Last but not least this
also officially allows calling dma_sync_single_* for mappings created
using dma_map_page, which is perfectly fine given that the sync calls
only take a dma_addr_t, but not a virtual address or struct page.

Fixes: 7f0fee242e ("dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: LABBE Corentin <clabbe.montjoie@gmail.com>
2019-01-04 09:02:17 +01:00
Linus Torvalds
96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Linus Torvalds
43d86ee8c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several fixes here. Basically split down the line between newly
  introduced regressions and long existing problems:

   1) Double free in tipc_enable_bearer(), from Cong Wang.

   2) Many fixes to nf_conncount, from Florian Westphal.

   3) op->get_regs_len() can throw an error, check it, from Yunsheng
      Lin.

   4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
      driver, from Scott Wood.

   5) Inifnite loop in fib_empty_table(), from Yue Haibing.

   6) Use after free in ax25_fillin_cb(), from Cong Wang.

   7) Fix socket locking in nr_find_socket(), also from Cong Wang.

   8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

   9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

  10) Fix ptr_ring wrap during queue swap, from Cong Wang.

  11) Missing shutdown callback in hinic driver, from Xue Chaojing.

  12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
      Brivio.

  13) BPF out of bounds speculation fixes from Daniel Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  ipv6: Fix dump of specific table with strict checking
  bpf: add various test cases to selftests
  bpf: prevent out of bounds speculation on pointer arithmetic
  bpf: fix check_map_access smin_value test when pointer contains offset
  bpf: restrict unknown scalars of mixed signed bounds for unprivileged
  bpf: restrict stack pointer arithmetic for unprivileged
  bpf: restrict map value pointer arithmetic for unprivileged
  bpf: enable access to ax register also from verifier rewrite
  bpf: move tmp variable into ax register in interpreter
  bpf: move {prev_,}insn_idx into verifier env
  isdn: fix kernel-infoleak in capi_unlocked_ioctl
  ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
  net/hamradio/6pack: use mod_timer() to rearm timers
  net-next/hinic:add shutdown callback
  net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
  ip: validate header length on virtual device xmit
  tap: call skb_probe_transport_header after setting skb->dev
  ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
  net: rds: remove unnecessary NULL check
  ...
2019-01-03 12:53:47 -08:00