Commit Graph

736166 Commits

Author SHA1 Message Date
Alexey Dobriyan
163cf548db fs/proc/internal.h: rearrange struct proc_dir_entry
struct proc_dir_entry became bit messy over years:

* move 16-bit ->mode_t before namelen to get rid of padding
* make ->in_use first field: it seems to be most used resulting in
  smaller code on x86_64 (defconfig):

	add/remove: 0/0 grow/shrink: 7/13 up/down: 24/-67 (-43)
	Function                                     old     new   delta
	proc_readdir_de                              451     455      +4
	proc_get_inode                               282     286      +4
	pde_put                                       65      69      +4
	remove_proc_subtree                          294     297      +3
	remove_proc_entry                            297     300      +3
	proc_register                                295     298      +3
	proc_notify_change                            94      97      +3
	unuse_pde                                     27      26      -1
	proc_reg_write                                89      85      -4
	proc_reg_unlocked_ioctl                       85      81      -4
	proc_reg_read                                 89      85      -4
	proc_reg_llseek                               87      83      -4
	proc_reg_get_unmapped_area                   123     119      -4
	proc_entry_rundown                           139     135      -4
	proc_reg_poll                                 91      85      -6
	proc_reg_mmap                                 79      73      -6
	proc_get_link                                 55      49      -6
	proc_reg_release                             108     101      -7
	proc_reg_open                                298     291      -7
	close_pdeo                                   228     218     -10

* move writeable fields together to a first cacheline (on x86_64),
  those include
	* ->in_use: reference count, taken every open/read/write/close etc
	* ->count: reference count, taken at readdir on every entry
	* ->pde_openers: tracks (nearly) every open, dirtied
	* ->pde_unload_lock: spinlock protecting ->pde_openers
	* ->proc_iops, ->proc_fops, ->data: writeonce fields,
	  used right together with previous group.

* other rarely written fields go into 1st/2nd and 2nd/3rd cacheline on
  32-bit and 64-bit respectively.

Additionally on 32-bit, ->subdir, ->subdir_node, ->namelen, ->name go
fully into 2nd cacheline, separated from writeable fields.  They are all
used during lookup.

Link: http://lkml.kernel.org/r/20171220215914.GA7877@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Heiko Carstens
d0290bc20d fs/proc/kcore.c: use probe_kernel_read() instead of memcpy()
Commit df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext
data") added a bounce buffer to avoid hardened usercopy checks.  Copying
to the bounce buffer was implemented with a simple memcpy() assuming
that it is always valid to read from kernel memory iff the
kern_addr_valid() check passed.

A simple, but pointless, test case like "dd if=/proc/kcore of=/dev/null"
now can easily crash the kernel, since the former execption handling on
invalid kernel addresses now doesn't work anymore.

Also adding a kern_addr_valid() implementation wouldn't help here.  Most
architectures simply return 1 here, while a couple implemented a page
table walk to figure out if something is mapped at the address in
question.

With DEBUG_PAGEALLOC active mappings are established and removed all the
time, so that relying on the result of kern_addr_valid() before
executing the memcpy() also doesn't work.

Therefore simply use probe_kernel_read() to copy to the bounce buffer.
This also allows to simplify read_kcore().

At least on s390 this fixes the observed crashes and doesn't introduce
warnings that were removed with df04abfd18 ("fs/proc/kcore.c: Add
bounce buffer for ktext data"), even though the generic
probe_kernel_read() implementation uses uaccess functions.

While looking into this I'm also wondering if kern_addr_valid() could be
completely removed...(?)

Link: http://lkml.kernel.org/r/20171202132739.99971-1-heiko.carstens@de.ibm.com
Fixes: df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext data")
Fixes: f5509cc18d ("mm: Hardened usercopy")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
171ef917df fs/proc/array.c: delete children_seq_release()
It is 1:1 wrapper around seq_release().

Link: http://lkml.kernel.org/r/20171122171510.GA12161@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
20d28cde55 proc: less memory for /proc/*/map_files readdir
dentry name can be evaluated later, right before calling into VFS.

Also, spend less time under ->mmap_sem.

Link: http://lkml.kernel.org/r/20171110163034.GA2534@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
593bc695a1 fs/proc/vmcore.c: simpler /proc/vmcore cleanup
Iterators aren't necessary as you can just grab the first entry and delete
it until no entries left.

Link: http://lkml.kernel.org/r/20171121191121.GA20757@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
ac7f1061c2 proc: fix /proc/*/map_files lookup
Current code does:

	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)

However sscanf() is broken garbage.

It silently accepts whitespace between format specifiers
(did you know that?).

It silently accepts valid strings which result in integer overflow.

Do not use sscanf() for any even remotely reliable parsing code.

	OK
	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
	/lib/systemd/systemd

	broken
	# readlink '/proc/1/map_files/               55a23af39000-55a23b05b000'
	/lib/systemd/systemd

	broken
	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000    '
	/lib/systemd/systemd

	very broken
	# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
	/lib/systemd/systemd

Andrei said:

: This patch breaks criu.  It was a bug in criu.  And this bug is on a minor
: path, which works when memfd_create() isn't available.  It is a reason why
: I ask to not backport this patch to stable kernels.
:
: In CRIU this bug can be triggered, only if this patch will be backported
: to a kernel which version is lower than v3.16.

Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
9f7118b200 proc: don't use READ_ONCE/WRITE_ONCE for /proc/*/fail-nth
READ_ONCE and WRITE_ONCE are useless when there is only one read/write
is being made.

Link: http://lkml.kernel.org/r/20171120204033.GA9446@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Alexey Dobriyan
e3912ac37e proc: use %u for pid printing and slightly less stack
PROC_NUMBUF is 13 which is enough for "negative int + \n + \0".

However PIDs and TGIDs are never negative and newline is not a concern,
so use just 10 per integer.

Link: http://lkml.kernel.org/r/20171120203005.GA27743@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Colin Ian King
48c2323954 kasan: remove redundant initialization of variable 'real_size'
Variable real_size is initialized with a value that is never read, it is
re-assigned a new value later on, hence the initialization is redundant
and can be removed.

Cleans up clang warning:

  lib/test_kasan.c:422:21: warning: Value stored to 'real_size' during its initialization is never read

Link: http://lkml.kernel.org/r/20180206144950.32457-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Andrey Konovalov
917538e212 kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage
Right now the fact that KASAN uses a single shadow byte for 8 bytes of
memory is scattered all over the code.

This change defines KASAN_SHADOW_SCALE_SHIFT early in asm include files
and makes use of this constant where necessary.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/34937ca3b90736eaad91b568edf5684091f662e3.1515775666.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Andrey Konovalov
5f21f3a8f4 kasan: fix prototype author email address
Use the new one.

Link: http://lkml.kernel.org/r/de3b7ffc30a55178913a7d3865216aa7accf6c40.1515775666.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Dmitry Vyukov
b1d5728939 kasan: detect invalid frees
Detect frees of pointers into middle of heap objects.

Link: http://lkml.kernel.org/r/cb569193190356beb018a03bb8d6fbae67e7adbc.1514378558.git.dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Dmitry Vyukov
1db0e0f9dd kasan: unify code between kasan_slab_free() and kasan_poison_kfree()
Both of these functions deal with freeing of slab objects.
However, kasan_poison_kfree() mishandles SLAB_TYPESAFE_BY_RCU
(must also not poison such objects) and does not detect double-frees.

Unify code between these functions.

This solves both of the problems and allows to add more common code
(e.g. detection of invalid frees).

Link: http://lkml.kernel.org/r/385493d863acf60408be219a021c3c8e27daa96f.1514378558.git.dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Dmitry Vyukov
6860f6340c kasan: detect invalid frees for large mempool objects
Detect frees of pointers into middle of mempool objects.

I did a one-off test, but it turned out to be very tricky, so I reverted
it.  First, mempool does not call kasan_poison_kfree() unless allocation
function fails.  I stubbed an allocation function to fail on second and
subsequent allocations.  But then mempool stopped to call
kasan_poison_kfree() at all, because it does it only when allocation
function is mempool_kmalloc().  We could support this special failing
test allocation function in mempool, but it also can't live with kasan
tests, because these are in a module.

Link: http://lkml.kernel.org/r/bf7a7d035d7a5ed62d2dd0e3d2e8a4fcdf456aa7.1514378558.git.dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Dmitry Vyukov
ee3ce779b5 kasan: don't use __builtin_return_address(1)
__builtin_return_address(1) is unreliable without frame pointers.
With defconfig on kmalloc_pagealloc_invalid_free test I am getting:

BUG: KASAN: double-free or invalid-free in           (null)

Pass caller PC from callers explicitly.

Link: http://lkml.kernel.org/r/9b01bc2d237a4df74ff8472a3bf6b7635908de01.1514378558.git.dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Dmitry Vyukov
47adccce3e kasan: detect invalid frees for large objects
Patch series "kasan: detect invalid frees".

KASAN detects double-frees, but does not detect invalid-frees (when a
pointer into a middle of heap object is passed to free).  We recently had
a very unpleasant case in crypto code which freed an inner object inside
of a heap allocation.  This left unnoticed during free, but totally
corrupted heap and later lead to a bunch of random crashes all over kernel
code.

Detect invalid frees.

This patch (of 5):

Detect frees of pointers into middle of large heap objects.

I dropped const from kasan_kfree_large() because it starts propagating
through a bunch of functions in kasan_report.c, slab/slub nearest_obj(),
all of their local variables, fixup_red_left(), etc.

Link: http://lkml.kernel.org/r/1b45b4fe1d20fc0de1329aab674c1dd973fee723.1514378558.git.dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Alexander Potapenko
d321599cf6 kasan: add functions for unpoisoning stack variables
As a code-size optimization, LLVM builds since r279383 may bulk-manipulate
the shadow region when (un)poisoning large memory blocks.  This requires
new callbacks that simply do an uninstrumented memset().

This fixes linking the Clang-built kernel when using KASAN.

[arnd@arndb.de: add declarations for internal functions]
  Link: http://lkml.kernel.org/r/20180105094112.2690475-1-arnd@arndb.de
[fengguang.wu@intel.com: __asan_set_shadow_00 can be static]
  Link: http://lkml.kernel.org/r/20171223125943.GA74341@lkp-ib03
[ghackmann@google.com: fix memset() parameters, and tweak commit message to describe new callbacks]
Link: http://lkml.kernel.org/r/20171204191735.132544-6-paullawrence@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Paul Lawrence
00a14294bb kasan: add tests for alloca poisoning
Link: http://lkml.kernel.org/r/20171204191735.132544-5-paullawrence@google.com
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Paul Lawrence
342061ee4e kasan: support alloca() poisoning
clang's AddressSanitizer implementation adds redzones on either side of
alloca()ed buffers.  These redzones are 32-byte aligned and at least 32
bytes long.

__asan_alloca_poison() is passed the size and address of the allocated
buffer, *excluding* the redzones on either side.  The left redzone will
always be to the immediate left of this buffer; but AddressSanitizer may
need to add padding between the end of the buffer and the right redzone.
If there are any 8-byte chunks inside this padding, we should poison
those too.

__asan_allocas_unpoison() is just passed the top and bottom of the dynamic
stack area, so unpoisoning is simpler.

Link: http://lkml.kernel.org/r/20171204191735.132544-4-paullawrence@google.com
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Andrey Ryabinin
1a69e7ce83 kasan/Makefile: support LLVM style asan parameters
LLVM doesn't understand GCC-style paramters ("--param asan-foo=bar"), thus
we currently we don't use inline/globals/stack instrumentation when
building the kernel with clang.

Add support for LLVM-style parameters ("-mllvm -asan-foo=bar") to enable
all KASAN features.

Link: http://lkml.kernel.org/r/20171204191735.132544-3-paullawrence@google.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Paul Lawrence
53a98ed73b kasan: add compiler support for clang
Patch series "kasan: support alloca, LLVM", v4.

This patch (of 5):

For now we can hard-code ASAN ABI level 5, since historical clang builds
can't build the kernel anyway.  We also need to emulate gcc's
__SANITIZE_ADDRESS__ flag, or memset() calls won't be instrumented.

Link: http://lkml.kernel.org/r/20171204191735.132544-2-paullawrence@google.com
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Andrey Konovalov
0e410e158e kasan: don't emit builtin calls when sanitization is off
With KASAN enabled the kernel has two different memset() functions, one
with KASAN checks (memset) and one without (__memset).  KASAN uses some
macro tricks to use the proper version where required.  For example
memset() calls in mm/slub.c are without KASAN checks, since they operate
on poisoned slab object metadata.

The issue is that clang emits memset() calls even when there is no
memset() in the source code.  They get linked with improper memset()
implementation and the kernel fails to boot due to a huge amount of KASAN
reports during early boot stages.

The solution is to add -fno-builtin flag for files with KASAN_SANITIZE :=
n marker.

Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.1516384594.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:42 -08:00
Pablo Neira Ayuso
b408c5b04f netfilter: nf_tables: fix flowtable free
Every flow_offload entry is added into the table twice. Because of this,
rhashtable_free_and_destroy can't be used, since it would call kfree for
each flow_offload object twice.

This patch cleans up the flowtable via nf_flow_table_iterate() to
schedule removal of entries by setting on the dying bit, then there is
an explicitly invocation of the garbage collector to release resources.

Based on patch from Felix Fietkau.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-07 00:58:57 +01:00
Pablo Neira Ayuso
c0ea1bcb39 netfilter: nft_flow_offload: move flowtable cleanup routines to nf_flow_table
Move the flowtable cleanup routines to nf_flow_table and expose the
nf_flow_table_cleanup() helper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-07 00:58:57 +01:00
Cong Wang
7dc68e9875 netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insert
rateest_hash is supposed to be protected by xt_rateest_mutex,
and, as suggested by Eric, lookup and insert should be atomic,
so we should acquire the xt_rateest_mutex once for both.

So introduce a non-locking helper for internal use and keep the
locking one for external.

Reported-by: <syzbot+5cb189720978275e4c75@syzkaller.appspotmail.com>
Fixes: 5859034d7e ("[NETFILTER]: x_tables: add RATEEST target")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-07 00:58:57 +01:00
Linus Torvalds
cbd7b8a76b platform-drivers-x86 for v4.16-1
New model support added for Dell, Ideapad, Acer, Asus, Thinkpad, and GPD
 laptops.  Improvements to the common intel-vbtn driver, including tablet
 mode, rotate, and front button support. Intel CPU support added for
 Cannonlake and platform support for Dollar Cove power button.
 
 Overhaul of the mellanox platform driver, creating a new
 platform/mellanox directory for the newly multi-architecture regmap
 interface.
 
 Significant Intel PMC update with CannonLake support, Coffeelake update,
 CPUID enumeration, module support, new read64 API, refactoring and
 cleanups.
 
 Revert the apple-gmux iGP IO lock, addressing reported issues with
 non-binary drivers, leaving Nvidia binary driver users to comment out
 conflicting code.
 
 Miscellaneous fixes and cleanups.
 
 Previously merged during the 4.15-rc cycle:
 - e20a8e771d platform/x86: dell-laptop: Fix keyboard max lighting for Dell Latitude E6410
 - 9cd5cf3710 platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes
 - 91c73e8092 platform/x86: dell-wmi: check for kmalloc() errors
 - 9a1a625918 platform/x86: wmi: Call acpi_wmi_init() later
 
 The following is an automated git shortlog grouped by driver:
 
 ACPI / LPIT:
  -  Export lpit_read_residency_count_address()
 
 Input:
  -  add KEY_ROTATE_LOCK_TOGGLE
 
 MAINTAINERS:
  -  Update tree for platform-drivers-x86
 
 x86/cpu:
  -  Add Cannonlake to Intel family
 
 acer-wireless:
  - Add Acer Wireless Radio Control driver
 
 intel_chtdc_ti_pwrbtn:
  - Add support for Dollar Cove TI power button
 
 GPD pocket fan:
  -  Add driver for GPD pocket custom fan controller
  -  Stop work on suspend
  -  Use a min-speed of 2 while charging
  -  Set speed to max on get_temp failure
 
 apple-gmux:
  -  Revert: lock iGP IO to protect from vgaarb changes
 
 alienware-wmi:
  -  lightbar LED support for Dell Inspiron 5675
 
 asus-nb-wmi:
  -  Support ALS on the Zenbook UX430UQ
 
 dell-laptop:
  -  Allocate buffer on heap rather than globally
  -  Add 2-in-1 devices to the DMI whitelist
  -  Filter out spurious keyboard backlight change events
  -  make some local functions static
  -  Use bool in struct quirk_entry for true/false fields
 
 dell-smbios:
  -  Correct notation for filtering
 
 dell-wmi:
  -  Add an event created by Dell Latitude 5495
 
 Kconfig
  - have ACPI_CMPC use depends instead of select for INPUT
 
 ideapad-laptop:
  -  Add Y720-15IKB to no_hw_rfkill
  -  add lenovo RESCUER R720-15IKBN to no_hw_rfkill_list
  -  Use __func__ instead of write_ec_cmd in pr_err
  -  Remove unnecessary else
 
 intel-hid:
  -  add a DMI quirk to support Wacom MobileStudio Pro
 
 intel-vbtn:
  -  Replace License by SDPX identifier
  -  Remove redundant inclusions
  -  Support tablet mode switch
  -  Simplify autorelease logic
  -  support panel front button
  -  support KEY_ROTATE_LOCK_TOGGLE
  -  Support separate press/release events
  -  support SW_TABLET_MODE
 
 intel_int0002_vgpio:
  -  Remove IRQF_NO_THREAD irq flag
 
 intel_pmc_core:
  -  Special case for Coffeelake
  -  Add CannonLake PCH support
  -  Read base address from LPIT
  -  Remove unused header file
  -  Convert to ICPU macro
  -  Substitute PCI with CPUID enumeration
  -  Refactor debugfs entries
  -  Update Kconfig
  -  Fix file permission warnings
  -  Change driver to a module
  -  Fix kernel doc for pmc_dev
  -  Remove unused variable
  -  Remove unused EXPORTED API
 
 intel_pmc_ipc:
  -  Add read64 API
 
 intel_telemetry:
  -  Remove redundancies
  -  Improve S0ix logs
  -  Fix suspend stats
 
 mlx-platform:
  -  Fix an ERR_PTR vs NULL issue
  -  Add hotplug device unregister to error path
  -  fix module aliases
  -  Add IO access verification callbacks
  -  Document pdev_hotplug field
  -  Allow compilation for 32 bit arch
 
 platform/mellanox:
  -  mlxreg-hotplug: Add check for negative adapter number
  -  mlxreg-hotplug: Enable building for ARM
  -  mlxreg-hotplug: Modify to use a regmap interface
  -  Group create/destroy with attribute functions
  -  Rename i2c bus to nr
  -  mlxreg-hotplug: Remove unused wait.h include
  -  Move Mellanox platform hotplug driver to platform/mellanox
 
 pmc_atom:
  -  introduce DEFINE_SHOW_ATTRIBUTE() macro
 
 samsung-laptop:
  -  Grammar s/are can/can/
 
 silead_dmi:
  -  Add Teclast X3 Plus tablet support
  -  Add entry for newer BIOS for Trekstor Surftab 7.0
  -  Add entry for the Teclast X98 Plus II
  -  Add entry for the Trekstor Primebook C13
  -  Add entry for the Chuwi Vi8 tablet
  -  add entry for Chuwi Hi8 tablet
  -  Add support for the Onda oBook 20 Plus tablet
  -  Add touchscreen info for SurfTab twin 10.1
 
 thinkpad_acpi:
  -  suppress warning about palm detection
  -  Accept flat mode for type 4 multi mode status
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaegMmAAoJEKbMaAwKp364TvUH/3D9qNtsbXpZuc3ZMNHjIysU
 hdW6hOVfBN0Rk049mjw7nWv/udhWZ/6ChJDlXHX0ZugtNGnRnzbdtWGg4y38pDF1
 LRuKjWfDeyMeJ11itD2xcxEaE6YsseWCKGZJ5D3T+sN4+1jgS4RLAa9cUJMl8QAo
 xZsT1MKpmGuj5eTLf5GgOVL2yfMZhZHabt3kGRY0eQqNqZBgpJw/GQNI1l6v4nAH
 MHPA7Gtj4HXHK8jGviZXpD9tg/iwahiUjGugG4HcxbMcpJ96a8CGyeaXmq2FlfNC
 /PpmVvhVVqzLuXKWAI+DZFLAiwIvPpxzVfOKF2Lty5Rejxf7pdmHq7aCNcALys0=
 =cKm9
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.16-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform-driver updates from Darren Hart:
 "New model support added for Dell, Ideapad, Acer, Asus, Thinkpad, and
  GPD laptops. Improvements to the common intel-vbtn driver, including
  tablet mode, rotate, and front button support. Intel CPU support added
  for Cannonlake and platform support for Dollar Cove power button.

  Overhaul of the mellanox platform driver, creating a new
  platform/mellanox directory for the newly multi-architecture regmap
  interface.

  Significant Intel PMC update with CannonLake support, Coffeelake
  update, CPUID enumeration, module support, new read64 API, refactoring
  and cleanups.

  Revert the apple-gmux iGP IO lock, addressing reported issues with
  non-binary drivers, leaving Nvidia binary driver users to comment out
  conflicting code.

  Miscellaneous fixes and cleanups"

* tag 'platform-drivers-x86-v4.16-1' of git://git.infradead.org/linux-platform-drivers-x86: (81 commits)
  platform/x86: mlx-platform: Fix an ERR_PTR vs NULL issue
  platform/x86: intel_pmc_core: Special case for Coffeelake
  platform/x86: intel_pmc_core: Add CannonLake PCH support
  x86/cpu: Add Cannonlake to Intel family
  platform/x86: intel_pmc_core: Read base address from LPIT
  ACPI / LPIT: Export lpit_read_residency_count_address()
  platform/x86: intel-vbtn: Replace License by SDPX identifier
  platform/x86: intel-vbtn: Remove redundant inclusions
  platform/x86: intel-vbtn: Support tablet mode switch
  platform/x86: dell-laptop: Allocate buffer on heap rather than globally
  platform/x86: intel_pmc_core: Remove unused header file
  platform/x86: mlx-platform: Add hotplug device unregister to error path
  platform/x86: mlx-platform: fix module aliases
  platform/mellanox: mlxreg-hotplug: Add check for negative adapter number
  platform/x86: mlx-platform: Add IO access verification callbacks
  platform/x86: mlx-platform: Document pdev_hotplug field
  platform/x86: mlx-platform: Allow compilation for 32 bit arch
  platform/mellanox: mlxreg-hotplug: Enable building for ARM
  platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface
  platform/mellanox: Group create/destroy with attribute functions
  ...
2018-02-06 15:30:52 -08:00
Linus Torvalds
3f551e3cef Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:

 - fix a race condition issue in power allocator governor (Yi Zeng).

 - add support for AP806 and CP110 in armada thermal driver, together
   with several improvements (Baruch Siach, Miquel Raynal)

 - add support for r8z7743 in rcar thermal driver (Biju Das)

 - convert thermal core to use new hwmon API to avoid warning (Fabio
   Estevam)

 - small fixes and cleanups in thermal core and x86_pkg_thermal,
   int3400_thermal, hisi_thermal, mtk_thermal and imx_thermal drivers
   (Pravin Shedge, Geert Uytterhoeven, Alexey Khoroshilov, Brian Bian,
   Matthias Brugger, Nicolin Chen, Uwe Kleine-König)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
  thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()
  thermal/x86 pkg temp: Remove debugfs_create_u32() casts
  thermal: int3400_thermal: fix error handling in int3400_thermal_probe()
  thermal/drivers/hisi: Remove bogus const from function return type
  thermal: armada: Give meaningful names to the thermal zones
  thermal: armada: Wait sensors validity before exiting the init callback
  thermal: armada: Change sensors trim default value
  thermal: armada: Update Kconfig and module description
  thermal: armada: Add support for Armada CP110
  thermal: armada: Add support for Armada AP806
  thermal: armada: Use real status register name
  thermal: armada: Clarify control registers accesses
  thermal: armada: Simplify the check of the validity bit
  thermal: armada: Use msleep for long delays
  dt-bindings: thermal: Describe Armada AP806 and CP110
  dt-bindings: thermal: rcar: Add device tree support for r8a7743
  thermal: mtk: Cleanup unused defines
  thermal: imx: update to new formula according to NXP AN5215
  thermal: imx: use consistent style to write temperatures
  thermal: imx: improve comments describing algorithm for temp calculation
  ...
2018-02-06 15:04:58 -08:00
Stephen Rothwell
b46dc8ae17 media: videobuf2: fix up for "media: annotate ->poll() instances"
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 14:24:51 -08:00
Ingo Molnar
8284507916 Merge branch 'linus' into sched/urgent, to resolve conflicts
Conflicts:
	arch/arm64/kernel/entry.S
	arch/x86/Kconfig
	include/linux/sched/mm.h
	kernel/fork.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 21:12:31 +01:00
Linus Torvalds
68c5735eaa media updates for v4.16-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJacX62AAoJEAhfPr2O5OEVjKYP/R3v+c8ztiHzaeibcZZ8IFNl
 58E0Y0yGa8OpoGJx9uqtEOamQmZoHhACfId7joIp/Jv38bgWAdbxOmk3Y4FDCFqG
 1bRrpnnmvlfabiMMfLpURLqKhf7rJMtErZkrnmmqg9P/lEMohaZUJAsgBZNfJM8l
 fZeacSnCSpzlxVcUb9Bf4vWhLk39R+xFzvFrwzbVUIHf3bDVpf4S4kNorMkhSZSF
 HaISYXqVMhpKca7CngVKytbfacUStUY01cXcjdMuB/sD7ySwdtKogbPMvrOSaexz
 G/8MB+sGT1JKUgIlh6Qv8hX805KuxBgfP19XSOH46nNU8KbYegdGhN5QXlokwI1m
 dAOiozkU93r5yBZl6QzkN3uwXe492PoLgczifg97pzAJP0BfWeFStkYqlugLTwwC
 Slmr7g3FZVJajbPl6WyioAGW7xfqBF7ftScZOHYxmhy41CWCGKJctmsJOjncyz5O
 GInEIP3KR4CgjR+iM1LoKvE+OvVo4kRc7hrcUsjQNsbfBn6xiixjwH+5M+UVvezA
 6UQpmtWGg4pX1djb8j8f6mKF8KZM12Pp3jb4Rl1cLsytN5BOBKaMEKdV3rgL+19P
 Yo0x/1wK/unkI20Om71vYyQ0nXVF9j7Tpeij5u0M57TeTVYCwloQgHmrcvQJdo8+
 Pqw5XEUiDpAIjvKp0XGh
 =H9AS
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - videobuf2 was moved to a media/common dir, as it is now used by the
   DVB subsystem too

 - Digital TV core memory mapped support interface

 - new sensor driver: ov7740

 - several improvements at ddbridge driver

 - new V4L2 driver: IPU3 CIO2 CSI-2 receiver unit, found on some Intel
   SoCs

 - new tuner driver: tda18250

 - finally got rid of all LIRC staging drivers

 - as we don't have old lirc drivers anymore, restruct the lirc device
   code

 - add support for UVC metadata

 - add a new staging driver for NVIDIA Tegra Video Decoder Engine

 - DVB kAPI headers moved to include/media

 - synchronize the kAPI and uAPI for the DVB subsystem, removing the gap
   for non-legacy APIs

 - reduce the kAPI gap for V4L2

 - lots of other driver enhancements, cleanups, etc.

* tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (407 commits)
  media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
  media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
  media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
  media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
  media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
  media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
  media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
  media: v4l2-compat-ioctl32.c: avoid sizeof(type)
  media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
  media: v4l2-compat-ioctl32.c: fix the indentation
  media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
  media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
  media: v4l2-ioctl.c: use check_fmt for enum/g/s/try_fmt
  media: vivid: fix module load error when enabling fb and no_error_inj=1
  media: dvb_demux: improve debug messages
  media: dvb_demux: Better handle discontinuity errors
  media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
  media: ts2020: avoid integer overflows on 32 bit machines
  media: i2c: ov7740: use gpio/consumer.h instead of gpio.h
  media: entity: Add a nop variant of media_entity_cleanup
  ...
2018-02-06 11:27:48 -08:00
Linus Torvalds
2246edfaf8 Second pull request for 4.16 merge window
- Clean up some function signatures in rxe for clarity
 - Tidy the RDMA netlink header to remove unimplemented constants
 - bnxt_re driver fixes, one is a regression this window.
 - Minor hns driver fixes
 - Various fixes from Dan Carpenter and his tool
 - Fix IRQ cleanup race in HFI1
 - HF1 performance optimizations and a fix to report counters in the right units
 - Fix for an IPoIB startup sequence race with the external manager
 - Oops fix for the new kabi path
 - Endian cleanups for hns
 - Fix for mlx5 related to the new automatic affinity support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaePL/AAoJELgmozMOVy/dqhsQALUhzDuuJ+/F6supjmyqZG53
 Ak/PoFjTmHToGQfDq/1TRzyKwMx12aB2l6WGZc31FzhvCw4daPWkoEVKReNWUUJ+
 fmESxjLgo8ZRGSqpNxn9Q8agE/I/5JZQoA8bCFCYgdZPKTPNKdtAVBphpdhmrOX4
 ygjABikWf/wBsNF1A8lnX9xkfPO21cPHrFQLTnuOzOT/hc6U+PPklHSQCnS91svh
 1+Pqjtssg54rxYkJqiFq3giSnfwvmAXO8WyVGmRRPFGLpB0nIjq0Sl6ZgLLClz7w
 YJdiBGr7rlnNMgGCjlPU2ZO3lO6J0ytXQzFNqRqvKryXQOv+uVeJgep7WqHTcdQU
 UN30FCKQMgLL/F6NF8wKaKcK4X0VgXQa7gpuH2fVSXF0c3LO3/mmWNjixbGSzT2c
 Wj+EW3eOKlTddhRLhgbMOdwc32tIGhaD85z2F4+FZO+XI9ZQtJaDewWVDjYoumP/
 RlDIFw+KCgSq7+UZL8CoXuh0BuS1nu9TGfkx1HW0DLMF1+yigNiswpUfksV4cISP
 JqE2I3yH0A4UobD/a+f9IhIfk2MjxO0tJWNjU8IA9LXgUFlskQ6MpH/AcE9G8JNv
 tlfLGR3s4PJa/7j/Iy2F84og/b/KH8v7vyj4Eknq/hLq63/BiM5wj0AUBRrGulN6
 HhAMOegxGZ7IKP/y0L7I
 =xwZz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Doug Ledford:
 "Items of note:

   - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
     worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
     4.15. The fix is here.

   - one of the patches (the endian notation patch from Lijun) looks
     like a lot of lines of change, but it's mostly mechanical in
     nature. It amounts to the biggest chunk of change in it (it's about
     2/3rds of the overall pull request).

  Summary:

   - Clean up some function signatures in rxe for clarity

   - Tidy the RDMA netlink header to remove unimplemented constants

   - bnxt_re driver fixes, one is a regression this window.

   - Minor hns driver fixes

   - Various fixes from Dan Carpenter and his tool

   - Fix IRQ cleanup race in HFI1

   - HF1 performance optimizations and a fix to report counters in the right units

   - Fix for an IPoIB startup sequence race with the external manager

   - Oops fix for the new kabi path

   - Endian cleanups for hns

   - Fix for mlx5 related to the new automatic affinity support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
  net/mlx5: increase async EQ to avoid EQ overrun
  mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
  RDMA/hns: Fix the endian problem for hns
  IB/uverbs: Use the standard kConfig format for experimental
  IB: Update references to libibverbs
  IB/hfi1: Add 16B rcvhdr trace support
  IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
  IB/core: Avoid a potential OOPs for an unused optional parameter
  IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
  IB/ipoib: Fix for potential no-carrier state
  IB/hfi1: Show fault stats in both TX and RX directions
  IB/hfi1: Remove blind constants from 16B update
  IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
  IB/hfi1: Do not override given pcie_pset value
  IB/hfi1: Optimize process_receive_ib()
  IB/hfi1: Remove unnecessary fecn and becn fields
  IB/hfi1: Look up ibport using a pointer in receive path
  IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
  IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
  IB/hfi1: Remove dependence on qp->s_hdrwords
  ...
2018-02-06 11:09:45 -08:00
Linus Torvalds
3ff1b28caa libnvdimm for 4.16
* Require struct page by default for filesystem DAX to remove a number of
   surprising failure cases.  This includes failures with direct I/O, gdb and
   fork(2).
 
 * Add support for the new Platform Capabilities Structure added to the NFIT in
   ACPI 6.2a.  This new table tells us whether the platform supports flushing
   of CPU and memory controller caches on unexpected power loss events.
 
 * Revamp vmem_altmap and dev_pagemap handling to clean up code and better
   support future future PCI P2P uses.
 
 * Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become
   out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and
   instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL
   families, NVDIMM_FAMILY_{HPE,MSFT}.
 
 * Enhance nfit_test so we can test some of the new things added in version 1.6
   of the DSM specification.  This includes testing firmware download and
   simulating the Last Shutdown State (LSS) status.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t
 2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY
 c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7
 aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r
 7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub
 qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz
 EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC
 gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG
 rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o
 DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od
 ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+
 sie+u2rTod8/gQWSfHpJ
 =MIMX
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Ross Zwisler:

 - Require struct page by default for filesystem DAX to remove a number
   of surprising failure cases. This includes failures with direct I/O,
   gdb and fork(2).

 - Add support for the new Platform Capabilities Structure added to the
   NFIT in ACPI 6.2a. This new table tells us whether the platform
   supports flushing of CPU and memory controller caches on unexpected
   power loss events.

 - Revamp vmem_altmap and dev_pagemap handling to clean up code and
   better support future future PCI P2P uses.

 - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
   become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
   spec, and instead rely on the generic ND_CMD_CALL approach used by
   the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.

 - Enhance nfit_test so we can test some of the new things added in
   version 1.6 of the DSM specification. This includes testing firmware
   download and simulating the Last Shutdown State (LSS) status.

* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
  libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
  acpi, nfit: fix register dimm error handling
  libnvdimm, namespace: make min namespace size 4K
  tools/testing/nvdimm: force nfit_test to depend on instrumented modules
  libnvdimm/nfit_test: adding support for unit testing enable LSS status
  libnvdimm/nfit_test: add firmware download emulation
  nfit-test: Add platform cap support from ACPI 6.2a to test
  libnvdimm: expose platform persistence attribute for nd_region
  acpi: nfit: add persistent memory control flag for nd_region
  acpi: nfit: Add support for detect platform CPU cache flush on power loss
  device-dax: Fix trailing semicolon
  libnvdimm, btt: fix uninitialized err_lock
  dax: require 'struct page' by default for filesystem dax
  ext2: auto disable dax instead of failing mount
  ext4: auto disable dax instead of failing mount
  mm, dax: introduce pfn_t_special()
  mm: Fix devm_memremap_pages() collision handling
  mm: Fix memory size alignment in devm_memremap_pages_release()
  memremap: merge find_dev_pagemap into get_dev_pagemap
  memremap: change devm_memremap_pages interface to use struct dev_pagemap
  ...
2018-02-06 10:41:33 -08:00
Linus Torvalds
105cf3c8c6 pci-v4.16-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
 Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
 JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
 Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
 fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
 SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
 d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
 m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
 1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
 FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
 bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
 w/EV1JWs36WXWaxFk8wd
 =ttq9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - skip AER driver error recovery callbacks for correctable errors
   reported via ACPI APEI, as we already do for errors reported via the
   native path (Tyler Baicar)

 - fix DPC shared interrupt handling (Alex Williamson)

 - print full DPC interrupt number (Keith Busch)

 - enable DPC only if AER is available (Keith Busch)

 - simplify DPC code (Bjorn Helgaas)

 - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
   Helgaas)

 - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
   Helgaas)

 - move ASPM internal interfaces out of public header (Bjorn Helgaas)

 - allow hot-removal of VGA devices (Mika Westerberg)

 - speed up unplug and shutdown by assuming Thunderbolt controllers
   don't support Command Completed events (Lukas Wunner)

 - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
   Jay Cornwall)

 - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)

 - clean up PCI DMA interface usage (Christoph Hellwig)

 - remove PCI pool API (replaced with DMA pool) (Romain Perier)

 - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
   Kaya)

 - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)

 - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)

 - remove warnings on sysfs mmap failure (Bjorn Helgaas)

 - quiet ROM validation messages (Alex Deucher)

 - remove redundant memory alloc failure messages (Markus Elfring)

 - fill in types for compile-time VGA and other I/O port resources
   (Bjorn Helgaas)

 - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
   Ports to help AmigaOne X1000 (Bjorn Helgaas)

 - add SPDX tags to all PCI files (Bjorn Helgaas)

 - quirk Marvell 9128 DMA aliases (Alex Williamson)

 - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)

 - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
   Cassel)

 - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)

 - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)

 - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)

 - add support for ARTPEC-7 SoC (Niklas Cassel)

 - add endpoint-mode support for ARTPEC (Niklas Cassel)

 - add Cadence PCIe host and endpoint controller driver (Cyrille
   Pitchen)

 - handle multiple INTx status bits being set in dra7xx (Vignesh R)

 - translate dra7xx hwirq range to fix INTD handling (Vignesh R)

 - remove deprecated Exynos PHY initialization code (Jaehoon Chung)

 - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)

 - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)

 - fix Keystone interrupt-controller-node lookup (Johan Hovold)

 - constify qcom driver structures (Julia Lawall)

 - rework Tegra config space mapping to increase space available for
   endpoints (Vidya Sagar)

 - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)

 - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)

 - add support for Global Fabric Manager Server (GFMS) event to
   Microsemi Switchtec switch driver (Logan Gunthorpe)

 - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)

* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
  PCI: endpoint: Fix EPF device name to support multi-function devices
  PCI: endpoint: Add the function number as argument to EPC ops
  PCI: cadence: Add host driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
  PCI: Add vendor ID for Cadence
  PCI: Add generic function to probe PCI host controllers
  PCI: generic: fix missing call of pci_free_resource_list()
  PCI: OF: Add generic function to parse and allocate PCI resources
  PCI: Regroup all PCI related entries into drivers/pci/Makefile
  PCI/DPC: Reformat DPC register definitions
  PCI/DPC: Add and use DPC Status register field definitions
  PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
  PCI/DPC: Remove unnecessary RP PIO register structs
  PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
  PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
  PCI/DPC: Make RP PIO log size check more generic
  PCI/DPC: Rename local "status" to "dpc_status"
  PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
  ...
2018-02-06 09:59:40 -08:00
David S. Miller
176bfb406d Merge branch 'be2net-patch-set'
Suresh Reddy says:

====================
be2net: patch-set

Hi Dave, Please consider applying these two patches to net
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:48:40 -05:00
Suresh Reddy
ffc3962010 be2net: Handle transmit completion errors in Lancer
If the driver receives a TX CQE with status as 0x1 or 0x9 or 0xb,
the completion indexes should not be used. The driver must stop
consuming CQEs from this TXQ/CQ. The TXQ from this point on-wards
to be in a bad state. Driver should destroy and recreate the TXQ.

0x1: LANCER_TX_COMP_LSO_ERR
0x9 LANCER_TX_COMP_SGE_ERR
0xb: LANCER_TX_COMP_PARITY_ERR

Reset the adapter if driver sees this error in TX completion. Also
adding sge error counter in ethtool stats.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:48:33 -05:00
Suresh Reddy
3df40aad1a be2net: Fix HW stall issue in Lancer
Lancer HW cannot handle a TSO packet with a single segment.
Disable TSO/GSO for such packets.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:48:17 -05:00
Guanglei Li
2c0aa08631 RDS: IB: Fix null pointer issue
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:44:32 -05:00
Jakub Kicinski
703f578a35 nfp: fix kdoc warnings on nested structures
Commit 84ce5b9877 ("scripts: kernel-doc: improve nested logic to
handle multiple identifiers") improved the handling of nested structure
definitions in scripts/kernel-doc, and changed the expected format of
documentation.  This causes new warnings to appear on W=1 builds.

Only comment changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:43:58 -05:00
David S. Miller
67ae44e1a6 Merge branch 'net-erspan-fixes'
William Tu says:

====================
net: erspan fixes

The first patch fixes erspan metadata extraction issue from packet
header due to commit d350a82302 ("net: erspan: create erspan metadata
uapi header").  The commit moves the erspan 'version' in
'struct erspan_metadata' in front of 'struct erspan_md2' for later
extensibility, but breaks the existing metadata extraction code due
to extra 4-byte size 'version'.  The second patch fixes the case where
tunnel device receives an erspan packet with different tunnel metadata
(ex: version, index, hwid, direction), existing code overwrites the
tunnel device's erspan configuration.  The third patch fixes the bpf
tests due to the above patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:32:49 -05:00
William Tu
9c33ca4317 sample/bpf: fix erspan metadata
The commit c69de58ba8 ("net: erspan: use bitfield instead of
mask and offset") changes the erspan header to use bitfield, and
commit d350a82302 ("net: erspan: create erspan metadata uapi header")
creates a uapi header file.  The above two commit breaks the current
erspan test.  This patch fixes it by adapting the above two changes.

Fixes: ac80c2a165 ("samples/bpf: add erspan v2 sample code")
Fixes: ef88f89c83 ("samples/bpf: extend test_tunnel_bpf.sh with ERSPAN")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:32:49 -05:00
William Tu
39f57f6799 net: erspan: fix erspan config overwrite
When an erspan tunnel device receives an erpsan packet with different
tunnel metadata (ex: version, index, hwid, direction), existing code
overwrites the tunnel device's erspan configuration with the received
packet's metadata.  The patch fixes it.

Fixes: 1a66a836da ("gre: add collect_md mode to ERSPAN tunnel")
Fixes: f551c91de2 ("net: erspan: introduce erspan v2 for ip_gre")
Fixes: ef7baf5e08 ("ip6_gre: add ip6 erspan collect_md mode")
Fixes: 94d7d8f292 ("ip6_gre: add erspan v2 support")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:32:49 -05:00
William Tu
3df1928302 net: erspan: fix metadata extraction
Commit d350a82302 ("net: erspan: create erspan metadata uapi header")
moves the erspan 'version' in front of the 'struct erspan_md2' for
later extensibility reason.  This breaks the existing erspan metadata
extraction code because the erspan_md2 then has a 4-byte offset
to between the erspan_metadata and erspan_base_hdr.  This patch
fixes it.

Fixes: 1a66a836da ("gre: add collect_md mode to ERSPAN tunnel")
Fixes: ef7baf5e08 ("ip6_gre: add ip6 erspan collect_md mode")
Fixes: 1d7e2ed22f ("net: erspan: refactor existing erspan code")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:32:48 -05:00
Paolo Abeni
d7cdee5ea8 cls_u32: fix use after free in u32_destroy_key()
Li Shuang reported an Oops with cls_u32 due to an use-after-free
in u32_destroy_key(). The use-after-free can be triggered with:

dev=lo
tc qdisc add dev $dev root handle 1: htb default 10
tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
 10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
tc qdisc del dev $dev root

Which causes the following kasan splat:

 ==================================================================
 BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
 Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571

 CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
 Call Trace:
  dump_stack+0xd6/0x182
  ? dma_virt_map_sg+0x22e/0x22e
  print_address_description+0x73/0x290
  kasan_report+0x277/0x360
  ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
  u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
  u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
  process_one_work+0xae0/0x1c80
  ? sched_clock+0x5/0x10
  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
  ? _raw_spin_unlock_irq+0x29/0x40
  ? trace_hardirqs_on_caller+0x381/0x570
  ? _raw_spin_unlock_irq+0x29/0x40
  ? finish_task_switch+0x1e5/0x760
  ? finish_task_switch+0x208/0x760
  ? preempt_notifier_dec+0x20/0x20
  ? __schedule+0x839/0x1ee0
  ? check_noncircular+0x20/0x20
  ? firmware_map_remove+0x73/0x73
  ? find_held_lock+0x39/0x1c0
  ? worker_thread+0x434/0x1820
  ? lock_contended+0xee0/0xee0
  ? lock_release+0x1100/0x1100
  ? init_rescuer.part.16+0x150/0x150
  ? retint_kernel+0x10/0x10
  worker_thread+0x216/0x1820
  ? process_one_work+0x1c80/0x1c80
  ? lock_acquire+0x1a5/0x540
  ? lock_downgrade+0x6b0/0x6b0
  ? sched_clock+0x5/0x10
  ? lock_release+0x1100/0x1100
  ? compat_start_thread+0x80/0x80
  ? do_raw_spin_trylock+0x190/0x190
  ? _raw_spin_unlock_irq+0x29/0x40
  ? trace_hardirqs_on_caller+0x381/0x570
  ? _raw_spin_unlock_irq+0x29/0x40
  ? finish_task_switch+0x1e5/0x760
  ? finish_task_switch+0x208/0x760
  ? preempt_notifier_dec+0x20/0x20
  ? __schedule+0x839/0x1ee0
  ? kmem_cache_alloc_trace+0x143/0x320
  ? firmware_map_remove+0x73/0x73
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1c0
  ? schedule+0xf3/0x3b0
  ? lock_downgrade+0x6b0/0x6b0
  ? __schedule+0x1ee0/0x1ee0
  ? do_wait_intr_irq+0x340/0x340
  ? do_raw_spin_trylock+0x190/0x190
  ? _raw_spin_unlock_irqrestore+0x32/0x60
  ? process_one_work+0x1c80/0x1c80
  ? process_one_work+0x1c80/0x1c80
  kthread+0x312/0x3d0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 1688:
  kasan_kmalloc+0xa0/0xd0
  __kmalloc+0x162/0x380
  u32_change+0x1220/0x3c9e [cls_u32]
  tc_ctl_tfilter+0x1ba6/0x2f80
  rtnetlink_rcv_msg+0x4f0/0x9d0
  netlink_rcv_skb+0x124/0x320
  netlink_unicast+0x430/0x600
  netlink_sendmsg+0x8fa/0xd60
  sock_sendmsg+0xb1/0xe0
  ___sys_sendmsg+0x678/0x980
  __sys_sendmsg+0xc4/0x210
  do_syscall_64+0x232/0x7f0
  return_from_SYSCALL_64+0x0/0x75

 Freed by task 112:
  kasan_slab_free+0x71/0xc0
  kfree+0x114/0x320
  rcu_process_callbacks+0xc3f/0x1600
  __do_softirq+0x2bf/0xc06

 The buggy address belongs to the object at ffff881b83dae600
  which belongs to the cache kmalloc-4096 of size 4096
 The buggy address is located 24 bytes inside of
  4096-byte region [ffff881b83dae600, ffff881b83daf600)
 The buggy address belongs to the page:
 page:ffffea006e0f6a00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
 flags: 0x17ffffc0008100(slab|head)
 raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
 raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

The problem is that the htnode is freed before the linked knodes and the
latter will try to access the first at u32_destroy_key() time.
This change addresses the issue using the htnode refcnt to guarantee
the correct free order. While at it also add a RCU annotation,
to keep sparse happy.

v1 -> v2: use rtnl_derefence() instead of RCU read locks
v2 -> v3:
  - don't check refcnt in u32_destroy_hnode()
  - cleaned-up u32_destroy() implementation
  - cleaned-up code comment
v3 -> v4:
  - dropped unneeded comment

Reported-by: Li Shuang <shuali@redhat.com>
Fixes: c0d378ef12 ("net_sched: use tcf_queue_work() in u32 filter")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:25:54 -05:00
Wolfram Sang
a3276892db net: amd-xgbe: fix comparison to bitshift when dealing with a mask
Due to a typo, the mask was destroyed by a comparison instead of a bit
shift.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:24:51 -05:00
Andrew Lunn
a56c69803f net: phy: Handle not having GPIO enabled in the kernel
If CONFIG_GPIOLIB is disabled, fwnode_get_named_gpiod() becomes a stub
function, which return -ENOSYS. Handle this in the same way as
-ENOENT, i.e. assume there is no GPIO used to reset the PHYs.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: bafbdd527d ("phylib: Add device reset GPIO support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:20:17 -05:00
Dan Carpenter
8a0f5b6f33 platform/x86: mlx-platform: Fix an ERR_PTR vs NULL issue
devm_ioport_map() returns NULL on error but we accidentally check for
error pointers instead.

Fixes: c6acad68eb ("platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Vadim Pasternak <vadimp@melanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-06 07:42:38 -08:00
Arnd Bergmann
ca66e79712 locking/qrwlock: include asm/byteorder.h as needed
Moving the qrwlock struct definition into a header file introduced
a subtle bug on all little-endian machines, where some files in some
configurations would see the fields in an incorrect order.  This was
found by building with an LTO enabled compiler that warns every time we
try to link together files with incompatible data structures.

A second patch changes linux/kconfig.h to always define the symbols,
but this seems to be the root cause of most of the issues, so I'd suggest
we do both.

On a current linux-next kernel, I verified that this header is
responsible for all type mismatches as a result from the endianess
confusion.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: e0d02285f1 ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
Link: http://lkml.kernel.org/r/20180202154104.1522809-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 10:28:58 +01:00
Peter Zijlstra
81dcf89f03 jump_label: Add branch hints to static_branch_{un,}likely()
For some reason these were missing, I've not observed this patch
making a difference in the few code locations I checked, but this
makes sense.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 10:28:57 +01:00
Mel Gorman
32e839dda3 sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS
The select_idle_sibling() (SIS) rewrite in commit:

  10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings()")

... replaced a domain iteration with a search that broadly speaking
does a wrapped walk of the scheduler domain sharing a last-level-cache.

While this had a number of improvements, one consequence is that two tasks
that share a waker/wakee relationship push each other around a socket. Even
though two tasks may be active, all cores are evenly used. This is great from
a search perspective and spreads a load across individual cores, but it has
adverse consequences for cpufreq. As each CPU has relatively low utilisation,
cpufreq may decide the utilisation is too low to used a higher P-state and
overall computation throughput suffers.

While individual cpufreq and cpuidle drivers may compensate by artifically
boosting P-state (at c0) or avoiding lower C-states (during idle), it does
not help if hardware-based cpufreq (e.g. HWP) is used.

This patch tracks a recently used CPU based on what CPU a task was running
on when it last was a waker a CPU it was recently using when a task is a
wakee. During SIS, the recently used CPU is used as a target if it's still
allowed by the task and is idle.

The benefit may be non-obvious so consider an example of two tasks
communicating back and forth. Task A may be an application doing IO where
task B is a kworker or kthread like journald. Task A may issue IO, wake
B and B wakes up A on completion.  With the existing scheme this may look
like the following (potentially different IDs if SMT is in use but similar
principal applies).

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 2)
 A (cpu 2)	wake	B (wakes on cpu 3)
 etc.

A careful reader may wonder why CPU 0 was not idle when B wakes A the
first time and it's simply due to the fact that A can be rescheduled to
another CPU and the pattern is that prev == target when B tries to wakeup A
and the information about CPU 0 has been lost.

With this patch, the pattern is more likely to be:

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 0)
 A (cpu 0)	wake	B (wakes on cpu 1)
 etc

i.e. two communicating casts are more likely to use just two cores instead
of all available cores sharing a LLC.

The most dramatic speedup was noticed on dbench using the XFS filesystem on
UMA as clients interact heavily with workqueues in that configuration. Note
that a similar speedup is not observed on ext4 as the wakeup pattern
is different:

                          4.15.0-rc9             4.15.0-rc9
                           waprev-v1        biasancestor-v1
 Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
 Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
 Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
 Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
 Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)

The results can be less dramatic on NUMA where automatic balancing interferes
with the test. It's also known that network benchmarks running on localhost
also benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
and TCP depending on the machine). Hackbench also seens small improvements
(6-11% depending on machine and thread count). The facebook schbench was also
tested but in most cases showed little or no different to wakeup latencies.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 10:20:37 +01:00
Mel Gorman
806486c377 sched/fair: Do not migrate if the prev_cpu is idle
wake_affine_idle() prefers to move a task to the current CPU if the
wakeup is due to an interrupt. The expectation is that the interrupt
data is cache hot and relevant to the waking task as well as avoiding
a search. However, there is no way to determine if there was cache hot
data on the previous CPU that may exceed the interrupt data. Furthermore,
round-robin delivery of interrupts can migrate tasks around a socket where
each CPU is under-utilised.  This can interact badly with cpufreq which
makes decisions based on per-cpu data. It has been observed on machines
with HWP that p-states are not boosted to their maximum levels even though
the workload is latency and throughput sensitive.

This patch uses the previous CPU for the task if it's idle and cache-affine
with the current CPU even if the current CPU is idle due to the wakup
being related to the interrupt. This reduces migrations at the cost of
the interrupt data not being cache hot when the task wakes.

A variety of workloads were tested on various machines and no adverse
impact was noticed that was outside noise. dbench on ext4 on UMA showed
roughly 10% reduction in the number of CPU migrations and it is a case
where interrupts are frequent for IO competions. In most cases, the
difference in performance is quite small but variability is often
reduced. For example, this is the result for pgbench running on a UMA
machine with different numbers of clients.

                          4.15.0-rc9             4.15.0-rc9
                            baseline              waprev-v1
 Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
 Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
 Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
 Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
 Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
 Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
 Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
 Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
 Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
 Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
 CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)

Note that the different in performance is marginal but for low utilisation,
there is less variability.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 10:20:36 +01:00