25710 Commits

Author SHA1 Message Date
Thomas Gleixner
19d39a3810 genirq: Keep chip buslock across irq_request/release_resources()
Moving the irq_request/release_resources() callbacks out of the spinlocked,
irq disabled and bus locked region, unearthed an interesting abuse of the
irq_bus_lock/irq_bus_sync_unlock() callbacks.

The OMAP GPIO driver does merily power management inside of them. The
irq_request_resources() callback of this GPIO irqchip calls a function
which reads a GPIO register. That read aborts now because the clock of the
GPIO block is not magically enabled via the irq_bus_lock() callback.

Move the callbacks under the bus lock again to prevent this. In the
free_irq() path this requires to drop the bus_lock before calling
synchronize_irq() and reaquiring it before calling the
irq_release_resources() callback.

The bus lock can't be held because:

   1) The data which has been changed between bus_lock/un_lock is cached in
      the irq chip driver private data and needs to go out to the irq chip
      via the slow bus (usually SPI or I2C) before calling
      synchronize_irq().

      That's the reason why this bus_lock/unlock magic exists in the first
      place, as you cannot do SPI/I2C transactions while holding desc->lock
      with interrupts disabled.

   2) synchronize_irq() will actually deadlock, if there is a handler on
      flight. These chips use threaded handlers for obvious reasons, as
      they allow to do SPI/I2C communication. When the threaded handler
      returns then bus_lock needs to be taken in irq_finalize_oneshot() as
      we need to talk to the actual irq chip once more. After that the
      threaded handler is marked done, which makes synchronize_irq() return.

      So if we hold bus_lock accross the synchronize_irq() call, the
      handler cannot mark itself done because it blocks on the bus
      lock. That in turn makes synchronize_irq() wait forever on the
      threaded handler to complete....

Add the missing unlock of desc->request_mutex in the error path of
__free_irq() and add a bunch of comments to explain the locking and
protection rules.

Fixes: 46e48e257360 ("genirq: Move irq resource handling out of spinlocked region")
Reported-and-tested-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-longer-ranted-at-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
2017-07-12 10:14:42 +02:00
Arnd Bergmann
69449bbd65 ftrace: Hide cached module code for !CONFIG_MODULES
When modules are disabled, we get a harmless build warning:

kernel/trace/ftrace.c:4051:13: error: 'process_cached_mods' defined but not used [-Werror=unused-function]

This adds the same #ifdef around the new code that exists around
its caller.

Link: http://lkml.kernel.org/r/20170710084413.1820568-1-arnd@arndb.de

Fixes: d7fbf8df7ca0 ("ftrace: Implement cached modules tracing on module load")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-11 19:29:04 -04:00
Steven Rostedt (VMware)
bbd1d27d86 tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
The "stack_trace_filter" file only makes sense if DYNAMIC_FTRACE is
configured in. If it is not, then the user can not filter any functions.

Not only that, the open function causes warnings when DYNAMIC_FTRACE is not
set.

Link: http://lkml.kernel.org/r/20170710110521.600806-1-arnd@arndb.de

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-11 19:21:04 -04:00
Steven Rostedt (VMware)
b11fb73743 tracing: Fixup trace file header alignment
The addition of TGID to the tracing header added a check to see if TGID
shoudl be displayed or not, and updated the header accordingly.
Unfortunately, it broke the default header.

Also add constant strings to use for spacing. This does remove the
visibility of the header a bit, but cuts it down from the extended lines
much greater than 80 characters.

Before this change:

 # tracer: function
 #
 #                            _-----=> irqs-off
 #                           / _----=> need-resched
 #                          | / _---=> hardirq/softirq
 #                          || / _--=> preempt-depth
 #                          ||| /     delay
 #           TASK-PID   CPU#||||    TIMESTAMP  FUNCTION
 #              | |       | ||||       |         |
        swapper/0-1     [000] ....     0.277830: migration_init <-do_one_initcall
        swapper/0-1     [002] d...    13.861967: Unknown type 1201
        swapper/0-1     [002] d..1    13.861970: Unknown type 1202

After this change:

 # tracer: function
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
        swapper/0-1     [000] ....     0.278245: migration_init <-do_one_initcall
        swapper/0-1     [003] d...    13.861189: Unknown type 1201
        swapper/0-1     [003] d..1    13.861192: Unknown type 1202

Cc: Joel Fernandes <joelaf@google.com>
Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-11 16:48:19 -04:00
Thomas Gleixner
dea1d0f5f1 smp/hotplug: Replace BUG_ON and react useful
The move of the unpark functions to the control thread moved the BUG_ON()
there as well. While it made some sense in the idle thread of the upcoming
CPU, it's bogus to crash the control thread on the already online CPU,
especially as the function has a return value and the callsite is prepared
to handle an error return.

Replace it with a WARN_ON_ONCE() and return a proper error code.

Fixes: 9cd4f1a4e7a8 ("smp/hotplug: Move unparking of percpu threads to the control CPU")
Rightfully-ranted-at-by: Linux Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-07-11 22:25:44 +02:00
Ingo Molnar
6a8a75f323 Revert "perf/core: Drop kernel samples even though :u is specified"
This reverts commit cc1582c231ea041fbc68861dfaf957eaf902b829.

This commit introduced a regression that broke rr-project, which uses sampling
events to receive a signal on overflow (but does not care about the contents
of the sample). These signals are critical to the correct operation of rr.

There's been some back and forth about how to fix it - but to not keep
applications in limbo queue up a revert.

Reported-by: Kyle Huey <me@kylehuey.com>
Acked-by: Kyle Huey <me@kylehuey.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170628105600.GC5981@leverpostej
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-11 10:56:54 +02:00
Linus Torvalds
9967468c0a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM

 - KASAN updates

 - lib/ updates

 - checkpatch updates

 - some binfmt_elf changes

 - various misc bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (115 commits)
  kernel/exit.c: avoid undefined behaviour when calling wait4()
  kernel/signal.c: avoid undefined behaviour in kill_something_info
  binfmt_elf: safely increment argv pointers
  s390: reduce ELF_ET_DYN_BASE
  powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
  arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
  arm: move ELF_ET_DYN_BASE to 4MB
  binfmt_elf: use ELF_ET_DYN_BASE only for PIE
  fs, epoll: short circuit fetching events if thread has been killed
  checkpatch: improve multi-line alignment test
  checkpatch: improve macro reuse test
  checkpatch: change format of --color argument to --color[=WHEN]
  checkpatch: silence perl 5.26.0 unescaped left brace warnings
  checkpatch: improve tests for multiple line function definitions
  checkpatch: remove false warning for commit reference
  checkpatch: fix stepping through statements with $stat and ctx_statement_block
  checkpatch: [HLP]LIST_HEAD is also declaration
  checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
  checkpatch: improve the unnecessary OOM message test
  lib/bsearch.c: micro-optimize pivot position calculation
  ...
2017-07-10 16:58:42 -07:00
zhongjiang
dd83c161fb kernel/exit.c: avoid undefined behaviour when calling wait4()
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9

The related calltrace is as follows:

  negation of -2147483648 cannot be represented in type 'int':
  CPU: 9 PID: 16482 Comm: zj Tainted: G    B          ---- -------   3.10.0-327.53.58.71.x86_64+ #66
  Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SyS_wait4+0x1cb/0x1e0
    system_call_fastpath+0x16/0x1b

Exclude the overflow to avoid the UBSAN warning.

Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:36 -07:00
zhongjiang
4ea77014af kernel/signal.c: avoid undefined behaviour in kill_something_info
When running kill(72057458746458112, 0) in userspace I hit the following
issue.

  UBSAN: Undefined behaviour in kernel/signal.c:1462:11
  negation of -2147483648 cannot be represented in type 'int':
  CPU: 226 PID: 9849 Comm: test Tainted: G    B          ---- -------   3.10.0-327.53.58.70.x86_64_ubsan+ #116
  Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SYSC_kill+0x43e/0x4d0
    SyS_kill+0xe/0x10
    system_call_fastpath+0x16/0x1b

Add code to avoid the UBSAN detection.

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:36 -07:00
Thomas Meyer
a94c33dd1f lib/extable.c: use bsearch() library function in search_extable()
[thomas@m3y3r.de: v3: fix arch specific implementations]
  Link: http://lkml.kernel.org/r/1497890858.12931.7.camel@m3y3r.de
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:35 -07:00
Masahiro Yamada
63b23e2cbc kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL)
'all_var' looks like a variable, but is actually a macro.  Use
IS_ENABLED(CONFIG_KALLSYMS_ALL) for clarification.

Link: http://lkml.kernel.org/r/1497577591-3434-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Rasmus Villemoes
b7b2562f72 kernel/groups.c: use sort library function
setgroups is not exactly a hot path, so we might as well use the library
function instead of open-coding the sorting.  Saves ~150 bytes.

Link: http://lkml.kernel.org/r/1497301378-22739-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Arvind Yadav
9dcdcea114 kernel/ksysfs.c: constify attribute_group structures.
attribute_groups are not supposed to change at runtime.  All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group.  So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   1120	    544	     16	   1680	    690	kernel/ksysfs.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   1160	    480	     16	   1656	    678	kernel/ksysfs.o

Link: http://lkml.kernel.org/r/aa224b3cc923fdbb3edd0c41b2c639c85408c9e8.1498737347.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:34 -07:00
Michal Hocko
1860033237 mm: make PR_SET_THP_DISABLE immediately active
PR_SET_THP_DISABLE has a rather subtle semantic.  It doesn't affect any
existing mapping because it only updated mm->def_flags which is a
template for new mappings.

The mappings created after prctl(PR_SET_THP_DISABLE) have VM_NOHUGEPAGE
flag set.  This can be quite surprising for all those applications which
do not do prctl(); fork() & exec() and want to control their own THP
behavior.

Another usecase when the immediate semantic of the prctl might be useful
is a combination of pre- and post-copy migration of containers with
CRIU.  In this case CRIU populates a part of a memory region with data
that was saved during the pre-copy stage.  Afterwards, the region is
registered with userfaultfd and CRIU expects to get page faults for the
parts of the region that were not yet populated.  However, khugepaged
collapses the pages and the expected page faults do not occur.

In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
temporary mechanism for enabling/disabling THP process wide.

Implementation wise, a new MMF_DISABLE_THP flag is added.  This flag is
tested when decision whether to use huge pages is taken either during
page fault of at the time of THP collapse.

It should be noted, that the new implementation makes PR_SET_THP_DISABLE
master override to any per-VMA setting, which was not the case
previously.

Fixes: a0715cc22601 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
Link: http://lkml.kernel.org/r/1496415802-30944-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Linus Torvalds
1633b39610 More power management updates for v4.13-rc1
- Revert a recent change in the generic power domains (genpd)
    framework that led to regressions and turned out the be misguided
    (Rafael Wysocki).
 
  - Fix a recently introduced build issue in the generic power domains
    (genpd) framework (Arnd Bergmann).
 
  - Constify attribute_group structures in the PM core, the cpufreq
    stats code and in intel_pstate (Arvind Yadav).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZY/L6AAoJEILEb/54YlRxMBUP/0lCziyBNAUPm8+gpC7RA5gv
 tZwtGPnDr2toWg3+te0Aqc3/LOKt4CFtQEui+IGGPA5ghoZZ53lPuxZH8MCEcv/I
 LhoHmNK+2D088JViiaXlkanLgjcbtkgWKEgRQXOm75XlbaReW3wKmiPkc8iLTRde
 tytdf82GUN0AKKWCsUMiiEWDCYs9mpM3MPX2GOS+ZPBZfX2cyubKZ9STPzzFDwFf
 NfP+NuzJmYfEonBXproTa6ZqAq2UVGPeolyYe1lwV8QVCU8Z4W6GRAKbSzk0Hq6N
 wcdpNaNmkQytjDQ1hZ0NNFecTH4qjStOkc9OwNZJwoSbC31sQGHyKnxWP8Re1+hU
 UmpIAuNBc6eKJmkyoOE9GfIB08AvvuB4s7B3X8ffpWGqQmASYAY9DhEKDlPmmkhD
 NV+HTUkebw+gZoJp6VGL072KGARrNEmodKrcmXA/T4T8ZwoHFbnQbzDaODzW7rzx
 1UxwCtUa/Jl5hOPngo0XuLnbeM7AAG1MjaJSKDqoUl4WbjdYG3f7yRxs6T+JS+dk
 1+NpVJiIKBM1bqP7Jf+v9xrbYG31w5blikxhCpjA601ztV0vgtiiojssKpNwjkpv
 Myh1BavAaLcnMCkCfHppLlXv2bnLHFANMyMcPU+EuLzPDTsxxxfhmbBHBur4r7BA
 DXmpRQvWCoAqlQzzvZoZ
 =B+R7
 -----END PGP SIGNATURE-----

Merge tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These revert one recent change in the generic power domains
  framework, fix a recently introduced build issue in there and
  constify attribute_group structures in some places.

  Specifics:

   - Revert a recent change in the generic power domains (genpd)
     framework that led to regressions and turned out the be misguided
     (Rafael Wysocki).

   - Fix a recently introduced build issue in the generic power domains
     (genpd) framework (Arnd Bergmann).

   - Constify attribute_group structures in the PM core, the cpufreq
     stats code and in intel_pstate (Arvind Yadav)"

* tag 'pm-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: constify attribute_group structures
  cpufreq: cpufreq_stats: constify attribute_group structures
  PM / sleep: constify attribute_group structures
  PM / Domains: provide pm_genpd_poweroff_noirq() stub
  Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device"
2017-07-10 15:16:21 -07:00
Rafael J. Wysocki
15d56b3921 Merge branches 'pm-domains', 'pm-sleep' and 'pm-cpufreq'
* pm-domains:
  PM / Domains: provide pm_genpd_poweroff_noirq() stub
  Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device"

* pm-sleep:
  PM / sleep: constify attribute_group structures

* pm-cpufreq:
  cpufreq: intel_pstate: constify attribute_group structures
  cpufreq: cpufreq_stats: constify attribute_group structures
2017-07-10 22:45:16 +02:00
Linus Torvalds
4d3c4a4293 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp/hotplug fix from Thomas Gleixner:
 "A single fix for a brown paperbag bug:

  The unparking of the initial percpu threads of an upcoming CPU happens
  right now on the idle task, but that's wrong as the unpark function
  might sleep. Move it to the control CPU."

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp/hotplug: Move unparking of percpu threads to the control CPU
2017-07-09 11:16:19 -07:00
Linus Torvalds
4fde846ac0 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "This scheduler update provides:

   - The (hopefully) final fix for the vtime accounting issues which
     were around for quite some time

   - Use types known to user space in UAPI headers to unbreak user space
     builds

   - Make load balancing respect the current scheduling domain again
     instead of evaluating unrelated CPUs"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
  sched/fair: Fix load_balance() affinity redo path
  sched/cputime: Accumulate vtime on top of nsec clocksource
  sched/cputime: Move the vtime task fields to their own struct
  sched/cputime: Rename vtime fields
  sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
  vtime, sched/cputime: Remove vtime_account_user()
  Revert "sched/cputime: Refactor the cputime_adjust() code"
2017-07-09 10:52:16 -07:00
Linus Torvalds
c3931a87db Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for perf and kprobes:

   - Add he missing exclude_kernel attribute for the precise_ip level so
     !CAP_SYS_ADMIN users get the proper results.

   - Warn instead of failing completely when perf has no unwind support
     for a particular architectiure built in.

   - Ensure that jprobes are at function entry and not at some random
     place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Ensure that jprobe probepoints are at function entry
  kprobes: Simplify register_jprobes()
  kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
  perf unwind: Do not fail due to missing unwind support
  perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip
2017-07-09 10:49:47 -07:00
Linus Torvalds
c8b2ba83fb Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:

 - Fix the EINTR logic in rwsem-spinlock to avoid double locking by a
   writer and a reader

 - Add a missing include to qspinlocks

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/qspinlock: Explicitly include asm/prefetch.h
  locking/rwsem-spinlock: Fix EINTR branch in __down_write_common()
2017-07-09 10:47:50 -07:00
Linus Torvalds
7cb328c30a Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:

 - A few fixes mopping up the fallout of the big irq overhaul

 - Move the interrupt resource management logic out of the spin locked,
   irq disabled region to avoid unnecessary restrictions of the resource
   callbacks

 - Preparation for reworking the per cpu irq request function.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdomain: Allow ACPI device nodes to be used as irqdomain identifiers
  genirq/debugfs: Remove redundant NULL pointer check
  genirq: Allow to pass the IRQF_TIMER flag with percpu irq request
  genirq/timings: Move free timings out of spinlocked region
  genirq: Move irq resource handling out of spinlocked region
  genirq: Add mutex to irq desc to serialize request/free_irq()
  genirq: Move bus locking into __setup_irq()
  genirq: Force inlining of __irq_startup_managed to prevent build failure
  genirq/debugfs: Fix build for !CONFIG_IRQ_DOMAIN
2017-07-09 10:24:46 -07:00
Linus Torvalds
e28e9e3ec0 Merge branch 'waitid-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull waitid fix from Al Viro.

* 'waitid-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix waitid(2) breakage
2017-07-09 08:58:50 -07:00
Naveen N. Rao
fca18a47cf trace/kprobes: Sanitize derived event names
When we derive event names, convert some expected symbols (such as ':'
used to specify module:name and '.' present in some symbols) into
underscores so that the event name is not rejected.

Before this patch:
    # echo 'p kobject_example:foo_store' > kprobe_events
    trace_kprobe: Failed to allocate trace_probe.(-22)
    -sh: write error: Invalid argument

After this patch:
    # echo 'p kobject_example:foo_store' > kprobe_events
    # cat kprobe_events
    p:kprobes/p_kobject_example_foo_store_0 kobject_example:foo_store

Link: http://lkml.kernel.org/r/66c189e09e71361aba91dd4a5bd146a1b62a7a51.1499453040.git.naveen.n.rao@linux.vnet.ibm.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-09 07:45:53 -04:00
Linus Torvalds
f263fbb8d6 pci-v4.13-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZYAFUAAoJEFmIoMA60/r8cFQP/A4fpdjhd42WRNQXGTpZieop
 i40lBQtGdBn/UY97U6BoutcS1ygDi9OiSzg+IR6I90iMgidqyUHFhe4hGWgVHD2g
 Tg0KLzd+lKKfQ6Gqt1P6t4dLGLvyEj5NUbCeFE4XYODAUkkiBaOndax6DK1GvU54
 Vjuj63rHtMKFR/tG/4iFTigObqyI8QE6O9JVxwuvIyEX6RXKbJe+wkulv5taSnWt
 Ne94950i10MrELtNreVdi8UbCbXiqjg0r5sKI/WTJ7Bc7WsC7X5PhWlhcNrbHyBT
 Ivhoypkui3Ky8gvwWqL0KBG+cRp8prBXAdabrD9wRbz0TKnfGI6pQzseCGRnkE6T
 mhlSJpsSNIHaejoCjk93yPn5oRiTNtPMdVhMpEQL9V/crVRGRRmbd7v2TYvpMHVR
 JaPZ8bv+C2aBTY8uL3/v/rgrjsMKOYFeaxeNklpErxrknsbgb6BgubmeZXDvTBVv
 YUIbAkvveonUKisv+kbD8L7tp1+jdbRUT0AikS0NVgAJQhfArOmBcDpTL9YC51vE
 feFhkVx4A32vvOm7Zcg9A7IMXNjeSfccKGw3dJOAvzgDODuJiaCG6S0o7B5Yngze
 axMi87ixGT4QM98z/I4MC8E9rDrJdIitlpvb6ZBgiLzoO3kmvsIZZKt8UxWqf5r8
 w3U2HoyKH13Qbkn1xkum
 =mkyb
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

  - add sysfs max_link_speed/width, current_link_speed/width (Wong Vee
    Khee)

  - make host bridge IRQ mapping much more generic (Matthew Minter,
    Lorenzo Pieralisi)

  - convert most drivers to pci_scan_root_bus_bridge() (Lorenzo
    Pieralisi)

  - mutex sriov_configure() (Jakub Kicinski)

  - mutex pci_error_handlers callbacks (Christoph Hellwig)

  - split ->reset_notify() into ->reset_prepare()/reset_done()
    (Christoph Hellwig)

  - support multiple PCIe portdrv interrupts for MSI as well as MSI-X
    (Gabriele Paoloni)

  - allocate MSI/MSI-X vector for Downstream Port Containment (Gabriele
    Paoloni)

  - fix MSI IRQ affinity pre/post/min_vecs issue (Michael Hernandez)

  - test INTx masking during enumeration, not at run-time (Piotr Gregor)

  - avoid using device_may_wakeup() for runtime PM (Rafael J. Wysocki)

  - restore the status of PCI devices across hibernation (Chen Yu)

  - keep parent resources that start at 0x0 (Ard Biesheuvel)

  - enable ECRC only if device supports it (Bjorn Helgaas)

  - restore PRI and PASID state after Function-Level Reset (CQ Tang)

  - skip DPC event if device is not present (Keith Busch)

  - check domain when matching SMBIOS info (Sujith Pandel)

  - mark Intel XXV710 NIC INTx masking as broken (Alex Williamson)

  - avoid AMD SB7xx EHCI USB wakeup defect (Kai-Heng Feng)

  - work around long-standing Macbook Pro poweroff issue (Bjorn Helgaas)

  - add Switchtec "running" status flag (Logan Gunthorpe)

  - fix dra7xx incorrect RW1C IRQ register usage (Arvind Yadav)

  - modify xilinx-nwl IRQ chip for legacy interrupts (Bharat Kumar
    Gogada)

  - move VMD SRCU cleanup after bus, child device removal (Jon Derrick)

  - add Faraday clock handling (Linus Walleij)

  - configure Rockchip MPS and reorganize (Shawn Lin)

  - limit Qualcomm TLP size to 2K (hardware issue) (Srinivas Kandagatla)

  - support Tegra MSI 64-bit addressing (Thierry Reding)

  - use Rockchip normal (not privileged) register bank (Shawn Lin)

  - add HiSilicon Kirin SoC PCIe controller driver (Xiaowei Song)

  - add Sigma Designs Tango SMP8759 PCIe controller driver (Marc
    Gonzalez)

  - add MediaTek PCIe host controller support (Ryder Lee)

  - add Qualcomm IPQ4019 support (John Crispin)

  - add HyperV vPCI protocol v1.2 support (Jork Loeser)

  - add i.MX6 regulator support (Quentin Schulz)

* tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits)
  PCI: tango: Add Sigma Designs Tango SMP8759 PCIe host bridge support
  PCI: Add DT binding for Sigma Designs Tango PCIe controller
  PCI: rockchip: Use normal register bank for config accessors
  dt-bindings: PCI: Add documentation for MediaTek PCIe
  PCI: Remove __pci_dev_reset() and pci_dev_reset()
  PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()
  PCI: xilinx: Make of_device_ids const
  PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts
  PCI: vmd: Move SRCU cleanup after bus, child device removal
  PCI: vmd: Correct comment: VMD domains start at 0x10000, not 0x1000
  PCI: versatile: Add local struct device pointers
  PCI: tegra: Do not allocate MSI target memory
  PCI: tegra: Support MSI 64-bit addressing
  PCI: rockchip: Use local struct device pointer consistently
  PCI: rockchip: Check for clk_prepare_enable() errors during resume
  MAINTAINERS: Remove Wenrui Li as Rockchip PCIe driver maintainer
  PCI: rockchip: Configure RC's MPS setting
  PCI: rockchip: Reconfigure configuration space header type
  PCI: rockchip: Split out rockchip_pcie_cfg_configuration_accesses()
  PCI: rockchip: Move configuration accesses into rockchip_pcie_cfg_atu()
  ...
2017-07-08 15:51:57 -07:00
Linus Torvalds
fe1b518075 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:

 1) Queued spinlocks and rwlocks for sparc64, from Babu Moger.

 2) Some const'ification from Arvind Yadav.

 3) LDC/VIO driver infrastructure changes to facilitate future upcoming
    drivers, from Jag Raman.

 4) Initialize sched_clock() et al. early so that the initial printk
    timestamps are all done while the implementation is available and
    functioning. From Pavel Tatashin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (38 commits)
  sparc: kernel: pmc: make of_device_ids const.
  sparc64: fix typo in property
  sparc64: add port_id to VIO device metadata
  sparc64: Enhance search for VIO device in MDESC
  sparc64: enhance VIO device probing
  sparc64: check if a client is allowed to register for MDESC notifications
  sparc64: remove restriction on VIO device name size
  sparc64: refactor code to obtain cfg_handle property from MDESC
  sparc64: add MDESC node name property to VIO device metadata
  sparc64: mdesc: use __GFP_REPEAT action modifier for VM allocation
  sparc64: expand MDESC interface
  sparc64: skip handshake for LDC channels in RAW mode
  sparc64: specify the device class in VIO version info. packet
  sparc64: ensure VIO operations are defined while being used
  sparc: kernel: apc: make of_device_ids const
  sparc/time: make of_device_ids const
  sparc64: broken %tick frequency on spitfire cpus
  sparc64: use prom interface to get %stick frequency
  sparc64: optimize functions that access tick
  sparc64: add hot-patched and inlined get_tick()
  ...
2017-07-08 12:14:14 -07:00
Al Viro
634a816095 fix waitid(2) breakage
We lose the distinction between "found a PID" and "nothing, but that's not
an error" a bit too early in waitid().  Easily fixed, fortunately...

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 67d7ddded322 ("waitid(2): leave copyout of siginfo to syscall itself")
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-08 11:26:39 -04:00
Tejun Heo
610467270f cgroup: don't call migration methods if there are no tasks to migrate
Subsystem migration methods shouldn't be called for empty migrations.
cgroup_migrate_execute() implements this guarantee by bailing early if
there are no source css_sets.  This used to be correct before
a79a908fd2b0 ("cgroup: introduce cgroup namespaces"), but no longer
since the commit because css_sets can stay pinned without tasks in
them.

This caused cgroup_migrate_execute() call into cpuset migration
methods with an empty cgroup_taskset.  cpuset migration methods
correctly assume that cgroup_taskset_first() never returns NULL;
however, due to the bug, it can, leading to the following oops.

  Unable to handle kernel paging request for data at address 0x00000960
  Faulting instruction address: 0xc0000000001d6868
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  CPU: 14 PID: 16947 Comm: kworker/14:0 Tainted: G        W
  4.12.0-rc4-next-20170609 #2
  Workqueue: events cpuset_hotplug_workfn
  task: c00000000ca60580 task.stack: c00000000c728000
  NIP: c0000000001d6868 LR: c0000000001d6858 CTR: c0000000001d6810
  REGS: c00000000c72b720 TRAP: 0300   Tainted: GW (4.12.0-rc4-next-20170609)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44722422  XER: 20000000
  CFAR: c000000000008710 DAR: 0000000000000960 DSISR: 40000000 SOFTE: 1
  GPR00: c0000000001d6858 c00000000c72b9a0 c000000001536e00 0000000000000000
  GPR04: c00000000c72b9c0 0000000000000000 c00000000c72bad0 c000000766367678
  GPR08: c000000766366d10 c00000000c72b958 c000000001736e00 0000000000000000
  GPR12: c0000000001d6810 c00000000e749300 c000000000123ef8 c000000775af4180
  GPR16: 0000000000000000 0000000000000000 c00000075480e9c0 c00000075480e9e0
  GPR20: c00000075480e8c0 0000000000000001 0000000000000000 c00000000c72ba20
  GPR24: c00000000c72baa0 c00000000c72bac0 c000000001407248 c00000000c72ba20
  GPR28: c00000000141fc80 c00000000c72bac0 c00000000c6bc790 0000000000000000
  NIP [c0000000001d6868] cpuset_can_attach+0x58/0x1b0
  LR [c0000000001d6858] cpuset_can_attach+0x48/0x1b0
  Call Trace:
  [c00000000c72b9a0] [c0000000001d6858] cpuset_can_attach+0x48/0x1b0 (unreliable)
  [c00000000c72ba00] [c0000000001cbe80] cgroup_migrate_execute+0xb0/0x450
  [c00000000c72ba80] [c0000000001d3754] cgroup_transfer_tasks+0x1c4/0x360
  [c00000000c72bba0] [c0000000001d923c] cpuset_hotplug_workfn+0x86c/0xa20
  [c00000000c72bca0] [c00000000011aa44] process_one_work+0x1e4/0x580
  [c00000000c72bd30] [c00000000011ae78] worker_thread+0x98/0x5c0
  [c00000000c72bdc0] [c000000000124058] kthread+0x168/0x1b0
  [c00000000c72be30] [c00000000000b2e8] ret_from_kernel_thread+0x5c/0x74
  Instruction dump:
  f821ffa1 7c7d1b78 60000000 60000000 38810020 7fa3eb78 3f42ffed 4bff4c25
  60000000 3b5a0448 3d420020 eb610020 <e9230960> 7f43d378 e9290000 f92af200
  ---[ end trace dcaaf98fb36d9e64 ]---

This patch fixes the bug by adding an explicit nr_tasks counter to
cgroup_taskset and skipping calling the migration methods if the
counter is zero.  While at it, remove the now spurious check on no
source css_sets.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Link: http://lkml.kernel.org/r/1497266622.15415.39.camel@abdul.in.ibm.com
2017-07-08 07:37:50 -04:00
Naveen N. Rao
dbf580623d kprobes: Ensure that jprobe probepoints are at function entry
Similar to commit 90ec5e89e393c ("kretprobes: Ensure probe location is
at function entry"), ensure that the jprobe probepoint is at function
entry.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a4525af6c5a42df385efa31251246cf7cca73598.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-08 11:05:35 +02:00
Naveen N. Rao
0f73ff80b7 kprobes: Simplify register_jprobes()
Re-factor jprobe registration functions as the current version is
getting too unwieldy. Move the actual jprobe registration to
register_jprobe() and re-organize code accordingly.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/089cae4bfe73767f765291ee0e6fb0c3d240e5f1.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-08 11:05:34 +02:00
Naveen N. Rao
659b957f20 kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
Rename function_offset_within_entry() to scope it to kprobe namespace by
using kprobe_ prefix, and to also simplify it.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3aa6c7e2e4fb6e00f3c24fa306496a66edb558ea.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-08 11:05:34 +02:00
Stafford Horne
5671360f29 locking/qspinlock: Explicitly include asm/prefetch.h
In architectures that use qspinlock, like x86, prefetch is loaded
indirectly via the asm/qspinlock.h include.  On other architectures, like
OpenRISC, which may want to use asm-generic/qspinlock.h the built will
fail without the asm/prefetch.h include.

Fix this by including directly.

Signed-off-by: Stafford Horne <shorne@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170707195658.23840-1-shorne@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-08 11:01:11 +02:00
Linus Torvalds
ef3ad0898a linux-kselftest-4.13-rc1-update
This update consists of:
 
 -- TAP13 framework and changes to some tests to convert to TAP13.
    Converting kselftest output to standard format will help identify
    run to run differences and pin point failures easily. TAP13 format
    has been in use for several years and the output is human friendly.
 
    Please find the specification:
    https://testanything.org/tap-version-13-specification.html
 
    Credit goes to Tim Bird for recommending TAP13 as a suitable format,
    and to Grag KH for kick starting the work with help from Paul Elder
    and Alice Ferrazzi
 
    The first phase of the TAp13 conversion is included in this update.
    Future updates will include updates to rest of the tests.
 
 -- Masami Hiramatsu fixed ftrace to run on 4.9 stable kernels.
 
 -- Kselftest documnetation has been converted to ReST format. Document
    now has a new home under Documentation/dev-tools.
 
 -- kselftest_harness.h is now available for general use as a result of
    Mickaël Salaün's work.
 
 -- Several fixes to skip and/or fail tests gracefully on older releases.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZXo9JAAoJEAsCRMQNDUMc1OUQAOJsBFWiMgWWxOZg1RBT5khl
 7OvGLoHsu3qydF5gzVnyDuEZAGHRc4c6OKqbHIqQB3tp9o4PnX2m9KIa6z7sjzys
 jett2ZjMe7BtctBluZF0zVyCbRdAXgfxp7QGfv/CkN+hw4uztwFwen4LpwvJseLd
 gkie/lVPFKszyaWfiF3pDPazk5qhc53ChjAhnSkRY8HlwFcVtZwO7Ptvex0l8gO2
 t+ZxhX9zt3jxRbiHq5h/N6EDw2pPthvSR4iT4FcyYiwqxUK64Nq5RQpkxJTfu0iz
 l2mxMTNol/tDKH+iOvWJX565LzVXxonCf8Cne4mooqegkn0f2bnkPqoE5N8OwTdd
 oIGT/Vq84C5eQwPubtr2oXr6Xh7pywbPW8h7fn972QWl5ySbR4JEmdBzSviF5ALq
 Dwz8lJeGX6qYpSKz8aVqKYJ3U31hYxT/EPhGIJ4VtjcTxyfgcobaD26W0vT0Cjad
 dIdK11IDMxErquS1Vb/kkTzVxCnVhmWRsjmUeKLl/FxDkhiJmjIxaCOvtitzsiHz
 tooMpcCQ7Z97QbDxKfolpcCC563okYhUoca3EhZLq9pZkEwfbGN9YI4/i608oSaA
 K4mJgdL6c704TqGwouIBn/+MTWq4LOkzN2zUP0kpY2z61GvEPMYxmdoQBn2yHBb9
 cnt9MZNlZML2YqnMjiDf
 =j1Um
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-4.13-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest updates from Shuah Khan:
 "This update consists of:

   - TAP13 framework and changes to some tests to convert to TAP13.
     Converting kselftest output to standard format will help identify
     run to run differences and pin point failures easily. TAP13 format
     has been in use for several years and the output is human friendly.

     Please find the specification:
       https://testanything.org/tap-version-13-specification.html

     Credit goes to Tim Bird for recommending TAP13 as a suitable
     format, and to Grag KH for kick starting the work with help from
     Paul Elder and Alice Ferrazzi

     The first phase of the TAp13 conversion is included in this update.
     Future updates will include updates to rest of the tests.

   - Masami Hiramatsu fixed ftrace to run on 4.9 stable kernels.

   - Kselftest documnetation has been converted to ReST format. Document
     now has a new home under Documentation/dev-tools.

   - kselftest_harness.h is now available for general use as a result of
     Mickaël Salaün's work.

   - Several fixes to skip and/or fail tests gracefully on older
     releases"

* tag 'linux-kselftest-4.13-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (48 commits)
  selftests: membarrier: use ksft_* var arg msg api
  selftests: breakpoints: breakpoint_test_arm64: convert test to use TAP13
  selftests: breakpoints: step_after_suspend_test use ksft_* var arg msg api
  selftests: breakpoint_test: use ksft_* var arg msg api
  kselftest: add ksft_print_msg() function to output general information
  kselftest: make ksft_* output functions variadic
  selftests/capabilities: Fix the test_execve test
  selftests: intel_pstate: add .gitignore
  selftests: fix memory-hotplug test
  selftests: add missing test name in memory-hotplug test
  selftests: check percentage range for memory-hotplug test
  selftests: check hot-pluggagble memory for memory-hotplug test
  selftests: typo correction for memory-hotplug test
  selftests: ftrace: Use md5sum to take less time of checking logs
  tools/testing/selftests/sysctl: Add pre-check to the value of writes_strict
  kselftest.rst: do some adjustments after ReST conversion
  selftest/net/Makefile: Specify output with $(OUTPUT)
  selftest/intel_pstate/aperf: Use LDLIBS instead of LDFLAGS
  selftest/memfd/Makefile: Fix build error
  selftests: lib: Skip tests on missing test modules
  ...
2017-07-07 14:04:47 -07:00
Joel Fernandes
29b1a8ad7d tracing: Attempt to record other information even if some fail
In recent patches where we record comm and tgid at the same time, we skip
continuing to record if any fail. Fix that by trying to record as many things
as we can even if some couldn't be recorded. If any information isn't recorded,
then we don't set trace_taskinfo_save as before.

Link: http://lkml.kernel.org/r/20170706230023.17942-3-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-07 09:11:34 -04:00
Joel Fernandes
bd45d34d25 tracing: Treat recording tgid for idle task as a success
Currently we stop recording tgid for non-idle tasks when switching from/to idle
task since we treat that as a record failure. Fix that by treat recording of
tgid for idle task as a success.

Link: http://lkml.kernel.org/r/20170706230023.17942-2-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Reported-by: Michael Sartain <mikesart@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-07 09:04:23 -04:00
Joel Fernandes
eaf260ac04 tracing: Treat recording comm for idle task as a success
Currently we stop recording comm for non-idle tasks when switching from/to idle
task since we treat that as a record failure. Fix that by treat recording of
comm for idle task as a success.

Link: http://lkml.kernel.org/r/20170706230023.17942-1-joelaf@google.com

Cc: kernel-team@android.com
Cc: Ingo Molnar <mingo@redhat.com>
Reported-by: Michael Sartain <mikesart@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-07 09:04:14 -04:00
Marc Zyngier
c5c601c429 irqdomain: Allow ACPI device nodes to be used as irqdomain identifiers
A number of irqchip implementations are (ab)using the irqdomain allocator
by passing a fwnode that is neither a FWNODE_OF or a FWNODE_IRQCHIP.

This is pretty bad, but it also feels pretty crap to force these drivers to
allocate their own irqchip_fwid when they already have a proper fwnode.

Instead, let's teach the irqdomain allocator about ACPI device nodes, and
add some lovely name generation code... Tested on an arm64 D05 system.

Reported-and-tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ma Jun <majun258@huawei.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Link: http://lkml.kernel.org/r/20170707083959.10349-1-marc.zyngier@arm.com
2017-07-07 12:13:29 +02:00
Thomas Gleixner
f610c9d68b genirq/debugfs: Remove redundant NULL pointer check
debugfs_remove() can be called with a NULL pointer.

Fixes: 087cdfb662ae5 ("genirq/debugfs: Add proper debugfs interface")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-07-07 08:57:57 +02:00
Linus Torvalds
9f45efb928 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few hotfixes

 - various misc updates

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
  mm, memory_hotplug: move movable_node to the hotplug proper
  mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
  mm, memory_hotplug: drop artificial restriction on online/offline
  mm: memcontrol: account slab stats per lruvec
  mm: memcontrol: per-lruvec stats infrastructure
  mm: memcontrol: use generic mod_memcg_page_state for kmem pages
  mm: memcontrol: use the node-native slab memory counters
  mm: vmstat: move slab statistics from zone to node counters
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
  mm/zswap.c: improve a size determination in zswap_frontswap_init()
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
  mm/swapfile.c: sort swap entries before free
  mm/oom_kill: count global and memory cgroup oom kills
  mm: per-cgroup memory reclaim stats
  mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
  mm: kmemleak: factor object reference updating out of scan_block()
  mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
  mm, mempolicy: don't check cpuset seqlock where it doesn't matter
  mm, cpuset: always use seqlock when changing task's nodemask
  mm, mempolicy: simplify rebinding mempolicies when updating cpusets
  ...
2017-07-06 22:27:08 -07:00
Linus Torvalds
c856863988 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc compat stuff updates from Al Viro:
 "This part is basically untangling various compat stuff. Compat
  syscalls moved to their native counterparts, getting rid of quite a
  bit of double-copying and/or set_fs() uses. A lot of field-by-field
  copyin/copyout killed off.

   - kernel/compat.c is much closer to containing just the
     copyin/copyout of compat structs. Not all compat syscalls are gone
     from it yet, but it's getting there.

   - ipc/compat_mq.c killed off completely.

   - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
     drivers/block/floppy.c where they belong. Yes, there are several
     drivers that implement some of the same ioctls. Some are m68k and
     one is 32bit-only pmac. drivers/block/floppy.c is the only one in
     that bunch that can be built on biarch"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: move compat syscalls to native ones
  usbdevfs: get rid of field-by-field copyin
  compat_hdio_ioctl: get rid of set_fs()
  take floppy compat ioctls to sodding floppy.c
  ipmi: get rid of field-by-field __get_user()
  ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
  rt_sigtimedwait(): move compat to native
  select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
  put_compat_rusage(): switch to copy_to_user()
  sigpending(): move compat to native
  getrlimit()/setrlimit(): move compat to native
  times(2): move compat to native
  compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
  fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
  do_sigaltstack(): lift copying to/from userland into callers
  take compat_sys_old_getrlimit() to native syscall
  trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06 20:57:13 -07:00
Linus Torvalds
2074006dac The new features of this release:
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.
 
   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.
 
   - Recording of tgid (Thread Group ID). This is similar to how
     task COMMs are recorded (cached at sched_switch), where it is
     in a table and used on output of the trace and trace_pipe files.
 
   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.
 
   - Some random clean ups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZXjYuFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 fsgIAKUvhpn2igoYCR9tWqu+DovEmwxCIumbCzmCFQcRKlLttRte94yY5+W9hnV0
 JPzd9T9zBDVqq1fI7iIop1SuTwEfKW6lJom0usZ8AFpK+YKm6FHnQ28POlvHzre2
 lzO41tpRWiehLQsITZ47eByhsvEfhx86mYT/oM1JSR6Pii1OpjyNYmDMw6BaMNBT
 kSCQFgIhzAhVuHjwAnB/S++E/ou7M5bCwCb5CNh7MubKubV5upHpoJcgYGO+WWa6
 56H/iEhff4EECTGJVefd8e78MtJPL8EsuM0nAcMPlnl8AaiOpP7XCdlgTwdefLvP
 b3o+nP15voSHkARGXC6eM6gH0po=
 =rvGB
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The new features of this release:

   - Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.

   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.

   - Recording of tgid (Thread Group ID). This is similar to how task
     COMMs are recorded (cached at sched_switch), where it is in a table
     and used on output of the trace and trace_pipe files.

   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.

   - Some random clean ups and small fixes"

* tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits)
  ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
  ftrace: Decrement count for dyn_ftrace_total_info for init functions
  ftrace: Unlock hash mutex on failed allocation in process_mod_list()
  tracing: Add support for display of tgid in trace output
  tracing: Add support for recording tgid of tasks
  ftrace: Decrement count for dyn_ftrace_total_info file
  ftrace: Remove unused function ftrace_arch_read_dyn_info()
  sh/ftrace: Remove only user of ftrace_arch_read_dyn_info()
  ftrace: Have cached module filters be an active filter
  ftrace: Implement cached modules tracing on module load
  ftrace: Have the cached module list show in set_ftrace_filter
  ftrace: Add :mod: caching infrastructure to trace_array
  tracing: Show address when function names are not found
  ftrace: Add missing comment for FTRACE_OPS_FL_RCU
  tracing: Rename update the enum_map file
  tracing: Add TRACE_DEFINE_SIZEOF() macros
  tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values
  tracing: Rename enum_replace to eval_replace
  trace: rename enum_map functions
  trace: rename trace.c enum functions
  ...
2017-07-06 19:45:45 -07:00
Johannes Weiner
ed52be7bfd mm: memcontrol: use generic mod_memcg_page_state for kmem pages
The kmem-specific functions do the same thing.  Switch and drop.

Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Vlastimil Babka
5f155f27cb mm, cpuset: always use seqlock when changing task's nodemask
When updating task's mems_allowed and rebinding its mempolicy due to
cpuset's mems being changed, we currently only take the seqlock for
writing when either the task has a mempolicy, or the new mems has no
intersection with the old mems.

This should be enough to prevent a parallel allocation seeing no
available nodes, but the optimization is IMHO unnecessary (cpuset
updates should not be frequent), and we still potentially risk issues if
the intersection of new and old nodes has limited amount of
free/reclaimable memory.

Let's just use the seqlock for all tasks.

Link: http://lkml.kernel.org/r/20170517081140.30654-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
213980c0f2 mm, mempolicy: simplify rebinding mempolicies when updating cpusets
Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") has introduced a two-step protocol when
rebinding task's mempolicy due to cpuset update, in order to avoid a
parallel allocation seeing an empty effective nodemask and failing.

Later, commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory
barrier related damage v3") introduced a seqlock protection and removed
the synchronization point between the two update steps.  At that point
(or perhaps later), the two-step rebinding became unnecessary.

Currently it only makes sure that the update first adds new nodes in
step 1 and then removes nodes in step 2.  Without memory barriers the
effects are questionable, and even then this cannot prevent a parallel
zonelist iteration checking the nodemask at each step to observe all
nodes as unusable for allocation.  We now fully rely on the seqlock to
prevent premature OOMs and allocation failures.

We can thus remove the two-step update parts and simplify the code.

Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Pavel Tatashin
3d375d7859 mm: update callers to use HASH_ZERO flag
Update dcache, inode, pid, mountpoint, and mount hash tables to use
HASH_ZERO, and remove initialization after allocations.  In case of
places where HASH_EARLY was used such as in __pv_init_lock_hash the
zeroed hash table was already assumed, because memblock zeroes the
memory.

CPU: SPARC M6, Memory: 7T
Before fix:
  Dentry cache hash table entries: 1073741824
  Inode-cache hash table entries: 536870912
  Mount-cache hash table entries: 16777216
  Mountpoint-cache hash table entries: 16777216
  ftrace: allocating 20414 entries in 40 pages
  Total time: 11.798s

After fix:
  Dentry cache hash table entries: 1073741824
  Inode-cache hash table entries: 536870912
  Mount-cache hash table entries: 16777216
  Mountpoint-cache hash table entries: 16777216
  ftrace: allocating 20414 entries in 40 pages
  Total time: 3.198s

CPU: Intel Xeon E5-2630, Memory: 2.2T:
Before fix:
  Dentry cache hash table entries: 536870912
  Inode-cache hash table entries: 268435456
  Mount-cache hash table entries: 8388608
  Mountpoint-cache hash table entries: 8388608
  CPU: Physical Processor ID: 0
  Total time: 3.245s

After fix:
  Dentry cache hash table entries: 536870912
  Inode-cache hash table entries: 268435456
  Mount-cache hash table entries: 8388608
  Mountpoint-cache hash table entries: 8388608
  CPU: Physical Processor ID: 0
  Total time: 3.244s

Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:33 -07:00
Mike Rapoport
57ecbd3831 kernel/exit.c: don't include unused userfaultfd_k.h
Commit dd0db88d8094 ("userfaultfd: non-cooperative: rollback
userfaultfd_exit") removed userfaultfd callback from exit() which makes
the include of <linux/userfaultfd_k.h> unnecessary.

Link: http://lkml.kernel.org/r/1494930907-3060-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
3d79a728f9 mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory
arch_add_memory gets for_device argument which then controls whether we
want to create memblocks for created memory sections.  Simplify the
logic by telling whether we want memblocks directly rather than going
through pointless negation.  This also makes the api easier to
understand because it is clear what we want rather than nothing telling
for_device which can mean anything.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
f1dd2cd13c mm, memory_hotplug: do not associate hotadded memory to zones until online
The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed.  Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable.  This essentially means that the online type changes as
the new memblocks are added.

Let's simulate memory hot online manually
  $ echo 0x100000000 > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory32/valid_zones
  Normal Movable

  $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  $ echo online_movable > /sys/devices/system/memory/memory34/state
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g.  association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages with
any zone at all.  All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request.  There are only two requirements

	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

the latter one is not an inherent requirement and can be changed in the
future.  It preserves the current behavior and made the code slightly
simpler.  This is subject to change in future.

This means that the same physical online steps as above will lead to the
following state: Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node.  __add_pages is updated to not require the
zone and only initializes sections in the range.  This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).

devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way.  It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs.  This means that this
particular code path has to call move_pfn_range_to_zone explicitly.

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs.  Movable)
used to allow to change its movable type.  This will be handled later.

[richard.weiyang@gmail.com: simplify zone_intersects()]
  Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
  Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michael Ellerman
563ec5cbc6 kernel/module.c: use linux/set_memory.h
This header always exists, so doesn't require an ifdef around its
inclusion.  When CONFIG_ARCH_HAS_SET_MEMORY=y it includes the asm
header, otherwise it provides empty versions of the set_memory_xx()
routines.

The usages of set_memory_xx() are still guarded by
CONFIG_STRICT_MODULE_RWX.

Link: http://lkml.kernel.org/r/1498717781-29151-3-git-send-email-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:30 -07:00
Michael Ellerman
61f6d09a93 kernel/power/snapshot.c: use linux/set_memory.h
This header always exists, so doesn't require an ifdef around its
inclusion.  When CONFIG_ARCH_HAS_SET_MEMORY=y it includes the asm
header, otherwise it provides empty versions of the set_memory_xx()
routines.

Link: http://lkml.kernel.org/r/1498717781-29151-2-git-send-email-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:30 -07:00
Marcin Nowakowski
c0d80ddab8 kernel/extable.c: mark core_kernel_text notrace
core_kernel_text is used by MIPS in its function graph trace processing,
so having this method traced leads to an infinite set of recursive calls
such as:

  Call Trace:
     ftrace_return_to_handler+0x50/0x128
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     return_to_handler+0x10/0x30
     return_to_handler+0x0/0x30
     return_to_handler+0x0/0x30
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     (...)

Mark the function notrace to avoid it being traced.

Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.com
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:29 -07:00