25761 Commits

Author SHA1 Message Date
Thomas Gleixner
7edaeb6841 kernel/watchdog: Prevent false positives with turbo modes
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.

The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.

Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.

That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.

Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
2017-08-18 12:35:02 +02:00
Marc Zyngier
e8f241893d genirq: Restore trigger settings in irq_modify_status()
irq_modify_status starts by clearing the trigger settings from
irq_data before applying the new settings, but doesn't restore them,
leaving them to IRQ_TYPE_NONE.

That's pretty confusing to the potential request_irq() that could
follow. Instead, snapshot the settings before clearing them, and restore
them if the irq_modify_status() invocation was not changing the trigger.

Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ")
Reported-and-tested-by: jeffy <jeffy.chen@rock-chips.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170818095345.12378-1-marc.zyngier@arm.com
2017-08-18 12:04:14 +02:00
Thomas Gleixner
6629695465 Merge branch 'irq/for-gpio' into irq/core
Merge the flow handlers and irq domain extensions which are in a separate
branch so they can be consumed by the gpio folks.
2017-08-18 11:22:27 +02:00
David Daney
495c38d300 irqdomain: Add irq_domain_{push,pop}_irq() functions
For an already existing irqdomain hierarchy, as might be obtained via
a call to pci_enable_msix_range(), a PCI driver wishing to add an
additional irqdomain to the hierarchy needs to be able to insert the
irqdomain to that already initialized hierarchy.  Calling
irq_domain_create_hierarchy() allows the new irqdomain to be created,
but no existing code allows for initializing the associated irq_data.

Add a couple of helper functions (irq_domain_push_irq() and
irq_domain_pop_irq()) to initialize the irq_data for the new
irqdomain added to an existing hierarchy.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-6-git-send-email-david.daney@cavium.com
2017-08-18 11:21:42 +02:00
David Daney
0d12ec075a irqdomain: Check for NULL function pointer in irq_domain_free_irqs_hierarchy()
A follow-on patch will call irq_domain_free_irqs_hierarchy() when the
free() function pointer may be NULL.

Add a NULL pointer check to handle this new use case.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-5-git-send-email-david.daney@cavium.com
2017-08-18 11:21:42 +02:00
David Daney
b526adfe1b irqdomain: Factor out code to add and remove items to and from the revmap
The code to add and remove items to and from the revmap occurs several
times.

In preparation for the follow on patches that add more uses of this
code, factor this out in to separate static functions.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-4-git-send-email-david.daney@cavium.com
2017-08-18 11:21:41 +02:00
David Daney
7703b08cc9 genirq: Add handle_fasteoi_{level,edge}_irq flow handlers
Follow-on patch for gpio-thunderx uses a irqdomain hierarchy which
requires slightly different flow handlers, add them to chip.c which
contains most of the other flow handlers.  Make these conditionally
compiled based on CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-3-git-send-email-david.daney@cavium.com
2017-08-18 11:21:41 +02:00
David Daney
65efd9a49a genirq: Export more irq_chip_*_parent() functions
Many of the family of functions including irq_chip_mask_parent(),
irq_chip_unmask_parent() are exported, but not all.

Add EXPORT_SYMBOL_GPL to irq_chip_enable_parent,
irq_chip_disable_parent and irq_chip_set_affinity_parent, so they
likewise are usable from modules.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-2-git-send-email-david.daney@cavium.com
2017-08-18 11:21:40 +02:00
Marc Zyngier
6bc6d4abd2 genirq/proc: Use the the accessor to report the effective affinity
If CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK is defined, but that the
interrupt is not single target, the effective affinity reported in
/proc/irq/x/effective_affinity will be empty, which is not the truth.

Instead, use the accessor to report the affinity, which will pick
the right mask.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Link: http://lkml.kernel.org/r/20170818083925.10108-3-marc.zyngier@arm.com
2017-08-18 10:54:39 +02:00
Marc Zyngier
536e2e34bd genirq/debugfs: Triggering of interrupts from userspace
When developing new (and therefore buggy) interrupt related
code, it can sometimes be useful to inject interrupts without
having to rely on a device to actually generate them.

This functionnality relies either on the irqchip driver to
expose a irq_set_irqchip_state(IRQCHIP_STATE_PENDING) callback,
or on the core code to be able to retrigger a (edge-only)
interrupt.

To use this feature:

echo -n trigger > /sys/kernel/debug/irq/irqs/IRQNUM

WARNING: This is DANGEROUS, and strictly a debug feature.
Do not use it on a production system. Your HW is likely to
catch fire, your data to be corrupted, and reporting this will
make you look an even bigger fool than the idiot who wrote
this patch.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170818081156.9264-1-marc.zyngier@arm.com
2017-08-18 10:36:24 +02:00
Srinivas Pandruvada
726fb6b4f2 ACPI / PM: Check low power idle constraints for debug only
For SoC to achieve its lowest power platform idle state a set of hardware
preconditions must be met. These preconditions or constraints can be
obtained by issuing a device specific method (_DSM) with function "1".
Refer to the document provided in the link below.

Here during initialization (from attach() callback of LPS0 device), invoke
function 1 to get the device constraints. Each enabled constraint is
stored in a table.

The devices in this table are used to check whether they were in required
minimum state, while entering suspend. This check is done from platform
freeze wake() callback, only when /sys/power/pm_debug_messages attribute
is non zero.

If any constraint is not met and device is ACPI power managed then it
prints the device information to kernel logs.

Also if debug is enabled in acpi/sleep.c, the constraint table and state
of each device on wake is dumped in kernel logs.

Since pm_debug_messages_on setting is used as condition to check
constraints outside kernel/power/main.c, pm_debug_messages_on is changed
to a global variable.

Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-18 01:54:22 +02:00
Viresh Kumar
c49cbc19b3 cpufreq: schedutil: Always process remote callback with slow switching
The frequency update from the utilization update handlers can be divided
into two parts:

(A) Finding the next frequency
(B) Updating the frequency

While any CPU can do (A), (B) can be restricted to a group of CPUs only,
depending on the current platform.

For platforms where fast cpufreq switching is possible, both (A) and (B)
are always done from the same CPU and that CPU should be capable of
changing the frequency of the target CPU.

But for platforms where fast cpufreq switching isn't possible, after
doing (A) we wake up a kthread which will eventually do (B). This
kthread is already bound to the right set of CPUs, i.e. only those which
can change the frequency of CPUs of a cpufreq policy. And so any CPU
can actually do (A) in this case, as the frequency is updated from the
right set of CPUs only.

Check cpufreq_can_do_remote_dvfs() only for the fast switching case.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-18 01:35:19 +02:00
Viresh Kumar
e2cabe48c2 cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily
Utilization update callbacks are now processed remotely, even on the
CPUs that don't share cpufreq policy with the target CPU (if
dvfs_possible_from_any_cpu flag is set).

But in non-fast switch paths, the frequency is changed only from one of
policy->related_cpus. This happens because the kthread which does the
actual update is bound to a subset of CPUs (i.e. related_cpus).

Allow frequency to be remotely updated as well (i.e. call
__cpufreq_driver_target()) if dvfs_possible_from_any_cpu flag is set.

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-18 01:35:18 +02:00
Kees Cook
c71b02e4d2 Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps"
This reverts commit 68c4a4f8abc60c9440ede9cd123d48b78325f7a3, with
various conflict clean-ups.

The capability check required too much privilege compared to simple DAC
controls. A system builder was forced to have crash handler processes
run with CAP_SYSLOG which would give it the ability to read (and wipe)
the _current_ dmesg, which is much more access than being given access
only to the historical log stored in pstorefs.

With the prior commit to make the root directory 0750, the files are
protected by default but a system builder can now opt to give access
to a specific group (via chgrp on the pstorefs root directory) without
being forced to also give away CAP_SYSLOG.

Suggested-by: Nick Kralevich <nnk@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
2017-08-17 16:29:19 -07:00
Geert Uytterhoeven
47b4a457e4 alarmtimer: Fix unavailable wake-up source in sysfs
Currently the alarmtimer registers a wake-up source unconditionally,
regardless of the system having a (wake-up capable) RTC or not.
Hence the alarmtimer will always show up in
/sys/kernel/debug/wakeup_sources, even if it is not available, and thus
cannot be a wake-up source.

To fix this, postpone registration until a wake-up capable RTC device is
added.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-08-17 12:15:10 -07:00
Stafford Horne
a529bea8fa timekeeping: Use proper timekeeper for debug code
When CONFIG_DEBUG_TIMEKEEPING is enabled the timekeeping_check_update()
function will update status like last_warning and underflow_seen on the
timekeeper.

If there are issues found this state is used to rate limit the warnings
that get printed.

This rate limiting doesn't really really work if stored in real_tk as
the shadow timekeeper is overwritten onto real_tk at the end of every
update_wall_time() call, resetting last_warning and other statuses.

Fix rate limiting by using the shadow_timekeeper for
timekeeping_check_update().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Fixes: commit 57d05a93ada7 ("time: Rework debugging variables so they aren't global")
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-08-17 12:15:04 -07:00
Daniel Borkmann
976d28bfd1 bpf: don't enable preemption twice in smap_do_verdict
In smap_do_verdict(), the fall-through branch leads to call
preempt_enable() twice for the SK_REDIRECT, which creates an
imbalance. Only enable it for all remaining cases again.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-17 10:25:18 -07:00
Daniel Borkmann
1ab2de2bfe bpf: fix liveness propagation to parent in spilled stack slots
Using parent->regs[] when propagating REG_LIVE_READ for spilled regs
doesn't work since parent->regs[] denote the set of normal registers
but not spilled ones. Propagate to the correct regs.

Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-17 10:15:20 -07:00
Paul E. McKenney
656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney
d3a024abbc locking: Remove spin_unlock_wait() generic definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes spin_unlock_wait() and related
definitions from core code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17 08:08:58 -07:00
Paul E. McKenney
8083f29349 exit: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
it appears that all callers could do just as well with a lock/unlock pair.
This commit therefore replaces the spin_unlock_wait() call in do_exit()
with spin_lock() followed immediately by spin_unlock().  This should be
safe from a performance perspective because the lock is a per-task lock,
and this is happening only at task-exit time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17 08:08:57 -07:00
Paul E. McKenney
dec13c42d2 completion: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore replaces the spin_unlock_wait() call in
completion_done() with spin_lock() followed immediately by spin_unlock().
This should be safe from a performance perspective because the lock
will be held only the wakeup happens really quickly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 08:06:44 -07:00
Mathieu Desnoyers
22e4ebb975 membarrier: Provide expedited private command
Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
from all runqueues for which current thread's mm is the same as the
thread calling sys_membarrier. It executes faster than the non-expedited
variant (no blocking). It also works on NOHZ_FULL configurations.

Scheduler-wise, it requires a memory barrier before and after context
switching between processes (which have different mm). The memory
barrier before context switch is already present. For the barrier after
context switch:

* Our TSO archs can do RELEASE without being a full barrier. Look at
  x86 spin_unlock() being a regular STORE for example.  But for those
  archs, all atomics imply smp_mb and all of them have atomic ops in
  switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
  barrier.

* From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
  the rest does indeed do smp_mb(), so there the spin_unlock() is a full
  barrier and we're good.

* ARM64 has a very heavy barrier in switch_to(), which suffices.

* PPC just removed its barrier from switch_to(), but appears to be
  talking about adding something to switch_mm(). So add a
  smp_mb__after_unlock_lock() for now, until this is settled on the PPC
  side.

Changes since v3:
- Properly document the memory barriers provided by each architecture.

Changes since v2:
- Address comments from Peter Zijlstra,
- Add smp_mb__after_unlock_lock() after finish_lock_switch() in
  finish_task_switch() to add the memory barrier we need after storing
  to rq->curr. This is much simpler than the previous approach relying
  on atomic_dec_and_test() in mmdrop(), which actually added a memory
  barrier in the common case of switching between userspace processes.
- Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
  kernel, rather than having the whole membarrier system call returning
  -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
  Adapt the CMD_QUERY mask accordingly.

Changes since v1:
- move membarrier code under kernel/sched/ because it uses the
  scheduler runqueue,
- only add the barrier when we switch from a kernel thread. The case
  where we switch from a user-space thread is already handled by
  the atomic_dec_and_test() in mmdrop().
- add a comment to mmdrop() documenting the requirement on the implicit
  memory barrier.

CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dave Watson <davejwatson@fb.com>
2017-08-17 07:28:05 -07:00
Paul E. McKenney
16c0b10607 rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
The rcu_idle_exit() and rcu_idle_enter() functions are exported because
they were originally used by RCU_NONIDLE(), which was intended to
be usable from modules.  However, RCU_NONIDLE() now instead uses
rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
exported, and there have been no complaints.

This commit therefore removes the exports from rcu_idle_exit() and
rcu_idle_enter().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Paul E. McKenney
d4db30af51 rcu: Add warning to rcu_idle_enter() for irqs enabled
All current callers of rcu_idle_enter() have irqs disabled, and
rcu_idle_enter() relies on this, but doesn't check.  This commit
therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust.
While we are there, pass "true" rather than "1" to rcu_eqs_enter().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Peter Zijlstra (Intel)
3a60799269 rcu: Make rcu_idle_enter() rely on callers disabling irqs
All callers to rcu_idle_enter() have irqs disabled, so there is no
point in rcu_idle_enter disabling them again.  This commit therefore
replaces the irq disabling with a RCU_LOCKDEP_WARN().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:24 -07:00
Paul E. McKenney
2dee9404fa rcu: Add assertions verifying blocked-tasks list
This commit adds assertions verifying the consistency of the rcu_node
structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
->boost_tasks pointers.  In particular, the ->blkd_tasks lists must be
empty except for leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Masami Hiramatsu
35fe723bda rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also
rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
fixed for rcu_eqs_enter_common() by commit 03ecd3f48e57 ("rcu/tracing:
Add rcu_disabled to denote when rcu_irq_enter() will not work").

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Paul E. McKenney
d8db2e86d8 rcu: Add TPS() protection for _rcu_barrier_trace strings
The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
which needs TPS() protection for strings passed through the second
argument.  However, it has escaped prior TPS()-ification efforts because
it _rcu_barrier_trace() does not start with "trace_".  This commit
therefore adds the needed TPS() protection

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:22 -07:00
Luis R. Rodriguez
d5374226c3 rcu: Use idle versions of swait to make idle-hack clear
These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
our goal clear, and removes confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:15 -07:00
Paul E. McKenney
c5ebe66ce7 rcu: Add event tracing to ->gp_tasks update at GP start
There is currently event tracing to track when a task is preempted
within a preemptible RCU read-side critical section, and also when that
task subsequently reaches its outermost rcu_read_unlock(), but none
indicating when a new grace period starts when that grace period must
wait on pre-existing readers that have been been preempted at least once
since the beginning of their current RCU read-side critical sections.

This commit therefore adds an event trace at grace-period start in
the case where there are such readers.  Note that only the first
reader in the list is traced.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:06 -07:00
Paul E. McKenney
7414fac050 rcu: Move rcu.h to new trivial-function style
This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
definitions for trivial functions, instead of the old style where the
two curly braces each get their own line.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:06 -07:00
Paul E. McKenney
bedbb648ef rcu: Add TPS() to event-traced strings
Strings used in event tracing need to be specially handled, for example,
using the TPS() macro.  Without the TPS() macro, although output looks
fine from within a running kernel, extracting traces from a crash dump
produces garbage instead of strings.  This commit therefore adds the TPS()
macro to some unadorned strings that were passed to event-tracing macros.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:05 -07:00
Paul E. McKenney
ccdd29ffff rcu: Create reasonable API for do_exit() TASKS_RCU processing
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use.  This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:05 -07:00
Paul E. McKenney
7e42776d5e rcu: Drive TASKS_RCU directly off of PREEMPT
The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
is used instead.  This commit therefore makes synchronize_rcu_tasks()
and call_rcu_tasks() available always, but mapped to synchronize_sched()
and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
allows some #ifdefs to be removed from rcutorture.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 07:26:04 -07:00
Boqun Feng
52fa5bc5cb locking/lockdep: Explicitly initialize wq_barrier::done::map
With the new lockdep crossrelease feature, which checks completions usage,
a false positive is reported in the workqueue code:

> Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> Task   B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> Task   C : acquired of lock#3 -> wait for completion of barr->done
> (Task C is in lru_add_drain_all_cpuslocked())
> Worker D : wait for wfc.work to be released -> will complete barr->done

Such a dead lock can not happen because Task C's barr->done and Worker D's
barr->done can not be the same instance.

The reason of this false positive is we initialize all wq_barrier::done
at insert_wq_barrier() via init_completion(), which makes them belong to
the same lock class, therefore, impossible circles are reported.

To fix this, explicitly initialize the lockdep map for wq_barrier::done
in insert_wq_barrier(), so that the lock class key of wq_barrier::done
is a subkey of the corresponding work_struct, as a result we won't build
a dependency between a wq_barrier with a unrelated work, and we can
differ wq barriers based on the related works, so the false positive
above is avoided.

Also define the empty lockdep_init_map_crosslock() for !CROSSRELEASE
to make the code simple and away from unnecessary #ifdefs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 12:12:33 +02:00
Kees Cook
7a46ec0e2f locking/refcounts, x86/asm: Implement fast refcount overflow protection
This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

  http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

  http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc() after:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
  ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
  ...
  .text.unlikely:
  ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
  ffffffff816c36d7:       0f ff                   (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

  2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
		    cycles		protections
  atomic_t           82249267387	none
  refcount_t-fast    82211446892	overflow, untested dec-to-zero
  refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 10:40:26 +02:00
Ingo Molnar
927d2c21f2 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 09:41:41 +02:00
Linus Torvalds
422ce075f9 audit/stable-4.13 PR 20170816
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlmUlmUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIo92hAAqbffYKqih+3VPCYg0bx7N9pCl8Ya
 k9RNxyRPv9+IxJGTrnG00x6k8GIv3hjyJIYmqGQl/GWdbZadmySazl20YI9ls47p
 7ydJAJELRPnfKFLJ9T2mqi6Az8qDtRoV2DwLCSCnsBCJdsK4wcUxtM3/qV2JGxzJ
 O2YIw4C4kuoM2SRl6weGnCUTVkdaDdHk6GcC2GClIlsjapUpNB+UieGijN/3HqHi
 YpSofAXD1lkZ4DZCM51t/3vuIlNTGSQOVvXqsVZWJv4fFR1qZbGiYuVQervYaaP2
 sRN+2OwNtdy5yUStQ5BMHT44zTc49ACizSqU3j96yzEa5H3IfMSN9U5Aa+GYIy5N
 um6qeUz7wKOto0/hBtDpabGeeBkdLZBY6L7Dt2NLTcC8vT65b8NveGj4rvVGt0b5
 REjoT0Slja4yQeER3IgUByR5H6h983Em/cjDmL6V/oLqxfOGGLkLQgKyfGoF+aSK
 DrpCWS/XiGU/Q2W3XhLSSIlJXbZ6y/dttM4tFOrk6omekLpdzdJwgo8DRz91dIZI
 vB5DAHG+Pvxw6sYFz2eAF2/3UYeEdxhAsQs8V3NJWz+7BD/AxAdfMDriGQnQ6jfU
 NIWRcCxkU/FtrqsznIqp0BkitOQ7ZwDqusUebWl34y8iNa/m2f9Jp+rvSnxq8+Zu
 Zw0EjuRyfwu2SE0=
 =tP6Y
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fixes from Paul Moore:
 "Two small fixes to the audit code, both explained well in the
  respective patch descriptions, but the quick summary is one
  use-after-free fix, and one silly fanotify notification flag fix"

* tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Receive unmount event
  audit: Fix use after free in audit_remove_watch_rule()
2017-08-16 16:48:34 -07:00
John Fastabend
6bdc9c4c31 bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSER
Resolve issues with !CONFIG_BPF_SYSCALL and !STREAM_PARSER

net/core/filter.c: In function ‘do_sk_redirect_map’:
net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
   ^
net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 15:34:13 -07:00
John Fastabend
cf56e3b98c bpf: sockmap state change warning fix
psock will uninitialized in default case we need to do the same psock lookup
and check as in other branch. Fixes compile warning below.

kernel/bpf/sockmap.c: In function ‘smap_state_change’:
kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  struct smap_psock *psock;

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 15:34:13 -07:00
John Fastabend
cf9d014059 bpf: devmap: remove unnecessary value size check
In the devmap alloc map logic we check to ensure that the sizeof the
values are not greater than KMALLOC_MAX_SIZE. But, in the dev map case
we ensure the value size is 4bytes earlier in the function because all
values should be netdev ifindex values.

The second check is harmless but is not needed so remove it.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:35:14 -07:00
John Fastabend
8a31db5615 bpf: add access to sock fields and pkt data from sk_skb programs
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
174a79ff95 bpf: sockmap with sk redirect support
Recently we added a new map type called dev map used to forward XDP
packets between ports (6093ec2dc313). This patches introduces a
similar notion for sockets.

A sockmap allows users to add participating sockets to a map. When
sockets are added to the map enough context is stored with the
map entry to use the entry with a new helper

  bpf_sk_redirect_map(map, key, flags)

This helper (analogous to bpf_redirect_map in XDP) is given the map
and an entry in the map. When called from a sockmap program, discussed
below, the skb will be sent on the socket using skb_send_sock().

With the above we need a bpf program to call the helper from that will
then implement the send logic. The initial site implemented in this
series is the recv_sock hook. For this to work we implemented a map
attach command to add attributes to a map. In sockmap we add two
programs a parse program and a verdict program. The parse program
uses strparser to build messages and pass them to the verdict program.
The parse programs use the normal strparser semantics. The verdict
program is of type SK_SKB.

The verdict program returns a verdict SK_DROP, or  SK_REDIRECT for
now. Additional actions may be added later. When SK_REDIRECT is
returned, expected when bpf program uses bpf_sk_redirect_map(), the
sockmap logic will consult per cpu variables set by the helper routine
and pull the sock entry out of the sock map. This pattern follows the
existing redirect logic in cls and xdp programs.

This gives the flow,

 recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock
                                                     \
                                                      -> kfree_skb

As an example use case a message based load balancer may use specific
logic in the verdict program to select the sock to send on.

Sample programs are provided in future patches that hopefully illustrate
the user interfaces. Also selftests are in follow-on patches.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
a6f6df69c4 bpf: export bpf_prog_inc_not_zero
bpf_prog_inc_not_zero will be used by upcoming sockmap patches this
patch simply exports it so we can pull it in.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
Steven Rostedt
9c8783201c sched/completion: Document that reinit_completion() must be called after complete_all()
The complete_all() function modifies the completion's "done" variable to
UINT_MAX, and no other caller (wait_for_completion(), etc) will modify
it back to zero. That means that any call to complete_all() must have a
reinit_completion() before that completion can be used again.

Document this fact by the complete_all() function.

Also document that completion_done() will always return true if
complete_all() is called.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170816131202.195c2f4b@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-16 20:08:10 +02:00
Thomas Gleixner
7cb2fad97e Merge branch 'irq/for-gpio' into irq/core
Merge the irq simulator which is in a separate branch so it can be consumed
by the gpio folks.
2017-08-16 16:41:28 +02:00
Bartosz Golaszewski
44e72c7ebf genirq/irq_sim: Add a devres variant of irq_sim_init()
Add a resource managed version of irq_sim_init(). This can be
conveniently used in device drivers.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-gpio@vger.kernel.org
Cc: Bamvor Jian Zhang <bamvor.zhangjian@linaro.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Link: http://lkml.kernel.org/r/20170814145318.6495-3-brgl@bgdev.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-08-16 16:40:02 +02:00
Bartosz Golaszewski
b19af510e6 genirq/irq_sim: Add a simple interrupt simulator framework
Implement a simple, irq_work-based framework for simulating
interrupts. Currently the API exposes routines for initializing and
deinitializing the simulator object, enqueueing the interrupts and
retrieving the allocated interrupt numbers based on the offset of the
dummy interrupt in the simulator struct.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-gpio@vger.kernel.org
Cc: Bamvor Jian Zhang <bamvor.zhangjian@linaro.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Link: http://lkml.kernel.org/r/20170814145318.6495-2-brgl@bgdev.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-08-16 16:40:02 +02:00
David S. Miller
463910e2df Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-15 20:23:23 -07:00