21354 Commits

Author SHA1 Message Date
Rafael J. Wysocki
ef25ba0476 PM / sleep: Add flags to indicate platform firmware involvement
There are quite a few cases in which device drivers, bus types or
even the PM core itself may benefit from knowing whether or not
the platform firmware will be involved in the upcoming system power
transition (during system suspend) or whether or not it was involved
in it (during system resume).

For this reason, introduce global system suspend flags that can be
used by the platform code to expose that information for the benefit
of the other parts of the kernel and make the ACPI core set them
as appropriate.

Users of the new flags will be added later.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-14 02:17:33 +02:00
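The flag idea in the commit above can be illustrated with a small user-space model. This is only a sketch: every `sketch_` name below is hypothetical, and the names and values the kernel actually uses may differ. It shows platform code setting a bit in a global flags word and other code querying it.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not taken from the kernel source. */
#define SKETCH_PM_FLAG_FW_SUSPEND (1u << 0)
#define SKETCH_PM_FLAG_FW_RESUME  (1u << 1)

/* One global word, cleared at the start of each suspend/resume cycle. */
static unsigned int sketch_pm_flags;

static void sketch_pm_clear_flags(void)
{
	sketch_pm_flags = 0;
}

/* Called by (modelled) platform code when firmware will handle suspend. */
static void sketch_pm_set_suspend_via_firmware(void)
{
	sketch_pm_flags |= SKETCH_PM_FLAG_FW_SUSPEND;
}

/* Called by (modelled) platform code when firmware was involved in resume. */
static void sketch_pm_set_resume_via_firmware(void)
{
	sketch_pm_flags |= SKETCH_PM_FLAG_FW_RESUME;
}

/* Queries a driver or bus type could use to adapt its behaviour. */
static bool sketch_pm_suspend_via_firmware(void)
{
	return (sketch_pm_flags & SKETCH_PM_FLAG_FW_SUSPEND) != 0;
}

static bool sketch_pm_resume_via_firmware(void)
{
	return (sketch_pm_flags & SKETCH_PM_FLAG_FW_RESUME) != 0;
}
```

A driver in this model would call `sketch_pm_suspend_via_firmware()` from its suspend path and skip work that the firmware is expected to do anyway.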
Marc Zyngier
be5436c83a irqdomain/msi: Use fwnode instead of of_node
As we continue to push of_node towards the outskirts of irq domains,
let's start tackling the case of msi_create_irq_domain and its little
friends.

This has limited impact in both PCI/MSI, platform MSI, and a few
drivers.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-17-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:25 +02:00
Marc Zyngier
2a5e9a072d irqdomain: Introduce irq_domain_create_hierarchy
As we're about to start converting the various MSI layers to
use fwnode_handle instead of device_node, add irq_domain_create_hierarchy
as a direct equivalent of irq_domain_add_hierarchy (which still
exists as a compatibility interface).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-16-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:25 +02:00
Marc Zyngier
b145dcc45a irqdomain: Add a fwnode_handle allocator
In order to be able to reference an irqdomain from ACPI, we need
to be able to create an identifier, which is usually a struct
device_node.

This device node doesn't really fit the ACPI infrastructure, so
we cunningly allocate a new structure containing a fwnode_handle,
and return that.

This structure doesn't really point to a device (interrupt
controllers are not "real" devices in Linux), but as we cannot
really deny that they exist, we create them with a new fwnode_type
(FWNODE_IRQCHIP).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-9-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:24 +02:00
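The allocator described above can be pictured as follows. This is a user-space sketch under assumptions: the `sketch_` types and functions are hypothetical stand-ins rather than the kernel's definitions. It shows a wrapper structure that embeds a typed handle, hands out a pointer to the embedded handle, and recovers the wrapper on free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handle types; the kernel's enum has its own names. */
enum sketch_fwnode_type {
	SKETCH_FWNODE_OF,
	SKETCH_FWNODE_ACPI,
	SKETCH_FWNODE_IRQCHIP,
};

struct sketch_fwnode_handle {
	enum sketch_fwnode_type type;
};

/* Wrapper that exists only to carry an identifier for an irqchip. */
struct sketch_irqchip_fwid {
	struct sketch_fwnode_handle fwnode;
	void *data;
};

/* Allocate the wrapper and return only the embedded handle. */
static struct sketch_fwnode_handle *sketch_alloc_irqchip_fwnode(void *data)
{
	struct sketch_irqchip_fwid *fwid = calloc(1, sizeof(*fwid));

	if (!fwid)
		return NULL;
	fwid->fwnode.type = SKETCH_FWNODE_IRQCHIP;
	fwid->data = data;
	return &fwid->fwnode;
}

static int sketch_is_irqchip_fwnode(const struct sketch_fwnode_handle *fw)
{
	return fw && fw->type == SKETCH_FWNODE_IRQCHIP;
}

/* Recover the wrapper from the handle (container_of style) and free it. */
static void sketch_free_irqchip_fwnode(struct sketch_fwnode_handle *fw)
{
	if (!sketch_is_irqchip_fwnode(fw))
		return;
	free((char *)fw - offsetof(struct sketch_irqchip_fwid, fwnode));
}
```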
Marc Zyngier
1bf4ddc46c irqdomain: Introduce irq_domain_create_{linear, tree}
Just like we have irq_domain_add_{linear,tree} to create an irq domain
identified by an of_node, introduce irq_domain_create_{linear,tree}
that do the same thing, except that they take a struct fwnode_handle.

Existing functions get rewritten in terms of the new ones so that
everything keeps working as before (and __irq_domain_add is now
fwnode_handle based as well).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-8-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:24 +02:00
Marc Zyngier
c0131f09de irqdomain: Introduce irq_create_fwspec_mapping
Just like we have irq_create_of_mapping, irq_create_fwspec_mapping
creates an IRQ domain mapping for an interrupt described in a
struct irq_fwspec.

irq_create_of_mapping gets rewritten in terms of the new function,
and the hack we introduced before gets removed (now that no stacked
irqchip uses of_phandle_args anymore).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-7-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:24 +02:00
Marc Zyngier
11e4438ee3 irqdomain: Introduce a firmware-specific IRQ specifier structure
So far the closest thing to a generic IRQ specifier structure is
of_phandle_args, which happens to be pretty OF specific (the of_node
pointer in there is quite annoying).

Let's introduce 'struct irq_fwspec' that can be used in place of
of_phandle_args for OF, but also for other firmware implementations
(that'd be ACPI). This is used together with a new 'translate' method
that is the counterpart of 'xlate'.

We convert irq_create_of_mapping to use this new structure (with a
small hack that will be removed later).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-5-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:23 +02:00
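To make the specifier idea concrete, here is a simplified user-space model. It is a sketch only: the structure layout, the array size and the two-cell `translate` callback are assumptions for illustration, and all `sketch_` names are hypothetical. The point is that the specifier carries a firmware handle plus raw parameters, with no OF-specific pointer in it.

```c
#include <stddef.h>

/* Assumed upper bound on specifier cells for this sketch. */
#define SKETCH_SPEC_PARAMS 16

struct sketch_fwnode_handle {
	int type;
};

/* Firmware-neutral interrupt specifier: a handle plus raw cells. */
struct sketch_irq_fwspec {
	struct sketch_fwnode_handle *fwnode;
	int param_count;
	unsigned int param[SKETCH_SPEC_PARAMS];
};

/*
 * Hypothetical 'translate' callback for a two-cell specifier:
 * cell 0 is the hardware interrupt number, cell 1 the trigger type.
 * Returns 0 on success and -1 if the specifier is too short.
 */
static int sketch_translate_twocell(const struct sketch_irq_fwspec *spec,
				    unsigned long *hwirq, unsigned int *type)
{
	if (!spec || spec->param_count < 2)
		return -1;
	*hwirq = spec->param[0];
	*type = spec->param[1];
	return 0;
}
```

The same callback works regardless of whether the cells came from a device tree or from another firmware description, which is the property the commit is after.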
Marc Zyngier
130b8c6c8d irqdomain: Allow irq domain lookup by fwnode
So far, our irq domains are still looked up by device node.
Let's change this and allow a domain to be looked up using
a fwnode_handle pointer.

The existing interfaces are preserved with a couple of helpers.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-4-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:23 +02:00
Marc Zyngier
f110711a60 irqdomain: Convert irqdomain->of_node to fwnode
Now that we have everyone accessing the of_node field via the
irq_domain_get_of_node accessor, it is pretty easy to swap it
for a pointer to a fwnode_handle.

This translates into a few limited changes in __irq_domain_add,
and an updated irq_domain_get_of_node.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:23 +02:00
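The accessor swap described above might look roughly like this in a user-space model. All `sketch_` names are hypothetical and the layout is an assumption. The domain stores only a generic handle, and the accessor returns a device node only when the handle is really embedded in one.

```c
#include <stddef.h>

enum {
	SKETCH_FWNODE_OF = 1,
	SKETCH_FWNODE_IRQCHIP = 2,
};

struct sketch_fwnode_handle {
	int type;
};

/* A device node embeds its handle, so the node can be recovered from it. */
struct sketch_device_node {
	const char *name;
	struct sketch_fwnode_handle fwnode;
};

/* The domain no longer holds a device node pointer, only a handle. */
struct sketch_irq_domain {
	struct sketch_fwnode_handle *fwnode;
};

/* Return the enclosing device node, or NULL for non-OF handles. */
static struct sketch_device_node *
sketch_to_of_node(struct sketch_fwnode_handle *fw)
{
	if (!fw || fw->type != SKETCH_FWNODE_OF)
		return NULL;
	return (struct sketch_device_node *)
		((char *)fw - offsetof(struct sketch_device_node, fwnode));
}

/* Existing callers keep using the accessor and never see the change. */
static struct sketch_device_node *
sketch_domain_get_of_node(struct sketch_irq_domain *d)
{
	return sketch_to_of_node(d->fwnode);
}
```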
Marc Zyngier
5d4c9bc776 irqdomain: Use irq_domain_get_of_node() instead of direct field access
The struct irq_domain contains a "struct device_node *" field
(of_node) that is almost the only link between the irqdomain
and the device tree infrastructure.

In order to prepare for the removal of that field, convert all
users to use irq_domain_get_of_node() instead.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:23 +02:00
Thomas Gleixner
e50226b4b8 Merge branch 'linus' into irq/core
Bring in upstream updates for patches which depend on them
2015-10-13 19:00:14 +02:00
Alexei Starovoitov
aaac3ba95e bpf: charge user for creation of BPF maps and programs
Since eBPF programs and maps use kernel memory, consider it 'locked' memory
from a user accounting point of view and charge it against the RLIMIT_MEMLOCK
limit. This limit is typically set to 64 Kbytes by distros, so almost all
bpf+tracing programs will need to increase it, since they use maps
and the kernel charges the maximum map size upfront.
For example, a hash map of 1024 elements will be charged as 64 Kbytes.
This is inconvenient for current users and changes current behavior for root,
but is probably worth doing for consistency between root and non-root.

Similar accounting logic is done by mmap of perf_event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:36 -07:00
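The accounting described above amounts to a per-user page counter checked against a limit. The following user-space sketch shows that arithmetic only; the 4096-byte page size, the structure and the `sketch_` function names are assumptions, and the real kernel code is not reproduced here.

```c
/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Per-user running total of pages counted as locked. */
struct sketch_user {
	unsigned long locked_pages;
};

/* Round a byte count up to whole pages. */
static unsigned long sketch_bytes_to_pages(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Charge 'bytes' against the user's limit.
 * Returns 0 on success, -1 if the charge would exceed the limit
 * (in which case nothing is charged).
 */
static int sketch_charge_memlock(struct sketch_user *u, unsigned long bytes,
				 unsigned long limit_bytes)
{
	unsigned long pages = sketch_bytes_to_pages(bytes);
	unsigned long limit_pages = limit_bytes / SKETCH_PAGE_SIZE;

	if (u->locked_pages + pages > limit_pages)
		return -1;
	u->locked_pages += pages;
	return 0;
}

/* Give the pages back when the map or program is destroyed. */
static void sketch_uncharge_memlock(struct sketch_user *u, unsigned long bytes)
{
	u->locked_pages -= sketch_bytes_to_pages(bytes);
}
```

With a 64 Kbyte limit in this model, a single 64 Kbyte charge consumes the whole budget, so any further map creation fails until something is released.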
Alexei Starovoitov
1be7f75d16 bpf: enable non-root eBPF programs
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
  (except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += skb->len;
  return 0;
}

but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += (u64) skb;
  return 0;
}
since it would leak the kernel address into the map.

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns

The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set to true (1).  Once true,
bpf programs and maps cannot be accessed from unprivileged processes,
and the toggle cannot be set back to false.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:35 -07:00
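The one-way behaviour of the sysctl described above can be modelled in a few lines. This is a user-space sketch with hypothetical `sketch_` names, not the kernel's sysctl handler. It shows only the rule that 0 may become 1 but 1 may never return to 0.

```c
/* 0: unprivileged use allowed (default); 1: disabled for good. */
static int sketch_unprivileged_bpf_disabled;

/*
 * Model of a write to the toggle.
 * Returns 0 if the write is accepted, -1 if it is rejected.
 */
static int sketch_write_toggle(int value)
{
	if (value != 0 && value != 1)
		return -1;
	/* Once set, the toggle cannot be cleared again. */
	if (sketch_unprivileged_bpf_disabled && value == 0)
		return -1;
	sketch_unprivileged_bpf_disabled = value;
	return 0;
}

/* Privileged callers are always allowed; others depend on the toggle. */
static int sketch_bpf_allowed(int is_privileged)
{
	return is_privileged || !sketch_unprivileged_bpf_disabled;
}
```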
Rafael J. Wysocki
a5e22db268 Merge back earlier 'pm-sleep' material for v4.4.
2015-10-12 22:30:57 +02:00
Xunlei Pang
e2273584d3 workqueue: Allocate the unbound pool using local node memory
Currently, get_unbound_pool() uses kzalloc() to allocate the
worker pool. Actually, we can use the right node to do the
allocation, achieving local memory access.

This patch selects target node first, and uses kzalloc_node()
instead.

Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-12 12:17:31 -04:00
Ingo Molnar
b9f27c0f4f Linux 4.3-rc5

Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 09:51:18 +02:00
Daniel Bristot de Oliveira
9babcd7929 sched, tracing: Stop/start critical timings around the idle=poll idle loop
When using idle=poll, the preemptoff tracer is always showing
the idle task as the culprit for long latencies. That happens
because critical timings are not stopped before the idle loop. This
patch stops critical timings before entering the idle loop and
starts them again after the idle loop.

This problem does not affect the irqsoff tracer because
interrupts are enabled before entering the idle loop.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 09:45:25 +02:00
Rasmus Villemoes
9fc4468d54 timers: Use __fls in apply_slack()
In apply_slack(), find_last_bit() is applied to a bitmask consisting
of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as
well eliminate the function call and use __fls() directly. On x86_64,
this shaves 23 bytes off the only caller, mod_timer().

This also gets rid of Coverity CID 1192106, but that is a false
positive: Coverity is not aware that mask != 0 implies that
find_last_bit will not return BITS_PER_LONG.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1443771931-6284-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11 22:13:46 +02:00
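The equivalence the commit above relies on can be checked in user space. The sketch below is an illustration, not the kernel's implementation: the `sketch_` names are hypothetical, the loop stands in for a single-word find_last_bit(), and the direct version uses the GCC/Clang builtin `__builtin_clzl`, which, like a direct bit-scan, is undefined for a zero argument.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Loop-based search over one word. Returns the index of the highest
 * set bit, or SKETCH_BITS_PER_LONG when no bit is set.
 */
static unsigned int sketch_find_last_bit_word(unsigned long word)
{
	unsigned int bit = SKETCH_BITS_PER_LONG;

	while (bit-- > 0) {
		if (word & (1UL << bit))
			return bit;
	}
	return SKETCH_BITS_PER_LONG;
}

/*
 * Direct version: index of the highest set bit.
 * Only valid for word != 0, which the caller must guarantee.
 */
static unsigned int sketch_fls_word(unsigned long word)
{
	return SKETCH_BITS_PER_LONG - 1 - (unsigned int)__builtin_clzl(word);
}
```

For every non-zero word the two functions agree, so when the mask is known to be non-zero the "nothing found" return value of the loop version can never occur.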
Guillaume Gomez
cfed432d7f clocksource: Remove return statement from void functions
Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%2BDagCF_pfFw0p5xH_BKkKEog@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11 22:13:46 +02:00
Linus Torvalds
9a78f9c3c6 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "Fix a long standing state race in finish_task_switch()"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix TASK_DEAD race in finish_task_switch()
2015-10-11 10:24:32 -07:00
Alexei Starovoitov
ff936a04e5 bpf: fix cb access in socket filter programs
eBPF socket filter programs may see junk in 'u32 cb[5]' area,
since it could have been used by protocol layers earlier.

For socket filter programs used in af_packet we need to clean
20 bytes of skb->cb area if it could be used by the program.
For programs attached to TCP/UDP sockets we need to save/restore
these 20 bytes, since it's used by protocol layers.

Remove SK_RUN_FILTER macro, since it's no longer used.

Long term we may move this bpf cb area to per-cpu scratch, but that
requires addition of new 'per-cpu load/store' instructions,
so not suitable as a short term fix.

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 04:40:05 -07:00
Thomas Gleixner
e9849777d0 genirq: Add flag to force mask in disable_irq[_nosync]()
If an irq chip does not implement the irq_disable callback, then we
use a lazy approach for disabling the interrupt. That means that the
interrupt is marked disabled, but the interrupt line is not
immediately masked in the interrupt chip. It only becomes masked if
the interrupt is raised while it's marked disabled. We use this to avoid
possibly expensive mask/unmask operations for common case operations.

Unfortunately there are devices which do not allow the interrupt to be
disabled easily at the device level. They are forced to use
disable_irq_nosync(). This can result in taking each interrupt twice.

Instead of enforcing the non-lazy mode on all interrupts of an irq
chip, provide a settings flag, which can be set by the driver for that
particular interrupt line.

Reported-and-tested-by: Duc Dang <dhdang@apm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1510092348370.6097@nanos
2015-10-11 11:33:42 +02:00
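The difference between lazy and forced masking described above can be shown with a tiny state model. This is a user-space sketch: the structure, the `force_mask` field and the `sketch_` functions are hypothetical and do not reproduce the genirq code or its flag names.

```c
#include <stdbool.h>

/* One interrupt line in the model. */
struct sketch_irq_line {
	bool disabled;		/* software state: marked disabled */
	bool masked;		/* hardware state: line masked at the chip */
	bool force_mask;	/* hypothetical per-line "no lazy disable" setting */
	bool pending;		/* an interrupt arrived while disabled */
	unsigned int handled;	/* interrupts delivered to the handler */
};

static void sketch_disable_irq(struct sketch_irq_line *irq)
{
	irq->disabled = true;
	/* Lazy mode leaves the hardware alone; forced mode masks now. */
	if (irq->force_mask)
		irq->masked = true;
}

/*
 * The device raises the line. Returns true if the CPU had to take
 * the interrupt, false if the chip held it back.
 */
static bool sketch_raise_irq(struct sketch_irq_line *irq)
{
	if (irq->masked)
		return false;
	if (irq->disabled) {
		/* Lazy path: taken once, only to mask the line late. */
		irq->masked = true;
		irq->pending = true;
		return true;
	}
	irq->handled++;
	return true;
}
```

In the lazy case the CPU still takes one interrupt after the disable call, which is the extra interrupt the commit message refers to; with the per-line setting it does not.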
Feng Wu
fcf1ae2f7a genirq: Make irq_set_vcpu_affinity available for CONFIG_SMP=n
irq_set_vcpu_affinity() is needed when CONFIG_SMP=n, so move the
definition out of "#ifdef CONFIG_SMP"

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1443860438-144926-1-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 22:47:27 +02:00
Mika Westerberg
e509bd7da1 genirq: Allow migration of chained interrupts by installing default action
When a CPU is offlined all interrupts that have an action are migrated to
other still online CPUs. However, if the interrupt has chained handler
installed this is not done. Chained handlers are used by GPIO drivers which
support interrupts, for instance.

When the affinity is not corrected properly we end up in a situation where
most interrupts no longer arrive at the online CPUs. For example, on an
Intel Braswell system which has the SD card detection signal connected to
a GPIO the IO-APIC routing entries look like below after CPU1 is offlined:

  pin30, enabled , level, low , V(52), IRR(0), S(0), logical , D(03), M(1)
  pin31, enabled , level, low , V(42), IRR(0), S(0), logical , D(03), M(1)
  pin32, enabled , level, low , V(62), IRR(0), S(0), logical , D(03), M(1)
  pin5b, enabled , level, low , V(72), IRR(0), S(0), logical , D(03), M(1)

The problem here is that the destination mask still contains both CPUs even
if CPU1 is already offline. This means that the IO-APIC still routes
interrupts to the other CPU as well.

We solve the problem by providing a default action for chained interrupts.
This action allows the migration code to correct affinity (as it finds
desc->action != NULL).

Also make the default action handler emit a warning if for some reason a
chained handler ends up calling it.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1444039935-30475-1-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 22:47:27 +02:00
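The fix above hinges on the migration code skipping descriptors that have no action. A user-space sketch of that logic follows; the structures and `sketch_` names are hypothetical and the affinity is modelled as a plain bitmask.

```c
#include <stddef.h>

struct sketch_action {
	const char *name;
};

struct sketch_desc {
	struct sketch_action *action;	/* NULL means "nothing installed" */
	unsigned int affinity;		/* bitmask of target CPUs */
};

/* Default action given to chained interrupts so they are not skipped. */
static struct sketch_action sketch_chained_action = { "chained" };

static void sketch_set_chained_handler(struct sketch_desc *desc)
{
	desc->action = &sketch_chained_action;
}

/*
 * Remove the offlined CPUs from every descriptor that has an action.
 * Returns the number of descriptors whose affinity was corrected.
 */
static int sketch_migrate_irqs(struct sketch_desc *descs, size_t n,
			       unsigned int offline_mask)
{
	int fixed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!descs[i].action)
			continue;	/* skipped: this was the bug for chained irqs */
		if (descs[i].affinity & offline_mask) {
			descs[i].affinity &= ~offline_mask;
			fixed++;
		}
	}
	return fixed;
}
```

A descriptor left without an action keeps a destination mask of 0x3 after CPU1 goes offline, which corresponds to the D(03) entries shown in the commit message.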
Catalin Marinas
a78afccbba Merge branch 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Introduce generic irq migration for cpu hotunplug
2015-10-09 16:47:34 +01:00
Arnd Bergmann
e3096c9c7c genirq: Fix handle_bad_irq kerneldoc comment
A recent cleanup removed the 'irq' parameter from many functions, but
left the documentation for this in place for at least one function.

This removes it.

Fixes: bd0b9ac405e1 ("genirq: Remove irq argument from irq flow handlers")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: kbuild-all@01.org
Cc: Austin Schuh <austin@peloton-tech.com>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/5400000.cD19rmgWjV@wuerfel
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 17:17:30 +02:00
Arnd Bergmann
9d67dc5da5 genirq: Export handle_bad_irq
A cleanup of the omap gpio driver introduced a use of the
handle_bad_irq() function in a device driver that can be
a loadable module.

This broke the ARM allmodconfig build:

ERROR: "handle_bad_irq" [drivers/gpio/gpio-omap.ko] undefined!

This patch exports the handle_bad_irq symbol in order to
allow the use in modules.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Austin Schuh <austin@peloton-tech.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/5847725.4IBopItaOr@wuerfel
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09 17:17:30 +02:00
Daniel Borkmann
3ad0040573 bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs
While recently arguing on a seccomp discussion that raw prandom_u32()
access shouldn't be exposed to unprivileged user space, I forgot the
fact that SKF_AD_RANDOM extension actually already does it for some time
in cBPF via commit 4cd3675ebf74 ("filter: added BPF random opcode").

Since prandom_u32() is being used in a lot of critical networking code,
let's be more conservative and split their states. Furthermore, consolidate
eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF,
bpf_get_prandom_u32() was only accessible for privileged users, but
should that change one day, we also don't want to leak raw sequences
through things like eBPF maps.

One thought was also to have own per bpf_prog states, but due to ABI
reasons this is not easily possible, i.e. the program code currently
cannot access bpf_prog itself, and copying the rnd_state to/from the
stack scratch space whenever a program uses the prng seems not really
worth the trouble and seems too hacky. If needed, taus113 could in such
cases be implemented within eBPF using a map entry to keep the state
space, or get_random_bytes() could become a second helper in cases where
performance would not be critical.

Both sides can trigger a one-time late init via prandom_init_once() on
the shared state. Performance-wise, there should even be a tiny gain
as bpf_user_rnd_u32() saves one function call. The PRNG needs to live
inside the BPF core since kernels could have a NET-less config as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:39 -07:00
Ingo Molnar
d3df65c198 Merge branch 'perf/urgent' into perf/core, before pulling new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-08 10:52:18 +02:00
Paul E. McKenney
39cd2dd39a Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD
doc.2015.10.06a:  Documentation updates.
percpu-rwsem.2015.10.06a:  Optimization of per-CPU reader-writer semaphores.
torture.2015.10.06a:  Torture-test updates.
2015-10-07 16:06:25 -07:00
Paul E. McKenney
d2856b046d Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD
exp.2015.10.07a:  Reduce OS jitter of RCU-sched expedited grace periods.
fixes.2015.10.06a:  Miscellaneous fixes.
2015-10-07 16:05:21 -07:00
Paul E. McKenney
338b0f760e rcu: Better hotplug handling for synchronize_sched_expedited()
Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
b08517c76d rcu: Enable stall warnings for synchronize_rcu_expedited()
This commit redirects synchronize_rcu_expedited()'s wait to
synchronize_sched_expedited_wait(), thus enabling RCU CPU
stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
c58656382e rcu: Add tasks to expedited stall-warning messages
This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
74611ecb0f rcu: Add online/offline info to expedited stall warning message
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number.  An "O" indicates global
offline and a "." global online; an "o" indicates that RCU believes the
CPU is offline for the current grace period and a "." otherwise; and an
"N" indicates that RCU believes the CPU will be offline for the
next grace period and a "." otherwise.
So for CPU 10, you would normally see "10-...:" indicating that everything
believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
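The three-character indication described above can be produced with one format string. The sketch below is a user-space illustration with a hypothetical function name and parameters; it reproduces only the "CPU number, dash, three indicator characters, colon" shape quoted in the commit message, not the full kernel message.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format the per-CPU indication into buf.
 *   online:            CPU is globally online
 *   rcu_online_now:    RCU considers it online for the current grace period
 *   rcu_online_next:   RCU considers it online for the next grace period
 * Each indicator prints '.' in the normal case and a letter otherwise.
 * Returns the value of snprintf().
 */
static int sketch_format_stall_cpu(char *buf, size_t len, int cpu,
				   bool online, bool rcu_online_now,
				   bool rcu_online_next)
{
	/* Indexing a two-character string with 0 or 1 picks the indicator. */
	return snprintf(buf, len, "%d-%c%c%c:", cpu,
			"O."[online],
			"o."[rcu_online_now],
			"N."[rcu_online_next]);
}
```

Calling it for CPU 10 with all three arguments true yields the `10-...:` form mentioned in the message.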
Paul E. McKenney
dcdb8807ba rcu: Consolidate expedited CPU selection
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
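Consolidating two near-identical functions by passing the differing callee as an argument is a common pattern; a user-space sketch follows. All names are hypothetical, and the direct call stands in for sending an IPI to each selected CPU.

```c
#include <limits.h>

/* Type of the handler that would run on each selected CPU. */
typedef void (*sketch_ipi_func_t)(void *info);

/*
 * Single selection routine shared by both flavours. The only thing
 * that used to differ, the handler, is now a parameter.
 * Returns the number of CPUs the handler was invoked for.
 */
static int sketch_exp_select_cpus(unsigned long cpu_mask,
				  sketch_ipi_func_t func, void *info)
{
	unsigned int nbits = (unsigned int)(sizeof(cpu_mask) * CHAR_BIT);
	unsigned int cpu;
	int sent = 0;

	for (cpu = 0; cpu < nbits; cpu++) {
		if (cpu_mask & (1UL << cpu)) {
			func(info);	/* stands in for the per-CPU IPI */
			sent++;
		}
	}
	return sent;
}

/* Two example handlers, one per former copy of the function. */
static void sketch_sched_exp_handler(void *info)
{
	*(int *)info += 1;
}

static void sketch_rcu_exp_handler(void *info)
{
	*(int *)info += 10;
}
```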
Paul E. McKenney
66fe6cbee4 rcu: Prepare for consolidating expedited CPU selection
This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
02ef3c4a2a cpu: Remove try_get_online_cpus()
Now that synchronize_sched_expedited() no longer uses it, there are
no users of try_get_online_cpus() in mainline.  This commit therefore
removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-10-07 16:02:49 -07:00
Paul E. McKenney
807226e2fb rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug.  Any
races with CPU hotplug will be detected when attempting to send
the IPI.  This commit therefore removes the code excluding
CPU hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:49 -07:00
Paul E. McKenney
83c2c735e7 rcu: Stop silencing lockdep false positive for expedited grace periods
This reverts commit af859beaaba4 (rcu: Silence lockdep false positive
for expedited grace periods).  Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
2015-10-07 16:02:49 -07:00
Paul E. McKenney
6587a23b6b rcu: Switch synchronize_sched_expedited() to IPI
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:01:12 -07:00
Paul E. McKenney
a36a99618b locktorture: Fix module unwind when bad torture_type specified
The locktorture module has a list of torture types, and specifying
a type not on this list is supposed to cleanly fail the module load.
Unfortunately, the "fail" happens without the "cleanly".  This commit
therefore adds the needed clean-up after an incorrect torture_type.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:44 -07:00
Paul E. McKenney
4f441a258f rcutorture: Fix unused-function warning for torturing_tasks()
The torturing_tasks() function is used only in kernels built with
CONFIG_PROVE_RCU=y, so the second definition can result in unused-function
compiler warnings.  This commit adds __maybe_unused to suppress these
warnings.
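
As a rough illustration of the annotation, the following userspace sketch
uses a stand-in macro for the kernel's __maybe_unused (assumed here to map to
the GCC/Clang "unused" attribute).  DEMO_PROVE and the demo_* names are
hypothetical and only mimic a helper that is referenced in some
configurations but not others.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's __maybe_unused annotation. */
#define demo_maybe_unused __attribute__((__unused__))

/*
 * Only referenced when DEMO_PROVE is defined.  Without the attribute,
 * builds that leave DEMO_PROVE undefined may warn about an unused
 * static function.
 */
static bool demo_maybe_unused demo_torturing_tasks(void)
{
	return true;
}

int demo_check(void)
{
#ifdef DEMO_PROVE
	return demo_torturing_tasks() ? 1 : 0;
#else
	return 0;
#endif
}
```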

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:09 -07:00
Paul E. McKenney
889d487a26 rcutorture: Fix module unwind when bad torture_type specified
The rcutorture module has a list of torture types, and specifying a
type not on this list is supposed to cleanly fail the module load.
Unfortunately, the "fail" happens without the "cleanly".  This commit
therefore adds the needed clean-up after an incorrect torture_type.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:01 -07:00
Oleg Nesterov
4bace7344d rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
   change it to use rcu_lockdep_assert().

2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
   unconditionally; this way we can remove the same check from
   rcu_sync_lockdep_assert() and clearly isolate the debugging
   code.

Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
this needs some simple preparations in the core RCU code to avoid the
code duplication.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:45 -07:00
Oleg Nesterov
cc5f730b41 locking/percpu-rwsem: Clean up the lockdep annotations in percpu_down_read()
Based on Peter Zijlstra's earlier patch.

Change percpu_down_read() to use __down_read(); this way we can
do rwsem_acquire_read() unconditionally at the start to make this
code more symmetric and clean.

Originally-From: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:40 -07:00
Oleg Nesterov
f324a76324 locking/percpu-rwsem: Fix the comments outdated by rcu_sync
Update the comments broken by the previous change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:36 -07:00
Oleg Nesterov
001dac627f locking/percpu-rwsem: Make use of the rcu_sync infrastructure
Currently down_write/up_write calls synchronize_sched_expedited()
twice, which is evil.  Change this code to rely on rcu-sync primitives.
This avoids the _expedited "big hammer", and this can be faster in
the contended case or even in the case when a single thread does
down_write/up_write in a loop.

Of course, a single down_write() will take more time, but otoh it
will be much more friendly to the whole system.

To simplify the review this patch doesn't update the comments, fixed
by the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:31 -07:00
Oleg Nesterov
95b19f684c locking/percpu-rwsem: Make percpu_free_rwsem() after kzalloc() safe
This is a temporary ugly hack that will be reverted later. We only
need it to ensure that the next patch will not break "change sb_writers
to use percpu_rw_semaphore" patches routed via the VFS tree.

The alloc_super()->destroy_super() error path assumes that it is safe
to call percpu_free_rwsem() after kzalloc() without percpu_init_rwsem(),
so let's not disappoint it.
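
The idea can be sketched in userspace as a free routine that tolerates a
zero-filled, never-initialized object.  This is only an illustration under
assumed names: struct demo_rwsem, its fields, and the demo_* functions are
hypothetical and do not reproduce the kernel structure.

```c
#include <stdlib.h>

struct demo_rwsem {
	int *fast_ctr;		/* NULL until demo_init_rwsem() runs */
	int teardown_done;	/* set when a real teardown happened */
};

int demo_init_rwsem(struct demo_rwsem *sem)
{
	sem->fast_ctr = calloc(1, sizeof(*sem->fast_ctr));
	return sem->fast_ctr ? 0 : -1;
}

void demo_free_rwsem(struct demo_rwsem *sem)
{
	/* Zero-filled and never initialized: nothing to tear down. */
	if (!sem->fast_ctr)
		return;

	free(sem->fast_ctr);
	sem->fast_ctr = NULL;
	sem->teardown_done = 1;
}
```

An error path that allocates the containing object zeroed and then bails out
before initialization can call demo_free_rwsem() unconditionally.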

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:26 -07:00
Oleg Nesterov
07899a6e5f rcu_sync: Introduce rcu_sync_dtor()
This commit allows rcu_sync structures to be safely deallocated.
The trick is to add a new ->wait field to the gp_ops array.
This field is a pointer to the rcu_barrier() function corresponding
to the flavor of RCU in question.  This allows a new rcu_sync_dtor()
to wait for any outstanding callbacks before freeing the rcu_sync
structure.
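
A simplified userspace sketch of the per-flavor ->wait dispatch follows.  It
is not the kernel implementation: the demo_* names, the two flavors, and the
single cb_pending flag are assumptions made for illustration, and the
barrier functions are stubs that only count how often they were called.

```c
#include <stddef.h>

enum demo_flavor { DEMO_RCU, DEMO_SCHED, DEMO_NR_FLAVORS };

static int demo_barrier_calls[DEMO_NR_FLAVORS];

/* Stubs standing in for the barrier function of each flavor. */
static void demo_rcu_barrier(void)
{
	demo_barrier_calls[DEMO_RCU]++;
}

static void demo_sched_barrier(void)
{
	demo_barrier_calls[DEMO_SCHED]++;
}

/* Per-flavor operations table, indexed by flavor. */
static const struct demo_gp_ops {
	void (*wait)(void);
} demo_gp_ops[] = {
	[DEMO_RCU]   = { .wait = demo_rcu_barrier },
	[DEMO_SCHED] = { .wait = demo_sched_barrier },
};

struct demo_sync {
	enum demo_flavor flavor;
	int cb_pending;		/* nonzero while a callback is outstanding */
};

int demo_barrier_count(enum demo_flavor flavor)
{
	return demo_barrier_calls[flavor];
}

/* Wait for any outstanding callback before the caller frees *s. */
void demo_sync_dtor(struct demo_sync *s)
{
	if (s->cb_pending) {
		demo_gp_ops[s->flavor].wait();
		s->cb_pending = 0;
	}
}
```

Looking up ->wait through the table keeps the destructor independent of the
flavor the structure was initialized with.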

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:21 -07:00