16007 Commits

Author SHA1 Message Date
Ingo Molnar
ca02c21674 Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2
 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F
 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w
 fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y
 +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN
 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo
 W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y
 A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC
 yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi
 m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6
 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I
 zXjnZDNQThAgBswtNY1s
 =U/Kg
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
Colin Cross
88c8004fd3 futex: Use freezable blocking call
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call.  Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.

This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.

Signed-off-by: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: arve@android.com
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1367458508-9133-8-git-send-email-ccross@android.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-25 23:11:19 +02:00
Zhang Yi
13d60f4b6a futex: Take hugepages into account when generating futex_key
The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-25 23:11:19 +02:00
Tejun Heo
fc76df7061 cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy
Before 1a57423166 ("cgroup: make hierarchy_id use cyclic idr"),
hierarchy IDs were allocated from 0.  As the dummy hierarchy was
always the one first initialized, it got assigned 0 and all other
hierarchies from 1.  The patch accidentally changed the minimum
useable ID to 2.

Let's restore ID 0 for dummy_root and while at it reserve 1 for
unified hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
2013-06-25 11:53:37 -07:00
Tejun Heo
30159ec7a9 cgroup: implement for_each_[builtin_]subsys()
There are quite a few places where all loaded [builtin] subsys are
iterated.  Implement for_each_[builtin_]subsys() and replace manual
iterations with those to simplify those places a bit.  The new
iterators automatically skip NULL subsystems.  This shouldn't cause
any functional difference.

Iteration loops which scan all subsystems and then skipping modular
ones explicitly are converted to use for_each_builtin_subsys().

While at it, reorder variable declarations and adjust whitespaces a
bit in the affected functions.

v2: Add lockdep_assert_held() in for_each_subsys() and add comments
    about synchronization as suggested by Li.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-25 11:53:37 -07:00
Tejun Heo
82fe9b0da0 cgroup: move init_css_set initialization inside cgroup_mutex
cgroup_init() was doing init_css_set initialization outside
cgroup_mutex, which is fine but we want to add lockdep annotation on
subsystem iterations and cgroup_init() will trigger it spuriously.
Move init_css_set initialization inside cgroup_mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-25 11:53:37 -07:00
Javier Martinez Canillas
fbab62c5cd irqdomain: Use irq_get_trigger_type() to get IRQ flags
Use irq_get_trigger_type() to get the IRQ trigger type flags
instead calling irqd_get_trigger_type(irq_desc_get_irq_data(virq))

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/1371228049-27080-8-git-send-email-javier.martinez@collabora.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-25 11:48:25 +02:00
Tejun Heo
5549c49791 cgroup: s/for_each_subsys()/for_each_root_subsys()/
for_each_subsys() walks over subsystems attached to a hierarchy and
we're gonna add iterators which walk over all available subsystems.
Rename for_each_subsys() to for_each_root_subsys() so that it's more
appropriately named and for_each_subsys() can be used to iterate all
subsystems.

While at it, remove unnecessary underbar prefix from macro arguments,
put them inside parentheses, and adjust indentation for the two
for_each_*() macros.

This patch is purely cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-24 15:21:48 -07:00
Tejun Heo
b326f9d0db cgroup: clean up find_css_set() and friends
find_css_set() passes uninitialized on-stack template[] array to
find_existing_css_set() which sets the entries for all subsystems.
Passing around an uninitialized array is a bit icky and we want to
introduce an iterator which only iterates loaded subsystems.  Let's
initialize it on definition.

While at it, also make the following cosmetic cleanups.

* Convert to proper /** comments.

* Reorder variable declarations.

* Replace comment on synchronization with lockdep_assert_held().

This patch doesn't make any functional differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-24 15:21:48 -07:00
Tejun Heo
a8a648c4ac cgroup: remove cgroup->actual_subsys_mask
cgroup curiously has two subsystem masks, ->subsys_mask and
->actual_subsys_mask.  The latter only exists because the new target
subsys_mask is passed into rebind_subsystems() via @root>subsys_mask.
rebind_subsystems() needs to know what the current mask is to decide
how to reach the target mask so ->actual_subsys_mask is used as the
temp location to remember the current state.

Adding a temporary field to a permanent data structure is rather silly
and can be misleading.  Update rebind_subsystems() to take @added_mask
and @removed_mask instead and remove @root->actual_subsys_mask.

This patch shouldn't introduce any behavior changes.

v2: Comment and description updated as suggested by Li.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-24 15:21:47 -07:00
Tejun Heo
9871bf9550 cgroup: prefix global variables with "cgroup_"
Global variable names in kernel/cgroup.c are asking for trouble -
subsys, roots, rootnode and so on.  Rename them to have "cgroup_"
prefix.

* s/subsys/cgroup_subsys/

* s/rootnode/cgroup_dummy_root/

* s/dummytop/cgroup_cummy_top/

* s/roots/cgroup_roots/

* s/root_count/cgroup_root_count/

This patch is purely cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-24 15:21:47 -07:00
Greg Kroah-Hartman
b5aef682e0 Merge 3.10-rc7 into driver-core-next
We want the firmware merge fixes, and other bits, in here now.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-24 15:14:43 -07:00
James Hogan
6fff831404 genirq: Irqchip: document gcflags arg of irq_alloc_domain_generic_chips
Commit 088f40b7b027dad6519712ff224a5798dd62a204 ("genirq: Generic chip:
Add linear irq domain support") missed kerneldoc for the gcflags
argument of irq_alloc_domain_generic_chips(). Add it now.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Link: http://lkml.kernel.org/r/1371564513-4327-1-git-send-email-james.hogan@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-24 15:38:12 +02:00
Dave Hansen
14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Linus Torvalds
f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Daniel Lezcano
ea8deb8dfa tick: Fix tick_broadcast_pending_mask not cleared
The recent modification in the cpuidle framework consolidated the
timer broadcast code across the different drivers by setting a new
flag in the idle state. It tells the cpuidle core code to enter/exit
the broadcast mode for the cpu when entering a deep idle state. The
broadcast timer enter/exit is no longer handled by the back-end
driver.

This change made the local interrupt to be enabled *before* calling
CLOCK_EVENT_NOTIFY_EXIT.

On a tegra114, a four cores system, when the flag has been introduced
in the driver, the following warning appeared:

WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15
[<c00667f8>] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [<c0065cd0>] (tick_notify+0x240/0x40c)
[<c0065cd0>] (tick_notify+0x240/0x40c) from [<c0044724>] (notifier_call_chain+0x44/0x84)
[<c0044724>] (notifier_call_chain+0x44/0x84) from [<c0044828>] (raw_notifier_call_chain+0x18/0x20)
[<c0044828>] (raw_notifier_call_chain+0x18/0x20) from [<c00650cc>] (clockevents_notify+0x28/0x170)
[<c00650cc>] (clockevents_notify+0x28/0x170) from [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168)
[<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) from [<c000ea94>] (arch_cpu_idle+0x8/0x38)
[<c000ea94>] (arch_cpu_idle+0x8/0x38) from [<c005ea80>] (cpu_startup_entry+0x60/0x134)
[<c005ea80>] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4)

I don't have the hardware, so I wasn't able to reproduce the warning
but after looking a while at the code, I deduced the following:

 1. the CPU2 enters a deep idle state and sets the broadcast timer

 2. the timer expires, the tick_handle_oneshot_broadcast function is
    called, setting the tick_broadcast_pending_mask and waking up the
    idle cpu CPU2

 3. the CPU2 exits idle handles the interrupt and then invokes
    tick_broadcast_oneshot_control with CLOCK_EVENT_NOTIFY_EXIT which
    runs the following code:

    [...]
    if (dev->next_event.tv64 == KTIME_MAX)
            goto out;

    if (cpumask_test_and_clear_cpu(cpu,
                                 tick_broadcast_pending_mask))
            goto out;
    [...]

    So if there is no next event scheduled for CPU2, we fulfil the
    first condition and jump out without clearing the
    tick_broadcast_pending_mask.

 4. CPU2 goes to deep idle again and calls
    tick_broadcast_oneshot_control with CLOCK_NOTIFY_EVENT_ENTER but
    with the tick_broadcast_pending_mask set for CPU2, triggering the
    warning.

The issue only surfaced due to the modifications of the cpuidle
framework, which resulted in interrupts being enabled before the call
to the clockevents code. If the call happens before interrupts have
been enabled, the warning cannot trigger, because there is still the
event pending which caused the broadcast timer expiry.

Move the check for the next event below the check for the pending bit,
so the pending bit gets cleared whether an event is scheduled on the
cpu or not.

[ tglx: Massaged changelog ]

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-tested-by: Joseph Lo <josephl@nvidia.com>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-21 13:10:34 +02:00
Linus Torvalds
a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds
86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds
805e318548 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20 08:16:07 -10:00
Linus Torvalds
4db88eb4c3 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 - Fix inconstinant clock usage in virtual time accounting
 - Fix a build error in KVM caused by the NOHZ work
 - Remove a pointless timekeeping duty assignment which breaks NOHZ
 - Use a proper notifier return value to avoid random behaviour

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Remove useless timekeeping duty attribution to broadcast source
  nohz: Fix notifier return val that enforce timekeeping
  kvm: Move guest entry/exit APIs to context_tracking
  vtime: Use consistent clocks among nohz accounting
2013-06-20 08:15:13 -10:00
Oleg Nesterov
bde96030f4 hw_breakpoint: Introduce "struct bp_cpuinfo"
This patch simply moves all per-cpu variables into the new
single per-cpu "struct bp_cpuinfo".

To me this looks more logical and clean, but this can also
simplify the further potential changes. In particular, I do not
think this memory should be per-cpu, it is never used "locally".
After this change it is trivial to turn it into, say,
bootmem[nr_cpu_ids].

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20130620155020.GA6350@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:58:57 +02:00
Oleg Nesterov
e12cbc10cb hw_breakpoint: Simplify *register_wide_hw_breakpoint()
1. register_wide_hw_breakpoint() can use unregister_ if failure,
   no need to duplicate the code.

2. "struct perf_event **pevent" adds the unnecesary lever of
   indirection and complication, use per_cpu(*cpu_events, cpu).

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20130620155018.GA6347@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:58:57 +02:00
Oleg Nesterov
1c10adbb92 hw_breakpoint: Introduce cpumask_of_bp()
Add the trivial helper which simply returns cpumask_of() or
cpu_possible_mask depending on bp->cpu.

Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
code duplication.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20130620155015.GA6340@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:58:56 +02:00
Oleg Nesterov
7ab71f3244 hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
Change toggle_bp_slot() to make "weight" negative if !enable.
This way we can always use "+ weight" without additional "if
(enable)" check and toggle_bp_task_slot() no longer needs this
arg.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20130620155013.GA6337@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:58:55 +02:00
Oleg Nesterov
e1ebe86203 hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
The enable/disable logic in toggle_bp_slot() is not symmetrical
and imho very confusing. "old_count" in toggle_bp_task_slot() is
actually new_count because this bp was already removed from the
list.

Change toggle_bp_slot() to always call list_add/list_del after
toggle_bp_task_slot(). This way old_idx is task_bp_pinned() and
this entry should be decremented, new_idx is +/-weight and we
need to increment this element. The code/logic looks obvious.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20130620155011.GA6330@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:58:55 +02:00
Ingo Molnar
f070a4dba9 Merge branch 'perf/urgent' into perf/core
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:40 +02:00
Oleg Nesterov
c790b0ad23 hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
fetch_bp_busy_slots() and toggle_bp_slot() use
for_each_online_cpu(), this is obviously wrong wrt cpu_up() or
cpu_down(), we can over/under account the per-cpu numbers.

For example:

	# echo 0 >> /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10 -p 1 &
	# echo 1 >> /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
	# taskset -p 0x2 1

triggers the same WARN_ONCE("Can't find any breakpoint slot") in
arch_install_hw_breakpoint().

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20130620155009.GA6327@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:01 +02:00
Oleg Nesterov
8b4d801b2b hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint
slot") in arch_install_hw_breakpoint() but the problem is not
arch-specific.

The problem is, task_bp_pinned(cpu) checks "cpu == iter->cpu"
but this doesn't account the "all cpus" events with iter->cpu <
0.

This means that, say, register_user_hw_breakpoint(tsk) can
happily create the arbitrary number > HBP_NUM of breakpoints
which can not be activated. toggle_bp_task_slot() is equally
wrong by the same reason and nr_task_bp_pinned[] can have
negative entries.

Simple test:

	# perl -e 'sleep 1 while 1' &
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

Before this patch this triggers the same problem/WARN_ON(),
after the patch it correctly fails with -ENOSPC.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20130620155006.GA6324@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:00 +02:00
Ingo Molnar
d908e1ebbc Miscellaneous fixes for ACPI EINJ (error injection) code.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsQ4tAAoJEKurIx+X31iBAaIP+wf/lGnTtpR6vwRHbrCV2ZXl
 EC/qnxiqoy8qKe3wn0yVtmORNsBDWnMEsl6ubyHFIhkOL9zrTTQAQRWWzTYu5MWO
 4Vl/Z0zgnBmj66TFVCzDMJLOIsx6o/UHg7uiiCawUEQshWd1jYVZ8bEn+iyj36+y
 mNL74yJkkR9s5AJIQWr7lAKcgeTR8tXHCBuuW5BoY/ThKKtajDKqg8/q4HytPhwi
 u18OpTUv01jK9oS2t6wMzf4Hl0jeHt3FySSgQFPBwfcMJxDlPXnbzlO98Out9h12
 VEye9Ya1CIcVYfPhRjMFgdIvCEbK1afXkk2SRivjOzRFokwk9XR2VTMm5E+KeGZc
 bh8JGxZwGqTxZQlWgqAdZusL928LxPPF631+ZmaJUfaJMspf4/25Y2tAFWXuAycr
 foi1kw/KepE956C0cImS5chRaXPzD1+MJa1lIolD2DUZlTVHRx7rlcPIapWLizyd
 Ir/TNzYUY14oGcQNZMTNcj3sWIUaGn9iEGB+ZtHynW813Sby4bDtvhe8h6UHzADV
 mQnvpz/qcKYwEjg8HmtpmGYEioLz/6RxUU1I1VuAyOCIgxFwVHkyWf41+IPBsTqc
 a4cBF0aLXFn4rcmIVlUGACbrog83XLUNrhXfnmLoFgUcA9ljkmY2ydurcQ1GIhcP
 5WAzisHQor+F3GmOFfaD
 =Pywj
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-einj' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull miscellaneous fixes for ACPI EINJ (error injection) code, from Tony Luck.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:54:04 +02:00
Ingo Molnar
b1fe9987b7 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

"The major changes for this series are:

 1.      Simplify RCU's grace-period and callback processing based on
         the new numbering for callbacks.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/330.

 2.      Documentation updates.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/348.

 3.      Miscellaneous fixes, including converting a few remaining printk()
         calls to pr_*().  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/324.

 4.      SRCU-related changes and fixes.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/425.

 5.      Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
         single-CPU low-latency systems.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/427."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:48:57 +02:00
Joe Perches
be7002e6c6 sched: Don't mix use of typedef ctl_table and struct ctl_table
Just use struct ctl_table.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371063336.2069.22.camel@joe-AO722
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:48 +02:00
Viresh Kumar
94c95ba69f sched: Remove WARN_ON(!sd) from init_sched_groups_power()
sd can't be NULL in init_sched_groups_power() and so checking it for NULL isn't
useful. In case it is required, then also we need to rearrange the code a bit as
we already accessed invalid pointer sd to get sg: sg = sd->groups.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2bbe633cd74b431c05253a8ce61fdfd5066a531b.1370948150.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:47 +02:00
Viresh Kumar
cd08e9234c sched: Fix memory leakage in build_sched_groups()
In build_sched_groups() we don't need to call get_group() for cpus
which are already covered in previous iterations. Calling get_group()
would mark the group used and eventually leak it since we wouldn't
connect it and not find it again to free it.

This will happen only in cases where sg->cpumask contained more than
one cpu (For any topology level). This patch would free sg's memory
for all cpus leaving the group leader as the group isn't marked used
now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/7a61e955abdcbb1dfa9fe493f11a5ec53a11ddd3.1370948150.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:46 +02:00
Viresh Kumar
0936629f01 sched: Use cached value of span instead of calling sched_domain_span()
In the beginning of build_sched_groups() we called sched_domain_span() and
cached its return value in span. Few statements later we are calling it again to
get the same pointer.

Lets use the cached value instead as it hasn't changed in between.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/834ecd507071ad88aff039352dbc7e063dd996a7.1370948150.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:46 +02:00
Viresh Kumar
27723a68ca sched: Create for_each_sd_topology()
For loop for traversing sched_domain_topology was used at multiple placed in
core.c. This patch removes code redundancy by creating for_each_sd_topology().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e0e04542f54e9464bd9da54f5ccfe62ec6c4c0bc.1370861520.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:45 +02:00
Viresh Kumar
c75e01288c sched: Don't set sd->child to NULL when it is already NULL
Memory for sd is allocated with kzalloc_node() which will initialize its fields
with zero. In build_sched_domain() we are setting sd->child to child even if
child is NULL, which isn't required.

Lets do it only if child isn't NULL.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f4753a1730051341003ad2ad29a3229c7356678e.1370861520.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:45 +02:00
Viresh Kumar
1c63216940 sched: Don't initialize alloc_state in build_sched_domains()
alloc_state will be overwritten by __visit_domain_allocation_hell() and so we
don't actually need to initialize alloc_state.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/df57734a075cc5ad130e1ae498702e24f2529ab8.1370861520.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:44 +02:00
Viresh Kumar
22da956953 sched: Optimize build_sched_domains() for saving first SD node for a cpu
We are saving first scheduling domain for a cpu in build_sched_domains() by
iterating over the nested sd->child list. We don't actually need to do it this
way.

tl will be equal to sched_domain_topology for the first iteration and so we can
set *per_cpu_ptr(d.sd, i) based on that.  So, save pointer to first SD while
running the iteration loop over tl's.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fc473527cbc4dfa0b8eeef2a59db74684eb59a83.1370436120.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:43 +02:00
Viresh Kumar
4a850cbefa sched: Remove unused params of build_sched_domain()
build_sched_domain() never uses parameter struct s_data *d and so passing it is
useless.

Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/545e0b4536166a15b4475abcafe5ed0db4ad4a2c.1370436120.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:42 +02:00
Viresh Kumar
0a0fca9d83 sched: Rename sched.c as sched/core.c in comments and Documentation
Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c long time
back and the comments/Documentation never got updated.

I figured it out when I was going through sched-domains.txt and so thought of
fixing it globally.

I haven't crossed check if the stuff that is referenced in sched/core.c by all
these files is still present and hasn't changed as that wasn't the motive behind
this patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:42 +02:00
Michael Wang
8404c90d05 sched: Femove the useless declaration in kernel/sched/fair.c
default_cfs_period(), do_sched_cfs_period_timer(), do_sched_cfs_slack_timer()
already defined previously, no need to declare again.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51AD8808.7020608@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:41 +02:00
Michael Wang
22b958d8cc sched: Refine the code in unthrottle_cfs_rq()
Directly use rq to save some code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51AD87EB.1070605@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:41 +02:00
Kirill Tkhai
e23ee74777 sched/rt: Simplify pull_rt_task() logic and remove .leaf_rt_rq_list
[ Peter, this is based off of some of my work, I ran it though a few
  tests and it passed. I also reviewed it, and added my SOB as I am
  somewhat a co-author to it. ]

Based on the patch by Steven Rostedt from previous year:

https://lkml.org/lkml/2012/4/18/517

1)Simplify pull_rt_task() logic: search in pushable tasks of dest runqueue.
The only pullable tasks are the tasks which are pushable in their local rq,
and no others.

2)Remove .leaf_rt_rq_list member of struct rt_rq and functions connected
with it: nobody uses it since now.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/287571370557898@web7d.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:58:40 +02:00
Ingo Molnar
d81344c508 Merge branch 'sched/urgent' into sched/core
Merge in fixes before applying ongoing new work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:55:31 +02:00
Steven Rostedt
29bb9e5a75 tracing/context-tracking: Add preempt_schedule_context() for tracing
Dave Jones hit the following bug report:

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.10.0-rc2+ #1 Not tainted
 -------------------------------
 include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:
 RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 2 locks held by cc1/63645:
  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
  0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
  ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
  0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
 Call Trace:
  [<ffffffff816ae383>] dump_stack+0x19/0x1b
  [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
  [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
  [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
  [<ffffffff8108dffc>] update_curr+0xec/0x240
  [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
  [<ffffffff816b3a71>] __schedule+0x161/0x9b0
  [<ffffffff816b4721>] preempt_schedule+0x51/0x80
  [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
  [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
  [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
  [<ffffffff816be280>] ftrace_call+0x5/0x2f
  [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
  [<ffffffff816b4805>] ? schedule_user+0x5/0x70
  [<ffffffff816b4805>] ? schedule_user+0x5/0x70
  [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
 ------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:55:10 +02:00
Vincent Guittot
873b4c65b5 sched: Fix clear NOHZ_BALANCE_KICK
I have faced a sequence where the Idle Load Balance was sometime not
triggered for a while on my platform, in the following scenario:

 CPU 0 and CPU 1 are running tasks and CPU 2 is idle

 CPU 1 kicks the Idle Load Balance
 CPU 1 selects CPU 2 as the new Idle Load Balancer
 CPU 2 sets NOHZ_BALANCE_KICK for CPU 2
 CPU 2 sends a reschedule IPI to CPU 2

 While CPU 3 wakes up, CPU 0 or CPU 1 migrates a waking up task A on CPU 2

 CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
       task A quickly goes back to sleep (before a tick occurs on CPU 2)
 CPU 2 goes back to idle with NOHZ_BALANCE_KICK set

Whenever CPU 2 will be selected as the ILB, no reschedule IPI will be sent
because NOHZ_BALANCE_KICK is already set and no Idle Load Balance will be
performed.

We must wait for the sched softirq to be raised on CPU 2 thanks to another
part the kernel to come back to clear NOHZ_BALANCE_KICK.

The proposed solution clears NOHZ_BALANCE_KICK in schedule_ipi if
we can't raise the sched_softirq for the Idle Load Balance.

Change since V1:

- move the clear of NOHZ_BALANCE_KICK in got_nohz_idle_kick if the ILB
  can't run on this CPU (as suggested by Peter)

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370419991-13870-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:55:09 +02:00
Mischa Jonker
03d8e80beb perf: Add const qualifier to perf_pmu_register's 'name' arg
This allows us to use pdev->name for registering a PMU device.
IMO the name is not supposed to be changed anyway.

Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370339148-5566-1-git-send-email-mjonker@synopsys.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:23 +02:00
Stephane Eranian
e712209a9e perf: Fix hypervisor branch sampling permission check
Commit 2b923c8 perf/x86: Check branch sampling priv level in generic code
was missing the check for the hypervisor (HV) priv level, so add it back.

With this patch, we get the following correct behavior:

  # echo 2 >/proc/sys/kernel/perf_event_paranoid

  $ perf record -j any,k noploop 1
  Error:
  You may not have permission to collect stats.
  Consider tweaking /proc/sys/kernel/perf_event_paranoid:
   -1 - Not paranoid at all
    0 - Disallow raw tracepoint access for unpriv
    1 - Disallow cpu events for unpriv
    2 - Disallow kernel profiling for unpriv

   $ perf record -j any,hv noploop 1
   Error:
   You may not have permission to collect stats.
   Consider tweaking /proc/sys/kernel/perf_event_paranoid:
    -1 - Not paranoid at all
     0 - Disallow raw tracepoint access for unpriv
     1 - Disallow cpu events for unpriv
     2 - Disallow kernel profiling for unpriv

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130606090204.GA3725@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:21 +02:00
Ingo Molnar
eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Peter Zijlstra
9bb5d40cd9 perf: Fix mmap() accounting hole
Vince's fuzzer once again found holes. This time it spotted a leak in
the locked page accounting.

When an event had redirected output and its close() was the last
reference to the buffer we didn't have a vm context to undo accounting.

Change the code to destroy the buffer on the last munmap() and detach
all redirected events at that time. This provides us the right context
to undo the vm accounting.

Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:13 +02:00