16937 Commits

Author SHA1 Message Date
Eric Paris
e13f91e3c5 audit: use memset instead of trying to initialize field by field
We currently are setting fields to 0 to initialize the structure
declared on the stack.  This is a bad idea as if the structure has holes
or unpacked space these will not be initialized.  Just use memset.  This
is not a performance critical section of code.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:35 -05:00
Mathias Krause
64fbff9ae0 audit: fix info leak in AUDIT_GET requests
We leak 4 bytes of kernel stack in response to an AUDIT_GET request as
we miss to initialize the mask member of status_set. Fix that.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org  # v2.6.6+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:30 -05:00
Richard Guy Briggs
db510fc5cd audit: update AUDIT_INODE filter rule to comparator function
It appears this one comparison function got missed in f368c07d (and 9c937dcc).

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:24 -05:00
Eric Paris
21b85c31d2 audit: audit feature to set loginuid immutable
This adds a new 'audit_feature' bit which allows userspace to set it
such that the loginuid is absolutely immutable, even if you have
CAP_AUDIT_CONTROL.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:17 -05:00
Eric Paris
d040e5af38 audit: audit feature to only allow unsetting the loginuid
This is a new audit feature which only grants processes with
CAP_AUDIT_CONTROL the ability to unset their loginuid.  They cannot
directly set it from a valid uid to another valid uid.  The ability to
unset the loginuid is nice because a priviledged task, like that of
container creation, can unset the loginuid and then priv is not needed
inside the container when a login daemon needs to set the loginuid.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:13 -05:00
Eric Paris
81407c84ac audit: allow unsetting the loginuid (with priv)
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid.
This would allow a child of that task to set their loginuid without
CAP_AUDIT_CONTROL.  Thus when launching a new login daemon, a
priviledged helper would be able to unset the loginuid and then the
daemon, which may be malicious user facing, do not need priv to function
correctly.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:09 -05:00
Eric Paris
83fa6bbe4c audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
After trying to use this feature in Fedora we found the hard coding
policy like this into the kernel was a bad idea.  Surprise surprise.
We ran into these problems because it was impossible to launch a
container as a logged in user and run a login daemon inside that container.
This reverts back to the old behavior before this option was added.  The
option will be re-added in a userspace selectable manor such that
userspace can choose when it is and when it is not appropriate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:01 -05:00
Eric Paris
da0a610497 audit: loginuid functions coding style
This is just a code rework.  It makes things more readable.  It does not
make any functional changes.

It does change the log messages to include both the old session id as
well the new and it includes a new res field, which means we get
messages even when the user did not have permission to change the
loginuid.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:56 -05:00
Eric Paris
b0fed40214 audit: implement generic feature setting and retrieving
The audit_status structure was not designed with extensibility in mind.
Define a new AUDIT_SET_FEATURE message type which takes a new structure
of bits where things can be enabled/disabled/locked one at a time.  This
structure should be able to grow in the future while maintaining forward
and backward compatibility (based loosly on the ideas from capabilities
and prctl)

This does not actually add any features, but is just infrastructure to
allow new on/off types of audit system features.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:30 -05:00
Richard Guy Briggs
42f74461a5 audit: change decimal constant to macro for invalid uid
SFR reported this 2013-05-15:

> After merging the final tree, today's linux-next build (i386 defconfig)
> produced this warning:
>
> kernel/auditfilter.c: In function 'audit_data_to_entry':
> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only
> in ISO C90 [enabled by default]
>
> Introduced by commit 780a7654cee8 ("audit: Make testing for a valid
> loginuid explicit") from Linus' tree.

Replace this decimal constant in the code with a macro to make it more readable
(add to the unsigned cast to quiet the warning).

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:27 -05:00
Tyler Hicks
0868a5e150 audit: printk USER_AVC messages when audit isn't enabled
When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.

AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").

When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.

It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.

Cc: <stable@kernel.org> # v2.6.25+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:23 -05:00
Oleg Nesterov
d48d805122 audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context
If audit_filter_task() nacks the new thread it makes sense
to clear TIF_SYSCALL_AUDIT which can be copied from parent
by dup_task_struct().

A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers
the "slow" audit paths in entry.S to ensure the task can not
miss audit_syscall_*() calls, this is pointless if the task
has no ->audit_context.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:18 -05:00
Gao feng
af0e493d30 Audit: remove duplicate comments
Remove it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:14 -05:00
Richard Guy Briggs
b8f89caafe audit: remove newline accidentally added during session id helper refactor
A newline was accidentally added during session ID helper refactorization in
commit 4d3fb709.  This needlessly uses up buffer space, messes up syslog
formatting and makes userspace processing less efficient.  Remove it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:09 -05:00
Ilya V. Matveychikov
47145705e3 audit: remove duplicate inclusion of the netlink header
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:06:53 -05:00
Richard Guy Briggs
b50eba7e2d audit: format user messages to size of MAX_AUDIT_MESSAGE_LENGTH
Messages of type AUDIT_USER_TTY were being formatted to 1024 octets,
truncating messages approaching MAX_AUDIT_MESSAGE_LENGTH (8970 octets).

Set the formatting to 8560 characters, given maximum estimates for prefix and
suffix budgets.

See the problem discussion:
https://www.redhat.com/archives/linux-audit/2009-January/msg00030.html

And the new size rationale:
https://www.redhat.com/archives/linux-audit/2013-September/msg00016.html

Test ~8k messages with:
auditctl -m "$(for i in $(seq -w 001 820);do echo -n "${i}0______";done)"

Reported-by: LC Bruzenak <lenny@magitekltd.com>
Reported-by: Justin Stephenson <jstephen@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:06:49 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Ingo Molnar
2a3ede8cb2 Merge branch 'perf/urgent' into perf/core to fix conflicts
Conflicts:
	tools/perf/bench/numa.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04 07:49:35 +01:00
Ingo Molnar
fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Oleg Nesterov
6a716c90a5 hung_task debugging: Add tracepoint to report the hang
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.

Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-31 11:16:18 +01:00
Chen Gang
6ef4d2eaf5 kernel/system_certificate.S: use real contents instead of macro GLOBAL()
If a macro is only used within 2 times, and also its contents are
within 2 lines, recommend to expand it to shrink code line.

For our case, the macro is not portable either: some architectures'
assembler may use another character to mark newline in a macro (e.g.
'`' for arc), which will cause issue.

If still want to use macro and let it portable enough, it will also
need include additional header file (e.g "#include <linux/linkage.h>",
although it also need be fixed).


Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-10-30 12:58:00 +00:00
Mathias Krause
0b6b098efc padata: make the sequence counter an atomic_t
Using a spinlock to atomically increase a counter sounds wrong -- we've
atomic_t for this!

Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
line trashing. This has the nice side effect of decreasing the size of
struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g.
occupying only two instead of three cache lines.

Those changes results in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-10-30 12:02:58 +08:00
Oleg Nesterov
3ab6796617 uprobes: Teach uprobe_copy_process() to handle CLONE_VFORK
uprobe_copy_process() does nothing if the child shares ->mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.

Note: in theory, it would be better to check task_pt_regs(p)->sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-10-29 18:02:55 +01:00
Oleg Nesterov
aa59c53fd4 uprobes: Change uprobe_copy_process() to dup xol_area
This finally fixes the serious bug in uretprobes: a forked child
crashes if the parent called fork() with the pending ret probe.

Trivial test-case:

	# perf probe -x /lib/libc.so.6 __fork%return
	# perf record -e probe_libc:__fork perl -le 'fork || print "OK"'

(the child doesn't print "OK", it is killed by SIGSEGV)

If the child returns from the probed function it actually returns
to trampoline_vaddr, because it got the copy of parent's stack
mangled by prepare_uretprobe() when the parent entered this func.

It crashes because a) this address is not mapped and b) until the
previous change it doesn't have the proper->return_instances info.

This means that uprobe_copy_process() has to create xol_area which
has the trampoline slot, and its vaddr should be equal to parent's
xol_area->vaddr.

Unfortunately, uprobe_copy_process() can not simply do
__create_xol_area(child, xol_area->vaddr). This could actually work
but perf_event_mmap() doesn't expect the usage of foreign ->mm. So
we offload this to task_work_run(), and pass the argument via not
yet used utask->vaddr.

We know that this vaddr is fine for install_special_mapping(), the
necessary hole was recently "created" by dup_mmap() which skips the
parent's VM_DONTCOPY area, and nobody else could use the new mm.

Unfortunately, this also means that we can not handle the errors
properly, we obviously can not abort the already completed fork().
So we simply print the warning if GFP_KERNEL allocation (the only
possible reason) fails.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:54 +01:00
Oleg Nesterov
248d3a7b2f uprobes: Change uprobe_copy_process() to dup return_instances
uprobe_copy_process() assumes that the new child doesn't need
->utask, it should be allocated by demand.

But this is not true if the forking task has the pending ret-
probes, the child should report them as well and thus it needs
the copy of parent's ->return_instances chain. Otherwise the
child crashes when it returns from the probed function.

Alternatively we could cleanup the child's stack, but this needs
per-arch changes and this is not what we want. At least systemtap
expects a .return in the child too.

Note: this change alone doesn't fix the problem, see the next
change.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:53 +01:00
Oleg Nesterov
af0d95af79 uprobes: Teach __create_xol_area() to accept the predefined vaddr
Currently xol_add_vma() uses get_unmapped_area() for area->vaddr,
but the next patches need to use the fixed address. So this patch
adds the new "vaddr" argument to __create_xol_area() which should
be used as area->vaddr if it is nonzero.

xol_add_vma() doesn't bother to verify that the predefined addr is
not used, insert_vm_struct() should fail if find_vma_links() detects
the overlap with the existing vma.

Also, __create_xol_area() doesn't need __GFP_ZERO to allocate area.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:51 +01:00
Oleg Nesterov
6441ec8b7c uprobes: Introduce __create_xol_area()
No functional changes, preparation.

Extract the code which actually allocates/installs the new area
into the new helper, __create_xol_area().

While at it remove the unnecessary "ret = ENOMEM" and "ret = 0"
in xol_add_vma(), they both have no effect.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:50 +01:00
Oleg Nesterov
b68e074910 uprobes: Change the callsite of uprobe_copy_process()
Preparation for the next patches.

Move the callsite of uprobe_copy_process() in copy_process() down
to the succesfull return. We do not care if copy_process() fails,
uprobe_free_utask() won't be called in this case so the wrong
->utask != NULL doesn't matter.

OTOH, with this change we know that copy_process() can't fail when
uprobe_copy_process() is called, the new task should either return
to user-mode or call do_exit(). This way uprobe_copy_process() can:

	1. setup p->utask != NULL if necessary

	2. setup uprobes_state.xol_area

	3. use task_work_add(p)

Also, move the definition of uprobe_copy_process() down so that it
can see get_utask().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:48 +01:00
Peter Zijlstra
5a3126d4fe perf: Fix the perf context switch optimization
Currently we only optimize the context switch between two
contexts that have the same parent; this forgoes the
optimization between parent and child context, even though these
contexts could be equivalent too.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Shishkin, Alexander <alexander.shishkin@intel.com>
Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 14:13:01 +01:00
Peter Zijlstra
2c42cfbfe1 perf: Change zero-padding of strings in perf_event_mmap_event()
Oleg complained about the excessive 0-ing in perf_event_mmap_event(),
so try and be smarter about it while keeping it fairly fool proof and
avoid leaking random bits out to userspace.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8jirlm99m6if2z13wd6rbyu6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:53 +01:00
Oleg Nesterov
3ea2f2b96f perf: Do not waste PAGE_SIZE bytes for ALIGN(8) in perf_event_mmap_event()
perf_event_mmap_event() does kzalloc(PATH_MAX + sizeof(u64)) to
ensure we can align the size later. However this means that we
actually allocate PAGE_SIZE * 2 buffer, seems too much.

Change this code to allocate PATH_MAX==PAGE_SIZE bytes, but tell
d_path() to not use the last sizeof(u64) bytes.

Note: it is not clear why do we need __GFP_ZERO, see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016201004.GC23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:52 +01:00
Oleg Nesterov
32c5fb7e7d perf: Kill the dead !vma->vm_mm code in perf_event_mmap_event()
1. perf_event_mmap(vma) is never called with a gate_vma-like arg,
   remove the "if (!vma->vm_mm)" code.

2. arch_vma_name() can use the chached value of mmap_event->vma.

3. Change the code to not call arch_vma_name() twice.

4. Purely cosmetic, but since we use "goto got_name" all the time
   remove "else" from "[stack]" branch just for symmetry.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:51 +01:00
Peter Zijlstra
d9494cb429 perf: Remove useless atomic_t
There's nothing atomic about atomic_set vs atomic_read; so remove the
atomic_t usage.

Also, make running_sample_length static as it really is (and should
be) local to this translation unit.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/n/tip-vw9lg588x1ic248whybjon0c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:49 +01:00
Ben Segall
f9f9ffc237 sched: Avoid throttle_cfs_rq() racing with period_timer stopping
throttle_cfs_rq() doesn't check to make sure that period_timer is running,
and while update_curr/assign_cfs_runtime does, a concurrently running
period_timer on another cpu could cancel itself between this cpu's
update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
in the tg to restart the timer, this causes the cfs_rq to be stranded
forever.

Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
inactive.

(Also add some sched_debug lines for cfs_bandwidth.)

Tested: make a run/sleep task in a cgroup, loop switching the cgroup
between 1ms/100ms quota and unlimited, checking for timer_active=0 and
throttled=1 as a failure. With the throttle_cfs_rq() change commented out
this fails, with the full patch it passes.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:32 +01:00
Paul Turner
0ac9b1c218 sched: Guarantee new group-entities always have weight
Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )

Suppose that we have root->a and that a enters a throttled state,
immediately followed by a[0]->t1 (the only task running on cpu[0])
blocking:

  put_prev_task(group_cfs_rq(a[0]), t1)
  put_prev_entity(..., t1)
  check_cfs_rq_runtime(group_cfs_rq(a[0]))
  throttle_cfs_rq(group_cfs_rq(a[0]))

Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first
time:

  enqueue_task_fair(rq[0], t2)
  enqueue_entity(group_cfs_rq(b[0]), t2)
  enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
  account_entity_enqueue(group_cfs_ra(b[0]), t2)
  update_cfs_shares(group_cfs_rq(b[0]))
  < skipped because b is part of a throttled hierarchy >
  enqueue_entity(group_cfs_rq(a[0]), b[0])
  ...

We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:23 +01:00
Ben Segall
927b54fccb sched: Fix hrtimer_cancel()/rq->lock deadlock
__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock.

Fix this by ensuring that cfs_b->timer_active is cleared only if the
_latest_ call to do_sched_cfs_period_timer is returning as idle. Then
__start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
that to succeed or timer_active == 1.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:21 +01:00
Ben Segall
db06e78cc1 sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
hrtimer_expires_remaining does not take internal hrtimer locks and thus
must be guarded against concurrent __hrtimer_start_range_ns (but
returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:20 +01:00
Ben Segall
1ee14e6c8c sched: Fix race on toggling cfs_bandwidth_used
When we transition cfs_bandwidth_used to false, any currently
throttled groups will incorrectly return false from cfs_rq_throttled.
While tg_set_cfs_bandwidth will unthrottle them eventually, currently
running code (including at least dequeue_task_fair and
distribute_cfs_runtime) will cause errors.

Fix this by turning off cfs_bandwidth_used only after unthrottling all
cfs_rqs.

Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
crashes in minutes without the patch, hasn't crashed with it.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:19 +01:00
Peter Zijlstra
bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Ingo Molnar
aac898548d Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/util/hist.h
2013-10-29 11:23:32 +01:00
Michael wang
ac9ff7997b sched: Remove extra put_online_cpus() inside sched_setaffinity()
Commit 6acce3ef8:

	sched: Remove get_online_cpus() usage

has left one extra put_online_cpus() inside sched_setaffinity(),
remove it to fix the WARN:

   ------------[ cut here ]------------
   WARNING: CPU: 0 PID: 3166 at kernel/cpu.c:84 put_online_cpus+0x43/0x70()
   ...
   [<ffffffff810c3fef>] put_online_cpus+0x43/0x70 [
   [<ffffffff810efd59>] sched_setaffinity+0x7d/0x1f9 [
   ...

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/526DD0EE.1090309@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-28 11:36:50 +01:00
Thomas Pfaff
bbfe65c219 genirq: Set the irq thread policy without checking CAP_SYS_NICE
In commit ee23871389 ("genirq: Set irq thread to RT priority on
creation") we moved the assigment of the thread's priority from the
thread's function into __setup_irq(). That function may run in user
context for instance if the user opens an UART node and then driver
calls requests in the ->open() callback. That user may not have
CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER
policy.

This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE
check which is otherwise required for the SCHED_OTHER policy.

[bigeasy: Rewrite the changelog]

Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Cc: Ivo Sieben <meltedpianoman@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1381489240-29626-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-10-28 09:50:42 +01:00
Rafael J. Wysocki
6e0ca95aa3 Merge branch 'pm-sleep'
* pm-sleep:
  PM / Hibernate: Use bool for boolean fields of struct snapshot_data
  PM / Sleep: Detect device suspend/resume lockup and log event
2013-10-28 01:28:07 +01:00
Rafael J. Wysocki
400fc45273 Merge branch 'pm-qos'
* pm-qos:
  PM / QoS: simplify pm_qos_power_write()
2013-10-28 01:27:21 +01:00
Linus Torvalds
aff22d3f1a Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "This tree contains a clockevents regression fix for certain ARM
  subarchitectures"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Sanitize ticks to nsec conversion
2013-10-27 10:29:25 -07:00
Linus Torvalds
e2756f5e0f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The tree contains three fixes:

   - Two tooling fixes

   - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
     this merge window.  (Patches were proposed to fix it but it was all
     a bit late and we felt it's safer to just delay the ABI one more
     kernel release and do it right)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Disable PERF_RECORD_MMAP2 support
  perf scripting perl: Fix build error on Fedora 12
  perf probe: Fix to initialize fname always before use it
2013-10-27 10:28:35 -07:00
Linus Torvalds
1c99ca43a4 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
  kernels built with GCC 3.x (there are still such distros)"

Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-27 10:18:15 -07:00
Li Bin
e9aa39bb7c sched/rt: Fix task_tick_rt() comment
This issue was introduced by 454c79999f7e ("sched/rt: Fix SCHED_RR
across cgroups") that missed the word 'not'. Fix it.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <xiexiuqi@huawei.com>
Cc: <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1382357743-54136-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26 12:25:21 +02:00
Linus Torvalds
20582e34c8 ACPI and power management fixes for 3.12-rc7
- Fix for rounding errors in intel_pstate causing CPU utilization to
    be underestimated from Brennan Shacklett.
 
  - intel_pstate fix to always use the correct max pstate value when
    computing the min pstate from Dirk Brandewie.
 
  - Hibernation fix for deadlocking resume in cases when the probing
    of the device containing the image is deferred from Russ Dill.
 
  - acpi-cpufreq fix to prevent the module from staying in memory
    when the driver cannot be registered and then attempting to
    unregister things that have never been registered on exit.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSatHPAAoJEILEb/54YlRxGrYP/0VplYzQX/CqDwO/folY1wiJ
 nASYG0ZhMR7wKTr75PF/9bSoTvftk+KGbx8i48DGFXsc1a8e4lCA878zu1debH0x
 +oi+qNlrRvNgwr0Eg7ALuM68zxG+b/D8vqHRqlXcnDa+rMlS/8q/l8d/D8rou49O
 pERPWLF0LI2kdRxmeOzvqo8oA1KnhUvN/nPvtFKB7LwcQU+9iY3RMAJa2i6K/zsn
 Epq76aePKz98KB6U9LZIqzshe6AQQanFr0+kz/6dHXArygVfh7+ZpByRIYcH67KK
 iR8N6O02Fq06dVLIu3Ta5Lqq5/EWBVfauLcMBTwZ2LxmZsPGUqPac02X3HpGcB3+
 1R0zWGmKpsTcs9hUGaCp9Qs2IzQvopcbnjZ6YxMkDtfZip+2FI1aaHapRT1egl8B
 xV3Yd0ZqqyAUs50tneQE9AMQ8ndae6t8gK8c3Z+EZTF0cSBuz9puQkRBGc+RdGDs
 TTaKBbgI+TY5uws6VVWEijKkQjzCsVWM+n6yW8GlRmqrpBO/yAkuw0cBB5PXnrw0
 uoSqw3+XLFrsVzeJ7V/Zi0+jpCq9qOBubO88j9EMX7tmL1xDPLECWeztAAwiFy9C
 Vwapn6qygPV5nfnMu+zDyMe/gcKcGbXmYactaLKDap0FMsRVFDY/YikN5PeJFwJF
 gLbn26BQxHcKNNUFAYr+
 =PsHF
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from
 "These fix two bugs in the intel_pstate driver, a hibernate bug leading
  to nasty resume failures sometimes and acpi-cpufreq initialization bug
  that causes problems to happen during module unload when intel_pstate
  is in use.

  Specifics:

   - Fix for rounding errors in intel_pstate causing CPU utilization to
     be underestimated from Brennan Shacklett.

   - intel_pstate fix to always use the correct max pstate value when
     computing the min pstate from Dirk Brandewie.

   - Hibernation fix for deadlocking resume in cases when the probing of
     the device containing the image is deferred from Russ Dill.

   - acpi-cpufreq fix to prevent the module from staying in memory when
     the driver cannot be registered and then attempting to unregister
     things that have never been registered on exit"

* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  acpi-cpufreq: Fail initialization if driver cannot be registered
  PM / hibernate: Move software_resume to late_initcall_sync
  intel_pstate: Correct calculation of min pstate value
  intel_pstate: Improve accuracy by not truncating until final result
2013-10-26 04:38:47 +01:00
Dmitry Kasatkin
3fe78ca2fb keys: change asymmetric keys to use common hash definitions
This patch makes use of the newly defined common hash algorithm info,
replacing, for example, PKEY_HASH with HASH_ALGO.

Changelog:
- Lindent fixes - Mimi

CC: David Howells <dhowells@redhat.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:15:18 -04:00