IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.c
All four conflicts were cases of simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux kernel suspend path has traditionally invoked sys_sync()
before freezing user threads.
But sys_sync() can be expensive, and some user-space OS's do not want
the kernel to pay the cost of sys_sync() on every suspend -- preferring
invoke sync() from user-space if/when they want it.
So make sys_sync on suspend build-time optional.
The default is unchanged.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace. Make it optional.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The xol_free_insn_slot()->waitqueue_active() check is buggy. We
need mb() after we set the conditon for wait_event(), or
xol_take_insn_slot() can miss the wakeup.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134036.GA4799@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change xol_add_vma() to use _install_special_mapping(), this way
we can name the vma installed by uprobes. Currently it looks
like private anonymous mapping, this is confusing and
complicates the debugging. With this change /proc/$pid/maps
reports "[uprobes]".
As a side effect this will cause core dumps to include the XOL vma
and I think this is good; this can help to debug the problem if
the app crashed because it was probed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134033.GA4796@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
install_special_mapping(pages) expects that "pages" is the zero-
terminated array while xol_add_vma() passes &area->page, this
means that special_mapping_fault() can wrongly use the next
member in xol_area (vaddr) as "struct page *".
Fortunately, this area is not expandable so pgoff != 0 isn't
possible (modulo bugs in special_mapping_vmops), but still this
does not look good.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134031.GA4789@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The previous change documents that cleanup_return_instances()
can't always detect the dead frames, the stack can grow. But
there is one special case which imho worth fixing:
arch_uretprobe_is_alive() can return true when the stack didn't
actually grow, but the next "call" insn uses the already
invalidated frame.
Test-case:
#include <stdio.h>
#include <setjmp.h>
jmp_buf jmp;
int nr = 1024;
void func_2(void)
{
if (--nr == 0)
return;
longjmp(jmp, 1);
}
void func_1(void)
{
setjmp(jmp);
func_2();
}
int main(void)
{
func_1();
return 0;
}
If you ret-probe func_1() and func_2() prepare_uretprobe() hits
the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
reported.
When we know that the new call is not chained, we can do the
more strict check. In this case "sp" points to the new ret-addr,
so every frame which uses the same "sp" must be dead. The only
complication is that arch_uretprobe_is_alive() needs to know was
it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
and change prepare_uretprobe() to pass RP_CHECK_CALL only if
!chained.
Note: arch_uretprobe_is_alive() could also re-read *sp and check
if this word is still trampoline_vaddr. This could obviously
improve the logic, but I would like to avoid another
copy_from_user() especially in the case when we can't avoid the
false "alive == T" positives.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86 doesn't care (so far), but as Pratyush Anand pointed
out other architectures might want why arch_uretprobe_is_alive()
was called and use different checks depending on the context.
Add the new argument to distinguish 2 callers.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change prepare_uretprobe() to flush the !arch_uretprobe_is_alive()
return_instance's. This is not needed correctness-wise, but can help
to avoid the failure caused by MAX_URETPROBE_DEPTH.
Note: in this case arch_uretprobe_is_alive() can be false
positive, the stack can grow after longjmp(). Unfortunately, the
kernel can't 100% solve this problem, but see the next patch.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134023.GA4776@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Test-case:
#include <stdio.h>
#include <setjmp.h>
jmp_buf jmp;
void func_2(void)
{
longjmp(jmp, 1);
}
void func_1(void)
{
if (setjmp(jmp))
return;
func_2();
printf("ERR!! I am running on the caller's stack\n");
}
int main(void)
{
func_1();
return 0;
}
fails if you probe func_1() and func_2() because
handle_trampoline() assumes that the probed function should must
return and hit the bp installed be prepare_uretprobe(). But in
this case func_2() does not return, so when func_1() returns the
kernel uses the no longer valid return_instance of func_2().
Change handle_trampoline() to unwind ->return_instances until we
know that the next chain is alive or NULL, this ensures that the
current chain is the last we need to report and free.
Alternatively, every return_instance could use unique
trampoline_vaddr, in this case we could use it as a key. And
this could solve the problem with sigaltstack() automatically.
But this approach needs more changes, and it puts the "hard"
limit on MAX_URETPROBE_DEPTH. Plus it can not solve another
problem partially fixed by the next patch.
Note: this change has no effect on !x86, the arch-agnostic
version of arch_uretprobe_is_alive() just returns "true".
TODO: as documented by the previous change, arch_uretprobe_is_alive()
can be fooled by sigaltstack/etc.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134021.GA4773@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the x86 specific version of arch_uretprobe_is_alive()
helper. It returns true if the stack frame mangled by
prepare_uretprobe() is still on stack. So if it returns false,
we know that the probed function has already returned.
We add the new return_instance->stack member and change the
generic code to initialize it in prepare_uretprobe, but it
should be equally useful for other architectures.
TODO: this assumes that the probed application can't use
multiple stacks (say sigaltstack). We will try to improve
this logic later.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the new "weak" helper, arch_uretprobe_is_alive(), used by
the next patches. It should return true if this return_instance
is still valid. The arch agnostic version just always returns
true.
The patch exports "struct return_instance" for the architectures
which want to override this hook. We can also cleanup
prepare_uretprobe() if we pass the new return_instance to
arch_uretprobe_hijack_return_addr().
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134016.GA4762@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No functional changes, preparation.
Add the new helper, find_next_ret_chain(), which finds the first
!chained entry and returns its ->next. Yes, it is suboptimal. We
probably want to turn ->chained into ->start_of_this_chain
pointer and avoid another loop. But this needs the boring
changes in dup_utask(), so lets do this later.
Change the main loop in handle_trampoline() to unwind the stack
until ri is equal to the pointer returned by this new helper.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134013.GA4755@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Turn the last pr_warn() in uprobes.c into uprobe_warn().
While at it:
- s/kzalloc/kmalloc, we initialize every member of 'ri'
- remove the pointless comment above the obvious code
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134010.GA4752@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1. It doesn't make sense to continue if handle_trampoline()
fails, change handle_swbp() to always return after this call.
2. Turn pr_warn() into uprobe_warn(), and change
handle_trampoline() to send SIGILL on failure. It is pointless to
return to user mode with the corrupted instruction_pointer() which
we can't restore.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134008.GA4745@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can simplify uprobe_free_utask() and handle_uretprobe_chain()
if we add a simple helper which does put_uprobe/kfree and
returns the ->next return_instance.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134006.GA4740@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cosmetic. Add the new trivial helper, get_uprobe(). It matches
put_uprobe() we already have and we can simplify a couple of its
users.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134003.GA4736@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not uncommon (at least with the ARM stuff) to have a piece
of hardware that implements different flavours of "interrupts".
A typical example of this is the GICv3 ITS, which implements
standard PCI/MSI support, but also some form of "generic MSI".
So far, the PCI/MSI domain is registered using the ITS device_node,
so that irq_find_host can return it. On the contrary, the raw MSI
domain is not registered with an device_node, making it impossible
to be looked up by another subsystem (obviously, using the same
device_node twice would only result in confusion, as it is not
defined which one irq_find_host would return).
A solution to this is to "type" domains that may be aliasing, and
to be able to lookup an device_node that matches a given type.
For this, we introduce irq_find_matching_host() as a superset
of irq_find_host:
struct irq_domain *irq_find_matching_host(struct device_node *node,
enum irq_domain_bus_token bus_token);
where bus_token is the "type" we want to match the domain against
(so far, only DOMAIN_BUS_ANY is defined). This result in some
moderately invasive changes on the PPC side (which is the only
user of the .match method).
This has otherwise no functionnal change.
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The following warning is emitted for make xmldocs:
Warning(.//kernel/irq/chip.c:1009): No description found for parameter 'vcpu_info'
Warning(.//kernel/irq/chip.c:1009): Excess function parameter 'dest' description in 'irq_chip_set_vcpu_affinity_parent'
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Link: http://lkml.kernel.org/r/1438164576-5945-1-git-send-email-standby24x7@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.
In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
On nohz full early days, idle dynticks and full dynticks weren't well
integrated and we couldn't risk full dynticks calls on idle without
risking messing up tick idle statistics. This is why we prevented such
thing to happen.
Nowadays full dynticks and idle dynticks are better integrated and
interact without known issue.
So lets remove that.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
HZ never goes much further 1000 and a bit. And if we ever reach one tick
per microsecond, we might be having a problem.
Lets stop maintaining this special case, just leave a paranoid check.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc; John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
We don't actually hold the module_mutex when calling find_module_all
from module_kallsyms_lookup_name: that's because it's used by the oops
code and we don't want to deadlock.
However, access to the list read-only is safe if preempt is disabled,
so we can weaken the assertion. Keep a strong version for external
callers though.
Fixes: 0be964be0d45 ("module: Sanitize RCU usage and locking")
Reported-by: He Kuang <hekuang@huawei.com>
Cc: stable@kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A recent fix to the shadow timestamp inadvertly broke the running time
accounting.
We must not update the running timestamp if we fail to schedule the
event, the event will not have ran. This can (and did) result in
negative total runtime because the stopped timestamp was before the
running timestamp (we 'started' but never stopped the event -- because
it never really started we didn't have to stop it either).
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 72f669c0086f ("perf: Update shadow timestamp before add event")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 4.1
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
mov %rsp, %r1 ; r1 = rsp
add $-8, %r1 ; r1 = rsp - 8
store_q $123, -8(%rsp) ; *(u64*)r1 = 123 <- valid
store_q $123, (%r1) ; *(u64*)r1 = 123 <- previously invalid
mov $0, %r0
exit ; Always need to exit
And we'd get the following error:
0: (bf) r1 = r10
1: (07) r1 += -8
2: (7a) *(u64 *)(r10 -8) = 999
3: (7a) *(u64 *)(r1 +0) = 999
R1 invalid mem access 'fp'
Unable to load program
We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some (admittedly odd) irqchips perform functions that are not directly
related to any of their child IRQ lines, and therefore need to perform
some tasks during suspend/resume regardless of whether there are
any "installed" interrupts for the irqchip. However, the current
generic-chip framework does not call the chip's irq_{suspend,resume}
when there are no interrupts installed (this makes sense, because there
are no irq_data objects for such a call to be made).
More specifically, irq-bcm7120-l2 configures both a forwarding mask
(which affects other top-level GIC IRQs) and a second-level interrupt
mask (for managing its own child interrupts). The former must be
saved/restored on suspend/resume, even when there's nothing to do for
the latter.
This patch adds a new set of suspend/resume hooks to irq_chip_generic,
to help represent *chip* suspend/resume, rather than IRQ suspend/resume.
These callbacks will always be called for an IRQ chip (regardless of the
installed interrupts) and are based on the per-chip irq_chip_generic
struct, rather than the per-IRQ irq_data struct.
The original problem report is described in extra detail here:
http://lkml.kernel.org/g/20150619224123.GL4917@ld-irv-0074
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Gregory Fong <gregory.0xf0@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: linux-mips@linux-mips.org
Cc: Kevin Cernekee <cernekee@chromium.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1437607300-40858-1-git-send-email-computersforpeace@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Export these functions to be able to build the Qualcomm family A PMIC
gpio and mpp drivers as modules.
[ tglx: Made them GPL exports ]
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <kernel-build-reports@lists.linaro.org>
Cc: <linaro-kernel@lists.linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/1437594184-22966-1-git-send-email-bjorn.andersson@sonymobile.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Thomas Gleixner:
"This update contains:
- the manual revert of the SYSCALL32 changes which caused a
regression
- a fix for the MPX vma handling
- three fixes for the ioremap 'is ram' checks.
- PAT warning fixes
- a trivial fix for the size calculation of TLB tracepoints
- handle old EFI structures gracefully
This also contains a PAT fix from Jan plus a revert thereof. Toshi
explained why the code is correct"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Revert 'Adjust default caching mode translation tables'
x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit
x86/mm: Fix newly introduced printk format warnings
mm: Fix bugs in region_is_ram()
x86/mm: Remove region_is_ram() call from ioremap
x86/mm: Move warning from __ioremap_check_ram() to the call site
x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn()
x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn()
x86/mm/pat: Adjust default caching mode translation tables
x86/fpu: Disable dependent CPU features on "noxsave"
x86/mpx: Do not set ->vm_ops on MPX VMAs
x86/mm: Add parenthesis for TLB tracepoint size calculation
efi: Handle memory error structures produced based on old versions of standard
double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the code
but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new logic.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVsq1dAAoJEEjnJuOKh9ldfagH/39dixx3tMMElO7pGM9Q3DBE
WmJuGSOmeZA1O0balnfcV2SgegEGCSEP6sjBtyuYCfgcFDWyIdAvGrMLS/hKTOxR
pilaqyNQz7CROx1zco9gbu5pGwkSkcAfnqYzOg6IpJ0WHtiyH36GMe2wMu29u2VT
I7NgSC6ByA82N4pwvgetlUcIDcPTyrkkhGmGPBJGY+diKzeSSo8NlRbv3SNYs0ua
V072Oumu64RrZBMdn/Sb2pCF2hf6vhTXD6qS4dbpK/Rfnlblqer9SqUIx2kpg603
yDOmPY7wN9FJ94Te3EeXubLi0LqDJH4iPOndrRn1fYsgMbUpq1BlViF7W7Ajze8=
=LQ+S
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Back in 3.16 the ftrace code was redesigned and cleaned up to remove
the double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the
code but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new
logic"
* tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix breakage of set_ftrace_pid
Commit 4104d326b670 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.
Add back the proper hooks to have pid filtering working once again.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are already two events for context switches, namely the tracepoint
sched:sched_switch and the software event context_switches.
Unfortunately neither are suitable for use by non-privileged users for
the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
context switch.
Tracepoints are no good at all for non-privileged users because they
need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.
On the other hand, kernel software events need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= 1.
Now many distributions do default perf_event_paranoid to 1 making
context_switches a contender, except it has another problem (which is
also shared with sched:sched_switch) which is that it happens before
perf schedules events out instead of after perf schedules events in.
Whereas a privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes, in other
words they see when their process was scheduled out not when it was
scheduled in. That presents two problems to use the event:
1. the information comes too late, so tools have to look ahead in the
event stream to find out what the current state is
2. if they are unlucky tracing might have stopped before the
context-switches event is recorded.
This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages.
It is easier to use because it is an auxiliary event (like mmap, comm
and task events) which can be enabled by setting a single bit. It is
smaller than sched:sched_switch and easier to parse.
To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Conflicts:
net/bridge/br_mdb.c
br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the
list of online CPUs, which it traverses. However, this is completely
pointless because smp_call_function_single() will quietly fail if invoked
on an offline CPU. Because the count of requests is incremented in the
rcu_oom_notify_cpu() function that is remotely invoked, everything works
nicely even in the face of concurrent CPU-hotplug operations.
Furthermore, in recent kernels, invoking get_online_cpus() from an OOM
notifier can result in deadlock. This commit therefore removes the
call to get_online_cpus() and put_online_cpus() from rcu_oom_notify().
Reported-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
The RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() triggers if the
scheduler is active, which is backwards. This commit therefore
negates the test.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros. This also requires
inverting the sense of the conditional, which this commit also does.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
and preempt_enable(), both of which can be traced. This defeats the
purpose of the notrace on rcu_is_watching(), so this commit substitutes
preempt_disable_notrace() and preempt_enable_notrace().
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
In kernels built with CONFIG_PREEMPT, _cpu_down() waits for RCU and
RCU-sched grace periods back-to-back, incurring quite a bit more latency
than required. This commit therefore uses the new synchronize_rcu_mult()
to allow waiting for both grace periods concurrently.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it. This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.
Note that you cannot pass in call_srcu() for two reasons: (1) This
would result in a type mismatch and (2) You need to specify which
srcu_struct you want to use. Handle this by creating a wrapper
function for your SRCU domain, for example:
void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
{
call_srcu(&ss_mine, head, func);
}
You can then do something like this:
synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tasks are no longer migrated to the root rcu_node, so there is no
longer any need for a boost kthread for the root rcu_node, and there no
longer is such a kthread. This commit therefore fixes the comment in
rcu_boost_kthread()'s header to reflect this new reality.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
allow polling for grace-period completion, with an actual wait for a
grace period occurring only when cond_synchronize_rcu() is called too
soon after the corresponding get_state_synchronize_rcu(). However,
these functions work only for vanilla RCU. This commit adds the
get_state_synchronize_sched() and cond_synchronize_sched(), which provide
the same capability for RCU-sched.
Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Move alloc_msi_entry() from PCI MSI code into generic MSI code, so it
can be reused by other generic MSI drivers. Also introduce
free_msi_entry() for completeness.
Suggested-by: Stuart Yoder <stuart.yoder@freescale.com>.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/1436428847-8886-13-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
region_is_ram() looks up the iomem_resource table to check if
a target range is in RAM. However, it always returns with -1
due to invalid range checks. It always breaks the loop at the
first entry of the table.
Another issue is that it compares p->flags and flags, but it always
fails. flags is declared as int, which makes it as a negative value
with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long.
Fix the range check and flags so that region_is_ram() works as
advertised.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enabling locking-selftest in a VM guest may cause the following
kernel panic:
kernel BUG at .../kernel/locking/qspinlock_paravirt.h:137!
This is due to the fact that the pvqspinlock unlock function is
expecting either a _Q_LOCKED_VAL or _Q_SLOW_VAL in the lock
byte. This patch prevents that bug report by ignoring it when
debug_locks_silent is set. Otherwise, a warning will be printed
if it contains an unexpected value.
With this patch applied, the kernel locking-selftest completed
without any noise.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436663959-53092-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>