/*
 * Kernel Debug Core
 *
 * Maintainer: Jason Wessel <jason.wessel@windriver.com>
 *
 * Copyright (C) 2000-2001 VERITAS Software Corporation.
 * Copyright (C) 2002-2004 Timesys Corporation
 * Copyright (C) 2003-2004 Amit S. Kale <amitkale@linsyssoft.com>
 * Copyright (C) 2004 Pavel Machek <pavel@ucw.cz>
 * Copyright (C) 2004-2006 Tom Rini <trini@kernel.crashing.org>
 * Copyright (C) 2004-2006 LinSysSoft Technologies Pvt. Ltd.
 * Copyright (C) 2005-2009 Wind River Systems, Inc.
 * Copyright (C) 2007 MontaVista Software, Inc.
 * Copyright (C) 2008 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
 *
 * Contributors at various stages not listed above:
 *  Jason Wessel (jason.wessel@windriver.com)
 *  George Anzinger <george@mvista.com>
 *  Anurekh Saxena (anurekh.saxena@timesys.com)
 *  Lake Stevens Instrument Division (Glenn Engel)
 *  Jim Kingdon, Cygnus Support.
 *
 * Original KGDB stub: David Grothe <dave@gcom.com>,
 * Tigran Aivazian <tigran@sco.com>
 *
 * This file is licensed under the terms of the GNU General Public License
 * version 2. This program is licensed "as is" without any warranty of any
 * kind, whether express or implied.
 */

#define pr_fmt(fmt) "KGDB: " fmt

#include <linux/pid_namespace.h>
#include <linux/clocksource.h>
#include <linux/serial_core.h>
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/console.h>
#include <linux/threads.h>
#include <linux/uaccess.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/ptrace.h>
#include <linux/string.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/sysrq.h>
#include <linux/reboot.h>
#include <linux/init.h>
#include <linux/kgdb.h>
#include <linux/kdb.h>
#include <linux/nmi.h>
#include <linux/pid.h>
#include <linux/smp.h>
#include <linux/mm.h>
mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed. There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma(). Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.
We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality. On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.
The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number. The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed. Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question. Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:
1) System bootup: Most programs are single threaded, so the per-thread
scheme does improve ~50% hit rate by just adding a few more slots to
the cache.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 50.61% | 19.90 |
| patched | 73.45% | 13.58 |
+----------------+----------+------------------+
2) Kernel build: This one is already pretty good with the current
approach as we're dealing with good locality.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 75.28% | 11.03 |
| patched | 88.09% | 9.31 |
+----------------+----------+------------------+
3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 70.66% | 17.14 |
| patched | 91.15% | 12.57 |
+----------------+----------+------------------+
4) Ebizzy: There's a fair amount of variation from run to run, but this
approach always shows nearly perfect hit rates, while baseline is just
about non-existent. The amounts of cycles can fluctuate between
anywhere from ~60 to ~116 for the baseline scheme, but this approach
reduces it considerably. For instance, with 80 threads:
+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline | 1.06% | 91.54 |
| patched | 99.97% | 14.18 |
+----------------+----------+------------------+
[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
#include <linux/vmacache.h>
#include <linux/rcupdate.h>
#include <linux/irq.h>

#include <asm/cacheflush.h>
#include <asm/byteorder.h>
#include <linux/atomic.h>

#include "debug_core.h"

static int kgdb_break_asap;

struct debuggerinfo_struct kgdb_info[NR_CPUS];

/**
 * kgdb_connected - Is a host GDB connected to us?
 */
int kgdb_connected;
EXPORT_SYMBOL_GPL(kgdb_connected);

/* All the KGDB handlers are installed */
int kgdb_io_module_registered;

/* Guard for recursive entry */
static int exception_level;

struct kgdb_io *dbg_io_ops;
static DEFINE_SPINLOCK(kgdb_registration_lock);

/* Action for the reboot notifier, a global to allow kdb to change it */
static int kgdbreboot;
/* kgdb console driver is loaded */
static int kgdb_con_registered;
/* determine if kgdb console output should be used */
static int kgdb_use_con;
/* Flag for alternate operations for early debugging */
bool dbg_is_early = true;
/* Next cpu to become the master debug core */
int dbg_switch_cpu;

/* Use kdb or gdbserver mode */
int dbg_kdb_mode = 1;

static int __init opt_kgdb_con(char *str)
{
	kgdb_use_con = 1;
	return 0;
}

early_param("kgdbcon", opt_kgdb_con);

module_param(kgdb_use_con, int, 0644);
module_param(kgdbreboot, int, 0644);

/*
 * Holds information about breakpoints in a kernel. These breakpoints are
 * added and removed by gdb.
 */
static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = {
	[0 ... KGDB_MAX_BREAKPOINTS - 1] = { .state = BP_UNDEFINED }
};

/*
 * The CPU# of the active CPU, or -1 if none:
 */
atomic_t kgdb_active = ATOMIC_INIT(-1);
EXPORT_SYMBOL_GPL(kgdb_active);
static DEFINE_RAW_SPINLOCK(dbg_master_lock);
static DEFINE_RAW_SPINLOCK(dbg_slave_lock);

/*
 * We use NR_CPUs not PERCPU, in case kgdb is used to debug early
 * bootup code (which might not have percpu set up yet):
 */
static atomic_t masters_in_kgdb;
static atomic_t slaves_in_kgdb;
static atomic_t kgdb_break_tasklet_var;
atomic_t kgdb_setting_breakpoint;

struct task_struct *kgdb_usethread;
struct task_struct *kgdb_contthread;

int kgdb_single_step;
static pid_t kgdb_sstep_pid;

/* to keep track of the CPU which is doing the single stepping */
atomic_t kgdb_cpu_doing_single_step = ATOMIC_INIT(-1);

/*
 * If you are debugging a problem where roundup (the collection of
 * all other CPUs) is a problem [this should be extremely rare],
 * then use the nokgdbroundup option to avoid roundup. In that case
 * the other CPUs might interfere with your debugging context, so
 * use this with care:
 */
static int kgdb_do_roundup = 1;

static int __init opt_nokgdbroundup(char *str)
{
	kgdb_do_roundup = 0;

	return 0;
}

early_param("nokgdbroundup", opt_nokgdbroundup);

/*
 * Finally, some KGDB code :-)
 */

/*
 * Weak aliases for breakpoint management,
 * can be overridden by architectures when needed:
 */
int __weak kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
{
	int err;

	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
				BREAK_INSTR_SIZE);
	if (err)
		return err;
	err = probe_kernel_write((char *)bpt->bpt_addr,
				 arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
	return err;
}

int __weak kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
{
	return probe_kernel_write((char *)bpt->bpt_addr,
				  (char *)bpt->saved_instr, BREAK_INSTR_SIZE);
}

int __weak kgdb_validate_break_address(unsigned long addr)
{
	struct kgdb_bkpt tmp;
	int err;

	/*
	 * Validate setting the breakpoint and then removing it. If the
	 * remove fails, the kernel needs to emit a bad message because we
	 * are in deep trouble, not being able to put things back the way we
	 * found them.
	 */
	tmp.bpt_addr = addr;
	err = kgdb_arch_set_breakpoint(&tmp);
	if (err)
		return err;
	err = kgdb_arch_remove_breakpoint(&tmp);
	if (err)
		pr_err("Critical breakpoint error, kernel memory destroyed at: %lx\n",
		       addr);
	return err;
}

unsigned long __weak kgdb_arch_pc(int exception, struct pt_regs *regs)
{
	return instruction_pointer(regs);
}

int __weak kgdb_arch_init(void)
{
	return 0;
}

int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
{
	return 0;
}

#ifdef CONFIG_SMP

/*
 * Default (weak) implementation for kgdb_roundup_cpus
 */

static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);

void __weak kgdb_call_nmi_hook(void *ignored)
{
	/*
	 * NOTE: get_irq_regs() is supposed to get the registers from
	 * before the IPI interrupt happened and so is supposed to
	 * show where the processor was. In some situations it's
	 * possible we might be called without an IPI, so it might be
	 * safer to figure out how to make kgdb_breakpoint() work
	 * properly here.
	 */
	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

void __weak kgdb_roundup_cpus(void)
{
	call_single_data_t *csd;
	int this_cpu = raw_smp_processor_id();
	int cpu;
	int ret;

	for_each_online_cpu(cpu) {
		/* No need to roundup ourselves */
		if (cpu == this_cpu)
			continue;

		csd = &per_cpu(kgdb_roundup_csd, cpu);

		/*
		 * If it didn't round up last time, don't try again
		 * since smp_call_function_single_async() will block.
		 *
		 * If rounding_up is false then we know that the
		 * previous call must have at least started and that
		 * means smp_call_function_single_async() won't block.
		 */
		if (kgdb_info[cpu].rounding_up)
			continue;
		kgdb_info[cpu].rounding_up = true;

		csd->func = kgdb_call_nmi_hook;
		ret = smp_call_function_single_async(cpu, csd);
		if (ret)
			kgdb_info[cpu].rounding_up = false;
	}
}

#endif

/*
 * Some architectures need cache flushes when we set/clear a
 * breakpoint:
 */
static void kgdb_flush_swbreak_addr(unsigned long addr)
{
	if (!CACHE_FLUSH_IS_SAFE)
		return;

	if (current->mm) {
		int i;

		for (i = 0; i < VMACACHE_SIZE; i++) {
			if (!current->vmacache.vmas[i])
				continue;
			flush_cache_range(current->vmacache.vmas[i],
					  addr, addr + BREAK_INSTR_SIZE);
		}
	}

	/* Force flush instruction cache if it was outside the mm */
	flush_icache_range(addr, addr + BREAK_INSTR_SIZE);
}

/*
 * SW breakpoint management:
 */
int dbg_activate_sw_breakpoints(void)
{
	int error;
	int ret = 0;
	int i;

	for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
		if (kgdb_break[i].state != BP_SET)
			continue;

		error = kgdb_arch_set_breakpoint(&kgdb_break[i]);
		if (error) {
			ret = error;
			pr_info("BP install failed: %lx\n",
				kgdb_break[i].bpt_addr);
			continue;
		}

		kgdb_flush_swbreak_addr(kgdb_break[i].bpt_addr);
		kgdb_break[i].state = BP_ACTIVE;
	}
	return ret;
}

int dbg_set_sw_break(unsigned long addr)
{
	int err = kgdb_validate_break_address(addr);
	int breakno = -1;
	int i;

	if (err)
		return err;

	for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
		if ((kgdb_break[i].state == BP_SET) &&
		    (kgdb_break[i].bpt_addr == addr))
			return -EEXIST;
	}
	for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
		if (kgdb_break[i].state == BP_REMOVED &&
		    kgdb_break[i].bpt_addr == addr) {
			breakno = i;
			break;
		}
	}

	if (breakno == -1) {
		for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
			if (kgdb_break[i].state == BP_UNDEFINED) {
				breakno = i;
				break;
			}
		}
	}

	if (breakno == -1)
		return -E2BIG;

	kgdb_break[breakno].state = BP_SET;
	kgdb_break[breakno].type = BP_BREAKPOINT;
	kgdb_break[breakno].bpt_addr = addr;

	return 0;
}

2010-05-21 06:04:21 +04:00
int dbg_deactivate_sw_breakpoints ( void )
2008-04-17 22:05:37 +04:00
{
2009-12-11 17:43:20 +03:00
int error ;
int ret = 0 ;
2008-04-17 22:05:37 +04:00
int i ;
for ( i = 0 ; i < KGDB_MAX_BREAKPOINTS ; i + + ) {
if ( kgdb_break [ i ] . state ! = BP_ACTIVE )
continue ;
2012-03-21 19:17:03 +04:00
error = kgdb_arch_remove_breakpoint ( & kgdb_break [ i ] ) ;
2009-12-11 17:43:20 +03:00
if ( error ) {
2014-06-12 23:30:11 +04:00
pr_info ( " BP remove failed: %lx \n " ,
kgdb_break [ i ] . bpt_addr ) ;
2009-12-11 17:43:20 +03:00
ret = error ;
}
2008-04-17 22:05:37 +04:00
2012-03-21 19:17:03 +04:00
kgdb_flush_swbreak_addr ( kgdb_break [ i ] . bpt_addr ) ;
2008-04-17 22:05:37 +04:00
kgdb_break [ i ] . state = BP_SET ;
}
2009-12-11 17:43:20 +03:00
return ret ;
2008-04-17 22:05:37 +04:00
}
2010-04-02 20:48:03 +04:00
int dbg_remove_sw_break ( unsigned long addr )
2008-04-17 22:05:37 +04:00
{
int i ;
for ( i = 0 ; i < KGDB_MAX_BREAKPOINTS ; i + + ) {
if ( ( kgdb_break [ i ] . state = = BP_SET ) & &
( kgdb_break [ i ] . bpt_addr = = addr ) ) {
kgdb_break [ i ] . state = BP_REMOVED ;
return 0 ;
}
}
return - ENOENT ;
}
int kgdb_isremovedbreak ( unsigned long addr )
{
int i ;
for ( i = 0 ; i < KGDB_MAX_BREAKPOINTS ; i + + ) {
if ( ( kgdb_break [ i ] . state = = BP_REMOVED ) & &
( kgdb_break [ i ] . bpt_addr = = addr ) )
return 1 ;
}
return 0 ;
}
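The slot handling in dbg_set_sw_break(), dbg_remove_sw_break() and kgdb_isremovedbreak() above can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel implementation: the table size MAX_BREAKPOINTS, the enum values, and the model_* names are all hypothetical, and no instruction patching or cache flushing is performed.

```c
#include <errno.h>

#define MAX_BREAKPOINTS 4	/* hypothetical; stands in for KGDB_MAX_BREAKPOINTS */

/* Hypothetical numbering; only the state names mirror the source. */
enum bp_state { BP_UNDEFINED = 0, BP_REMOVED, BP_SET, BP_ACTIVE };

struct bp_slot {
	unsigned long addr;
	enum bp_state state;
};

static struct bp_slot bp_table[MAX_BREAKPOINTS];

/* Record a breakpoint request; returns 0 or a negative errno. */
static int model_set_break(unsigned long addr)
{
	int i, slot = -1;

	/* Reject an address that is already pending. */
	for (i = 0; i < MAX_BREAKPOINTS; i++)
		if (bp_table[i].state == BP_SET && bp_table[i].addr == addr)
			return -EEXIST;

	/* Prefer a slot that previously held this address. */
	for (i = 0; i < MAX_BREAKPOINTS; i++) {
		if (bp_table[i].state == BP_REMOVED && bp_table[i].addr == addr) {
			slot = i;
			break;
		}
	}

	/* Otherwise take the first slot that was never used. */
	if (slot == -1) {
		for (i = 0; i < MAX_BREAKPOINTS; i++) {
			if (bp_table[i].state == BP_UNDEFINED) {
				slot = i;
				break;
			}
		}
	}
	if (slot == -1)
		return -E2BIG;

	bp_table[slot].state = BP_SET;
	bp_table[slot].addr = addr;
	return 0;
}

/* Mark a pending breakpoint as removed; the slot keeps its address. */
static int model_remove_break(unsigned long addr)
{
	int i;

	for (i = 0; i < MAX_BREAKPOINTS; i++) {
		if (bp_table[i].state == BP_SET && bp_table[i].addr == addr) {
			bp_table[i].state = BP_REMOVED;
			return 0;
		}
	}
	return -ENOENT;
}

/* Returns 1 if addr belongs to a removed breakpoint, else 0. */
static int model_is_removed(unsigned long addr)
{
	int i;

	for (i = 0; i < MAX_BREAKPOINTS; i++)
		if (bp_table[i].state == BP_REMOVED && bp_table[i].addr == addr)
			return 1;
	return 0;
}
```

In this model a removed slot is only reused for the same address, so slots that went through BP_REMOVED still count against the table size until everything is reset, which is what dbg_remove_all_break() does in the source.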
2010-04-02 20:48:03 +04:00
int dbg_remove_all_break ( void )
2008-04-17 22:05:37 +04:00
{
int error ;
int i ;
/* Clear memory breakpoints. */
for ( i = 0 ; i < KGDB_MAX_BREAKPOINTS ; i + + ) {
2008-03-08 01:34:16 +03:00
if ( kgdb_break [ i ] . state ! = BP_ACTIVE )
goto setundefined ;
2012-03-21 19:17:03 +04:00
error = kgdb_arch_remove_breakpoint ( & kgdb_break [ i ] ) ;
2008-04-17 22:05:37 +04:00
if ( error )
2014-06-12 23:30:11 +04:00
pr_err ( " breakpoint remove failed: %lx \n " ,
2012-03-21 19:17:03 +04:00
kgdb_break [ i ] . bpt_addr ) ;
2008-03-08 01:34:16 +03:00
setundefined :
kgdb_break [ i ] . state = BP_UNDEFINED ;
2008-04-17 22:05:37 +04:00
}
/* Clear hardware breakpoints. */
if ( arch_kgdb_ops . remove_all_hw_break )
arch_kgdb_ops . remove_all_hw_break ( ) ;
return 0 ;
}
/*
* Return true if there is a valid kgdb I / O module . Also if no
* debugger is attached a message can be printed to the console about
* waiting for the debugger to attach .
*
* The print_wait argument is only to be true when called from inside
* the core kgdb_handle_exception , because it will wait for the
* debugger to attach .
*/
static int kgdb_io_ready ( int print_wait )
{
2010-04-02 20:48:03 +04:00
if ( ! dbg_io_ops )
2008-04-17 22:05:37 +04:00
return 0 ;
if ( kgdb_connected )
return 1 ;
if ( atomic_read ( & kgdb_setting_breakpoint ) )
return 1 ;
2010-05-21 06:04:21 +04:00
if ( print_wait ) {
# ifdef CONFIG_KGDB_KDB
if ( ! dbg_kdb_mode )
2014-06-12 23:30:11 +04:00
pr_crit ( " waiting... or $3#33 for KDB \n " ) ;
2010-05-21 06:04:21 +04:00
# else
2014-06-12 23:30:11 +04:00
pr_crit ( " Waiting for remote debugger \n " ) ;
2010-05-21 06:04:21 +04:00
# endif
}
2008-04-17 22:05:37 +04:00
return 1 ;
}
static int kgdb_reenter_check ( struct kgdb_state * ks )
{
unsigned long addr ;
if ( atomic_read ( & kgdb_active ) ! = raw_smp_processor_id ( ) )
return 0 ;
/* Panic on recursive debugger calls: */
exception_level + + ;
addr = kgdb_arch_pc ( ks - > ex_vector , ks - > linux_regs ) ;
2010-05-21 06:04:21 +04:00
dbg_deactivate_sw_breakpoints ( ) ;
2008-04-17 22:05:37 +04:00
/*
* If the breakpoint at the place the exception occurred was
* removed OK , try to recover and print a warning to the end
* user because the user planted a breakpoint in a place that
* KGDB needs in order to function .
*/
2010-04-02 20:48:03 +04:00
if ( dbg_remove_sw_break ( addr ) = = 0 ) {
2008-04-17 22:05:37 +04:00
exception_level = 0 ;
kgdb_skipexception ( ks - > ex_vector , ks - > linux_regs ) ;
2010-04-02 20:48:03 +04:00
dbg_activate_sw_breakpoints ( ) ;
2014-06-12 23:30:11 +04:00
pr_crit ( " re-enter error: breakpoint removed %lx \n " , addr ) ;
2008-04-17 22:05:37 +04:00
WARN_ON_ONCE ( 1 ) ;
return 1 ;
}
2010-04-02 20:48:03 +04:00
dbg_remove_all_break ( ) ;
2008-04-17 22:05:37 +04:00
kgdb_skipexception ( ks - > ex_vector , ks - > linux_regs ) ;
if ( exception_level > 1 ) {
dump_stack ( ) ;
panic ( " Recursive entry to debugger " ) ;
}
2014-06-12 23:30:11 +04:00
pr_crit ( " re-enter exception: ALL breakpoints killed \n " ) ;
2010-05-21 06:04:27 +04:00
# ifdef CONFIG_KGDB_KDB
/* Allow kdb to debug itself one level */
return 0 ;
# endif
2008-04-17 22:05:37 +04:00
dump_stack ( ) ;
panic ( " Recursive entry to debugger " ) ;
return 1 ;
}
2010-08-06 20:47:14 +04:00
static void dbg_touch_watchdogs ( void )
{
touch_softlockup_watchdog_sync ( ) ;
clocksource_touch_watchdog ( ) ;
2010-08-13 21:44:04 +04:00
rcu_cpu_stall_reset ( ) ;
2010-08-06 20:47:14 +04:00
}
2010-05-21 17:46:00 +04:00
static int kgdb_cpu_enter ( struct kgdb_state * ks , struct pt_regs * regs ,
int exception_state )
2008-04-17 22:05:37 +04:00
{
unsigned long flags ;
2009-12-11 17:43:17 +03:00
int sstep_tries = 100 ;
2010-05-21 06:04:21 +04:00
int error ;
2010-05-21 17:46:00 +04:00
int cpu ;
2010-04-02 20:57:18 +04:00
int trace_on = 0 ;
2010-05-21 17:46:00 +04:00
int online_cpus = num_online_cpus ( ) ;
2014-11-11 18:31:53 +03:00
u64 time_left ;
2010-09-13 15:58:00 +04:00
2010-05-21 17:46:00 +04:00
kgdb_info [ ks - > cpu ] . enter_kgdb + + ;
kgdb_info [ ks - > cpu ] . exception_state | = exception_state ;
if ( exception_state = = DCPU_WANT_MASTER )
atomic_inc ( & masters_in_kgdb ) ;
else
atomic_inc ( & slaves_in_kgdb ) ;
2010-08-18 15:02:00 +04:00
if ( arch_kgdb_ops . disable_hw_break )
arch_kgdb_ops . disable_hw_break ( regs ) ;
2010-09-13 15:58:00 +04:00
2008-04-17 22:05:37 +04:00
acquirelock :
/*
* Interrupts will be restored by the ' trap return ' code , except when
* single stepping .
*/
local_irq_save ( flags ) ;
2010-04-02 20:47:02 +04:00
cpu = ks - > cpu ;
kgdb_info [ cpu ] . debuggerinfo = regs ;
kgdb_info [ cpu ] . task = current ;
2010-05-21 06:04:21 +04:00
kgdb_info [ cpu ] . ret_state = 0 ;
kgdb_info [ cpu ] . irq_depth = hardirq_count ( ) > > HARDIRQ_SHIFT ;
2008-04-17 22:05:37 +04:00
2010-05-21 17:46:00 +04:00
/* Make sure the above info reaches the primary CPU */
smp_mb ( ) ;
if ( exception_level = = 1 ) {
if ( raw_spin_trylock ( & dbg_master_lock ) )
atomic_xchg ( & kgdb_active , cpu ) ;
2010-05-21 06:04:27 +04:00
goto cpu_master_loop ;
2010-05-21 17:46:00 +04:00
}
2010-05-21 06:04:27 +04:00
2008-04-17 22:05:37 +04:00
/*
2010-04-02 20:47:02 +04:00
* CPU will loop if it is a slave , or if it has requested to become
* the kgdb master CPU and is waiting to acquire the kgdb_active lock :
2008-04-17 22:05:37 +04:00
*/
2010-04-02 20:47:02 +04:00
while ( 1 ) {
2010-05-21 06:04:21 +04:00
cpu_loop :
if ( kgdb_info [ cpu ] . exception_state & DCPU_NEXT_MASTER ) {
kgdb_info [ cpu ] . exception_state & = ~ DCPU_NEXT_MASTER ;
goto cpu_master_loop ;
} else if ( kgdb_info [ cpu ] . exception_state & DCPU_WANT_MASTER ) {
2010-05-21 17:46:00 +04:00
if ( raw_spin_trylock ( & dbg_master_lock ) ) {
atomic_xchg ( & kgdb_active , cpu ) ;
2010-04-02 20:47:02 +04:00
break ;
2010-05-21 17:46:00 +04:00
}
2010-04-02 20:47:02 +04:00
} else if ( kgdb_info [ cpu ] . exception_state & DCPU_IS_SLAVE ) {
2010-05-21 17:46:00 +04:00
if ( ! raw_spin_is_locked ( & dbg_slave_lock ) )
2010-04-02 20:47:02 +04:00
goto return_normal ;
} else {
return_normal :
/* Return to normal operation by executing any
* hw breakpoint fixup .
*/
if ( arch_kgdb_ops . correct_hw_break )
arch_kgdb_ops . correct_hw_break ( ) ;
2010-04-02 20:57:18 +04:00
if ( trace_on )
tracing_on ( ) ;
2018-12-05 06:38:28 +03:00
kgdb_info [ cpu ] . debuggerinfo = NULL ;
kgdb_info [ cpu ] . task = NULL ;
2010-05-21 17:46:00 +04:00
kgdb_info [ cpu ] . exception_state & =
~ ( DCPU_WANT_MASTER | DCPU_IS_SLAVE ) ;
kgdb_info [ cpu ] . enter_kgdb - - ;
2014-03-17 21:06:10 +04:00
smp_mb__before_atomic ( ) ;
2010-05-21 17:46:00 +04:00
atomic_dec ( & slaves_in_kgdb ) ;
2010-08-06 20:47:14 +04:00
dbg_touch_watchdogs ( ) ;
2010-04-02 20:47:02 +04:00
local_irq_restore ( flags ) ;
return 0 ;
}
2008-04-17 22:05:37 +04:00
cpu_relax ( ) ;
2010-04-02 20:47:02 +04:00
}
2008-04-17 22:05:37 +04:00
/*
2009-12-11 17:43:17 +03:00
* For single stepping , try to only enter on the processor
2011-03-31 05:57:33 +04:00
* that was single stepping . To guard against a deadlock , the
2009-12-11 17:43:17 +03:00
* kernel will only try for the value of sstep_tries before
* giving up and continuing on .
2008-04-17 22:05:37 +04:00
*/
if ( atomic_read ( & kgdb_cpu_doing_single_step ) ! = - 1 & &
2009-12-11 17:43:17 +03:00
( kgdb_info [ cpu ] . task & &
kgdb_info [ cpu ] . task - > pid ! = kgdb_sstep_pid ) & & - - sstep_tries ) {
2008-04-17 22:05:37 +04:00
atomic_set ( & kgdb_active , - 1 ) ;
2010-05-21 17:46:00 +04:00
raw_spin_unlock ( & dbg_master_lock ) ;
2010-08-06 20:47:14 +04:00
dbg_touch_watchdogs ( ) ;
2008-04-17 22:05:37 +04:00
local_irq_restore ( flags ) ;
goto acquirelock ;
}
if ( ! kgdb_io_ready ( 1 ) ) {
2010-05-21 06:04:21 +04:00
kgdb_info [ cpu ] . ret_state = 1 ;
2010-04-02 20:48:03 +04:00
goto kgdb_restore ; /* No I/O connection, resume the system */
2008-04-17 22:05:37 +04:00
}
/*
* Don ' t enter if we have hit a removed breakpoint .
*/
if ( kgdb_skipexception ( ks - > ex_vector , ks - > linux_regs ) )
goto kgdb_restore ;
/* Call the I/O driver's pre_exception routine */
2010-04-02 20:48:03 +04:00
if ( dbg_io_ops - > pre_exception )
dbg_io_ops - > pre_exception ( ) ;
2008-04-17 22:05:37 +04:00
/*
* Get the passive CPU lock which will hold all the non - primary
* CPUs in a spin state while the debugger is active
*/
2010-05-21 17:46:00 +04:00
if ( ! kgdb_single_step )
raw_spin_lock ( & dbg_slave_lock ) ;
2008-04-17 22:05:37 +04:00
2008-04-02 01:55:27 +04:00
# ifdef CONFIG_SMP
2013-10-02 19:14:18 +04:00
/* If send_ready set, slaves are already waiting */
if ( ks - > send_ready )
atomic_set ( ks - > send_ready , 1 ) ;
2008-04-02 01:55:27 +04:00
/* Signal the other CPUs to enter kgdb_wait() */
2013-10-02 19:14:18 +04:00
else if ( ( ! kgdb_single_step ) & & kgdb_do_roundup )
2018-12-05 06:38:25 +03:00
kgdb_roundup_cpus ( ) ;
2008-04-02 01:55:27 +04:00
# endif
2008-04-17 22:05:37 +04:00
/*
* Wait for the other CPUs to be notified and be waiting for us :
*/
2016-12-15 02:05:49 +03:00
time_left = MSEC_PER_SEC ;
2014-11-11 18:31:53 +03:00
while ( kgdb_do_roundup & & - - time_left & &
( atomic_read ( & masters_in_kgdb ) + atomic_read ( & slaves_in_kgdb ) ) ! =
online_cpus )
2016-12-15 02:05:49 +03:00
udelay ( 1000 ) ;
2014-11-11 18:31:53 +03:00
if ( ! time_left )
2015-01-09 00:46:55 +03:00
pr_crit ( " Timed out waiting for secondary CPUs. \n " ) ;
2008-04-17 22:05:37 +04:00
/*
* At this point the primary processor is completely
* in the debugger and all secondary CPUs are quiescent
*/
2010-05-21 06:04:21 +04:00
dbg_deactivate_sw_breakpoints ( ) ;
2008-04-17 22:05:37 +04:00
kgdb_single_step = 0 ;
kgdb, x86, arm, mips, powerpc: ignore user space single stepping
On the x86 arch, user space single step exceptions should be ignored
if they occur in the kernel space, such as ptrace stepping through a
system call.
First check if it is kgdb that is executing a single step, then ensure
it is not an accidental traversal into the user space, while in kgdb,
any other time the TIF_SINGLESTEP is set, kgdb should ignore the
exception.
On x86, arm, mips and powerpc, the kgdb_contthread usage was
inconsistent with the way single stepping is implemented in the kgdb
core. The arch specific stub should always set the
kgdb_cpu_doing_single_step correctly if it is single stepping. This
allows kgdb to correctly process an instruction step if ptrace
happens to be requesting an instruction step over a system call.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-09-26 19:36:41 +04:00
kgdb_contthread = current ;
2008-04-17 22:05:37 +04:00
exception_level = 0 ;
2010-04-02 20:57:18 +04:00
trace_on = tracing_is_on ( ) ;
if ( trace_on )
tracing_off ( ) ;
2008-04-17 22:05:37 +04:00
2010-05-21 06:04:21 +04:00
while ( 1 ) {
cpu_master_loop :
if ( dbg_kdb_mode ) {
kgdb_connected = 1 ;
error = kdb_stub ( ks ) ;
2010-08-05 18:22:25 +04:00
if ( error = = - 1 )
continue ;
2010-07-22 04:27:07 +04:00
kgdb_connected = 0 ;
2010-05-21 06:04:21 +04:00
} else {
error = gdb_serial_stub ( ks ) ;
}
if ( error = = DBG_PASS_EVENT ) {
dbg_kdb_mode = ! dbg_kdb_mode ;
} else if ( error = = DBG_SWITCH_CPU_EVENT ) {
2010-05-21 17:46:00 +04:00
kgdb_info [ dbg_switch_cpu ] . exception_state | =
DCPU_NEXT_MASTER ;
2010-05-21 06:04:21 +04:00
goto cpu_loop ;
} else {
kgdb_info [ cpu ] . ret_state = error ;
break ;
}
}
2008-04-17 22:05:37 +04:00
/* Call the I/O driver's post_exception routine */
2010-04-02 20:48:03 +04:00
if ( dbg_io_ops - > post_exception )
dbg_io_ops - > post_exception ( ) ;
2008-04-17 22:05:37 +04:00
kgdb, x86, arm, mips, powerpc: ignore user space single stepping
On the x86 arch, user space single step exceptions should be ignored
if they occur in the kernel space, such as ptrace stepping through a
system call.
First check if it is kgdb that is executing a single step, then ensure
it is not an accidental traversal into the user space, while in kgdb,
any other time the TIF_SINGLESTEP is set, kgdb should ignore the
exception.
On x86, arm, mips and powerpc, the kgdb_contthread usage was
inconsistent with the way single stepping is implemented in the kgdb
core. The arch specific stub should always set the
kgdb_cpu_doing_single_step correctly if it is single stepping. This
allows kgdb to correctly process an instruction step if ptrace
happens to be requesting an instruction step over a system call.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-09-26 19:36:41 +04:00
if ( ! kgdb_single_step ) {
2010-05-21 17:46:00 +04:00
raw_spin_unlock ( & dbg_slave_lock ) ;
/* Wait till all the CPUs have quit from the debugger. */
while ( kgdb_do_roundup & & atomic_read ( & slaves_in_kgdb ) )
cpu_relax ( ) ;
2008-04-17 22:05:37 +04:00
}
kgdb_restore :
2009-12-11 17:43:17 +03:00
if ( atomic_read ( & kgdb_cpu_doing_single_step ) ! = - 1 ) {
int sstep_cpu = atomic_read ( & kgdb_cpu_doing_single_step ) ;
if ( kgdb_info [ sstep_cpu ] . task )
kgdb_sstep_pid = kgdb_info [ sstep_cpu ] . task - > pid ;
else
kgdb_sstep_pid = 0 ;
}
2010-09-13 15:58:00 +04:00
if ( arch_kgdb_ops . correct_hw_break )
arch_kgdb_ops . correct_hw_break ( ) ;
2010-04-02 20:57:18 +04:00
if ( trace_on )
tracing_on ( ) ;
2010-05-21 17:46:00 +04:00
2018-12-05 06:38:28 +03:00
kgdb_info [ cpu ] . debuggerinfo = NULL ;
kgdb_info [ cpu ] . task = NULL ;
2010-05-21 17:46:00 +04:00
kgdb_info [ cpu ] . exception_state & =
~ ( DCPU_WANT_MASTER | DCPU_IS_SLAVE ) ;
kgdb_info [ cpu ] . enter_kgdb - - ;
2014-03-17 21:06:10 +04:00
smp_mb__before_atomic ( ) ;
2010-05-21 17:46:00 +04:00
atomic_dec ( & masters_in_kgdb ) ;
2008-04-17 22:05:37 +04:00
/* Free kgdb_active */
atomic_set ( & kgdb_active , - 1 ) ;
2010-05-21 17:46:00 +04:00
raw_spin_unlock ( & dbg_master_lock ) ;
2010-08-06 20:47:14 +04:00
dbg_touch_watchdogs ( ) ;
2008-04-17 22:05:37 +04:00
local_irq_restore ( flags ) ;
2010-05-21 06:04:21 +04:00
return kgdb_info [ cpu ] . ret_state ;
2008-04-17 22:05:37 +04:00
}
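The way kgdb_cpu_enter() above elects a master CPU (try the master lock, publish the winner in kgdb_active, reset to -1 on release) can be illustrated outside the kernel. This is a rough sketch assuming C11 atomics in place of the kernel's raw spinlock and atomic_t; master_lock, active_cpu, try_become_master() and release_master() are hypothetical names, and the slave lock, roundup and single-step handling are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag master_lock = ATOMIC_FLAG_INIT;	/* stands in for dbg_master_lock */
static atomic_int active_cpu = -1;			/* stands in for kgdb_active */

/*
 * Non-blocking attempt to become the debugger master.
 * Returns true if this CPU now owns the lock.
 */
static bool try_become_master(int cpu)
{
	/* test_and_set returns the previous value: true means already held. */
	if (atomic_flag_test_and_set(&master_lock))
		return false;

	/* Publish the owner only after the lock has been won. */
	atomic_store(&active_cpu, cpu);
	return true;
}

/* Give up mastership: clear the owner first, then drop the lock. */
static void release_master(void)
{
	atomic_store(&active_cpu, -1);
	atomic_flag_clear(&master_lock);
}
```

A caller that loses the race would be expected to spin and retry, as the while ( 1 ) loop in the source does with cpu_relax ( ).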
2010-04-02 20:47:02 +04:00
/*
* kgdb_handle_exception ( ) - main entry point from a kernel exception
*
* Locking hierarchy :
* interface locks , if any ( begin_session )
* kgdb lock ( kgdb_active )
*/
int
kgdb_handle_exception ( int evector , int signo , int ecode , struct pt_regs * regs )
{
struct kgdb_state kgdb_var ;
struct kgdb_state * ks = & kgdb_var ;
2012-09-25 01:27:50 +04:00
int ret = 0 ;
if ( arch_kgdb_ops . enable_nmi )
arch_kgdb_ops . enable_nmi ( 0 ) ;
2015-01-28 14:32:14 +03:00
/*
* Avoid entering the debugger if we were triggered due to an oops
* but panic_timeout indicates the system should automatically
* reboot on panic . We don ' t want to get stuck waiting for input
* on such systems , especially if it ' s " just " an oops .
*/
if ( signo ! = SIGTRAP & & panic_timeout )
return 1 ;
2010-04-02 20:47:02 +04:00
2013-10-02 19:14:18 +04:00
memset ( ks , 0 , sizeof ( struct kgdb_state ) ) ;
2010-04-02 20:47:02 +04:00
ks - > cpu = raw_smp_processor_id ( ) ;
ks - > ex_vector = evector ;
ks - > signo = signo ;
ks - > err_code = ecode ;
ks - > linux_regs = regs ;
if ( kgdb_reenter_check ( ks ) )
2012-09-25 01:27:50 +04:00
goto out ; /* Ouch, double exception ! */
2010-05-21 17:46:00 +04:00
if ( kgdb_info [ ks - > cpu ] . enter_kgdb ! = 0 )
2012-09-25 01:27:50 +04:00
goto out ;
2010-05-21 17:46:00 +04:00
2012-09-25 01:27:50 +04:00
ret = kgdb_cpu_enter ( ks , regs , DCPU_WANT_MASTER ) ;
out :
if ( arch_kgdb_ops . enable_nmi )
arch_kgdb_ops . enable_nmi ( 1 ) ;
return ret ;
2010-04-02 20:47:02 +04:00
}
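The early return at the top of kgdb_handle_exception() above reduces to a two-input predicate. The sketch below restates it in isolation; skip_debugger_entry() is a hypothetical helper name, and SIGTRAP is taken from the user-space signal.h rather than the kernel headers.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Returns true when the debugger should not be entered: the trigger
 * was not a breakpoint trap and the system is configured to reboot
 * automatically after a panic (non-zero panic_timeout).
 */
static bool skip_debugger_entry(int signo, int panic_timeout)
{
	return signo != SIGTRAP && panic_timeout != 0;
}
```

An explicit breakpoint (SIGTRAP) is therefore never filtered by this check, whatever the panic_timeout setting.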
2012-10-12 15:37:33 +04:00
/*
2018-10-22 07:45:48 +03:00
* GDB places a breakpoint at this function to learn about dynamically loaded objects .
2012-10-12 15:37:33 +04:00
*/
static int module_event ( struct notifier_block * self , unsigned long val ,
void * data )
{
return 0 ;
}
static struct notifier_block dbg_module_load_nb = {
. notifier_call = module_event ,
} ;
2008-04-17 22:05:37 +04:00
int kgdb_nmicallback ( int cpu , void * regs )
{
# ifdef CONFIG_SMP
2010-04-02 20:47:02 +04:00
struct kgdb_state kgdb_var ;
struct kgdb_state * ks = & kgdb_var ;
2018-12-05 06:38:27 +03:00
kgdb_info [ cpu ] . rounding_up = false ;
2010-04-02 20:47:02 +04:00
memset ( ks , 0 , sizeof ( struct kgdb_state ) ) ;
ks - > cpu = cpu ;
ks - > linux_regs = regs ;
2010-05-21 17:46:00 +04:00
if ( kgdb_info [ ks - > cpu ] . enter_kgdb = = 0 & &
raw_spin_is_locked ( & dbg_master_lock ) ) {
kgdb_cpu_enter ( ks , regs , DCPU_IS_SLAVE ) ;
2008-04-17 22:05:37 +04:00
return 0 ;
}
# endif
return 1 ;
}
2014-01-14 20:25:52 +04:00
int kgdb_nmicallin ( int cpu , int trapnr , void * regs , int err_code ,
atomic_t * send_ready )
2013-10-02 19:14:18 +04:00
{
# ifdef CONFIG_SMP
if ( ! kgdb_io_ready ( 0 ) | | ! send_ready )
return 1 ;
if ( kgdb_info [ cpu ] . enter_kgdb = = 0 ) {
struct kgdb_state kgdb_var ;
struct kgdb_state * ks = & kgdb_var ;
memset ( ks , 0 , sizeof ( struct kgdb_state ) ) ;
ks - > cpu = cpu ;
ks - > ex_vector = trapnr ;
ks - > signo = SIGTRAP ;
2014-01-14 20:25:52 +04:00
ks - > err_code = err_code ;
2013-10-02 19:14:18 +04:00
ks - > linux_regs = regs ;
ks - > send_ready = send_ready ;
kgdb_cpu_enter ( ks , regs , DCPU_WANT_MASTER ) ;
return 0 ;
}
# endif
return 1 ;
}
2008-06-24 19:52:55 +04:00
static void kgdb_console_write ( struct console * co , const char * s ,
unsigned count )
2008-04-17 22:05:37 +04:00
{
unsigned long flags ;
/* If we're debugging, or KGDB has not connected, don't try
* and print . */
2010-05-21 06:04:21 +04:00
if ( ! kgdb_connected | | atomic_read ( & kgdb_active ) ! = - 1 | | dbg_kdb_mode )
2008-04-17 22:05:37 +04:00
return ;
local_irq_save ( flags ) ;
2010-04-02 20:48:03 +04:00
gdbstub_msg_write ( s , count ) ;
2008-04-17 22:05:37 +04:00
local_irq_restore ( flags ) ;
}
static struct console kgdbcons = {
. name = " kgdb " ,
. write = kgdb_console_write ,
. flags = CON_PRINTBUFFER | CON_ENABLED ,
. index = - 1 ,
} ;
# ifdef CONFIG_MAGIC_SYSRQ
2010-08-18 08:15:46 +04:00
static void sysrq_handle_dbg ( int key )
2008-04-17 22:05:37 +04:00
{
2010-04-02 20:48:03 +04:00
if ( ! dbg_io_ops ) {
2014-06-12 23:30:11 +04:00
pr_crit ( " ERROR: No KGDB I/O module available \n " ) ;
2008-04-17 22:05:37 +04:00
return ;
}
2010-05-21 06:04:21 +04:00
if ( ! kgdb_connected ) {
# ifdef CONFIG_KGDB_KDB
if ( ! dbg_kdb_mode )
2014-06-12 23:30:11 +04:00
pr_crit ( " KGDB or $3#33 for KDB \n " ) ;
2010-05-21 06:04:21 +04:00
# else
2014-06-12 23:30:11 +04:00
pr_crit ( " Entering KGDB \n " ) ;
2010-05-21 06:04:21 +04:00
# endif
}
2008-04-17 22:05:37 +04:00
kgdb_breakpoint ( ) ;
}
2010-04-02 20:48:03 +04:00
static struct sysrq_key_op sysrq_dbg_op = {
. handler = sysrq_handle_dbg ,
2013-05-01 02:28:51 +04:00
. help_msg = " debug(g) " ,
2009-05-14 06:56:59 +04:00
. action_msg = " DEBUG " ,
2008-04-17 22:05:37 +04:00
} ;
# endif
2010-05-21 06:04:28 +04:00
static int kgdb_panic_event ( struct notifier_block * self ,
unsigned long val ,
void * data )
{
2015-01-28 14:32:14 +03:00
/*
* Avoid entering the debugger if we were triggered due to a panic .
* We don ' t want to get stuck waiting for input from the user in
* such a case . panic_timeout indicates the system should
* automatically reboot on panic .
*/
if ( panic_timeout )
return NOTIFY_DONE ;
2010-05-21 06:04:28 +04:00
if ( dbg_kdb_mode )
kdb_printf ( " PANIC: %s \n " , ( char * ) data ) ;
kgdb_breakpoint ( ) ;
return NOTIFY_DONE ;
}
static struct notifier_block kgdb_panic_event_nb = {
. notifier_call = kgdb_panic_event ,
. priority = INT_MAX ,
} ;
2010-05-21 06:04:29 +04:00
void __weak kgdb_arch_late ( void )
{
}
void __init dbg_late_init ( void )
{
dbg_is_early = false ;
if ( kgdb_io_module_registered )
kgdb_arch_late ( ) ;
kdb_init ( KDB_INIT_FULL ) ;
}
2012-03-16 23:20:41 +04:00
static int
dbg_notify_reboot ( struct notifier_block * this , unsigned long code , void * x )
{
2012-03-20 04:35:55 +04:00
/*
* Take the following action on reboot notify depending on value :
* 1 = = Enter debugger
* 0 = = [ the default ] detach debug client
* - 1 = = Do nothing . . . and use this until the board resets
*/
switch ( kgdbreboot ) {
case 1 :
kgdb_breakpoint ( ) ;
/* fall through */
case - 1 :
goto done ;
}
2012-03-16 23:20:41 +04:00
if ( ! dbg_kdb_mode )
gdbstub_exit ( code ) ;
2012-03-20 04:35:55 +04:00
done :
2012-03-16 23:20:41 +04:00
return NOTIFY_DONE ;
}
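The kgdbreboot policy handled by dbg_notify_reboot() above can be summarised as a small mapping from the setting to an action. This is an illustrative sketch only: the enum and reboot_action_for() are hypothetical names, and the sketch ignores the additional dbg_kdb_mode check that the source applies before detaching.

```c
/* Hypothetical labels for the three behaviours described in the comment. */
enum reboot_action {
	REBOOT_DETACH,		/* 0 (default): detach the debug client */
	REBOOT_ENTER_DEBUGGER,	/* 1: break into the debugger */
	REBOOT_DO_NOTHING	/* -1: leave the debugger state alone */
};

static enum reboot_action reboot_action_for(int kgdbreboot)
{
	switch (kgdbreboot) {
	case 1:
		return REBOOT_ENTER_DEBUGGER;
	case -1:
		return REBOOT_DO_NOTHING;
	default:
		/* Any other value takes the detach path, as in the source. */
		return REBOOT_DETACH;
	}
}
```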
static struct notifier_block dbg_reboot_notifier = {
. notifier_call = dbg_notify_reboot ,
. next = NULL ,
. priority = INT_MAX ,
} ;
2008-04-17 22:05:37 +04:00
static void kgdb_register_callbacks ( void )
{
if ( ! kgdb_io_module_registered ) {
kgdb_io_module_registered = 1 ;
kgdb_arch_init ( ) ;
2010-05-21 06:04:29 +04:00
if ( ! dbg_is_early )
kgdb_arch_late ( ) ;
2012-10-12 15:37:33 +04:00
register_module_notifier ( & dbg_module_load_nb ) ;
2012-03-16 23:20:41 +04:00
register_reboot_notifier ( & dbg_reboot_notifier ) ;
2010-05-21 06:04:28 +04:00
atomic_notifier_chain_register ( & panic_notifier_list ,
& kgdb_panic_event_nb ) ;
2008-04-17 22:05:37 +04:00
# ifdef CONFIG_MAGIC_SYSRQ
2010-04-02 20:48:03 +04:00
register_sysrq_key ( ' g ' , & sysrq_dbg_op ) ;
2008-04-17 22:05:37 +04:00
# endif
if ( kgdb_use_con & & ! kgdb_con_registered ) {
register_console ( & kgdbcons ) ;
kgdb_con_registered = 1 ;
}
}
}
static void kgdb_unregister_callbacks ( void )
{
/*
* When this routine is called KGDB should unregister from the
* panic handler and clean up , making sure it is not handling any
* break exceptions at the time .
*/
if ( kgdb_io_module_registered ) {
kgdb_io_module_registered = 0 ;
2012-03-16 23:20:41 +04:00
unregister_reboot_notifier ( & dbg_reboot_notifier ) ;
2012-10-12 15:37:33 +04:00
unregister_module_notifier ( & dbg_module_load_nb ) ;
2010-05-21 06:04:28 +04:00
atomic_notifier_chain_unregister ( & panic_notifier_list ,
& kgdb_panic_event_nb ) ;
2008-04-17 22:05:37 +04:00
kgdb_arch_exit ( ) ;
# ifdef CONFIG_MAGIC_SYSRQ
2010-04-02 20:48:03 +04:00
unregister_sysrq_key ( ' g ' , & sysrq_dbg_op ) ;
2008-04-17 22:05:37 +04:00
# endif
if ( kgdb_con_registered ) {
unregister_console ( & kgdbcons ) ;
kgdb_con_registered = 0 ;
}
}
}
2009-06-03 23:06:57 +04:00
/*
* There are times a tasklet needs to be used vs a compiled in
* break point so as to cause an exception outside a kgdb I / O module ,
* such as is the case with kgdboe , where calling a breakpoint in the
* I / O driver itself would be fatal .
*/
static void kgdb_tasklet_bpt ( unsigned long ing )
{
kgdb_breakpoint ( ) ;
atomic_set ( & kgdb_break_tasklet_var , 0 ) ;
}
static DECLARE_TASKLET ( kgdb_tasklet_breakpoint , kgdb_tasklet_bpt , 0 ) ;
void kgdb_schedule_breakpoint ( void )
{
if ( atomic_read ( & kgdb_break_tasklet_var ) | |
atomic_read ( & kgdb_active ) ! = - 1 | |
atomic_read ( & kgdb_setting_breakpoint ) )
return ;
atomic_inc ( & kgdb_break_tasklet_var ) ;
tasklet_schedule ( & kgdb_tasklet_breakpoint ) ;
}
EXPORT_SYMBOL_GPL ( kgdb_schedule_breakpoint ) ;
2008-04-17 22:05:37 +04:00
static void kgdb_initial_breakpoint ( void )
{
kgdb_break_asap = 0 ;
2014-06-12 23:30:11 +04:00
pr_crit ( " Waiting for connection from remote gdb... \n " ) ;
2008-04-17 22:05:37 +04:00
kgdb_breakpoint ( ) ;
}
/**
2008-03-08 01:34:16 +03:00
* kgdb_register_io_module - register KGDB IO module
2010-04-02 20:48:03 +04:00
* @ new_dbg_io_ops : the io ops vector
2008-04-17 22:05:37 +04:00
*
* Register it with the KGDB core .
*/
2010-04-02 20:48:03 +04:00
int kgdb_register_io_module ( struct kgdb_io * new_dbg_io_ops )
2008-04-17 22:05:37 +04:00
{
int err ;
spin_lock ( & kgdb_registration_lock ) ;
2010-04-02 20:48:03 +04:00
if ( dbg_io_ops ) {
2008-04-17 22:05:37 +04:00
spin_unlock ( & kgdb_registration_lock ) ;
2014-06-12 23:30:11 +04:00
pr_err ( " Another I/O driver is already registered with KGDB \n " ) ;
2008-04-17 22:05:37 +04:00
return - EBUSY ;
}
2010-04-02 20:48:03 +04:00
if ( new_dbg_io_ops - > init ) {
err = new_dbg_io_ops - > init ( ) ;
2008-04-17 22:05:37 +04:00
if ( err ) {
spin_unlock ( & kgdb_registration_lock ) ;
return err ;
}
}
2010-04-02 20:48:03 +04:00
dbg_io_ops = new_dbg_io_ops ;
2008-04-17 22:05:37 +04:00
spin_unlock ( & kgdb_registration_lock ) ;
2014-06-12 23:30:11 +04:00
pr_info ( " Registered I/O driver %s \n " , new_dbg_io_ops - > name ) ;
2008-04-17 22:05:37 +04:00
/* Arm KGDB now. */
kgdb_register_callbacks ( ) ;
if ( kgdb_break_asap )
kgdb_initial_breakpoint ( ) ;
return 0 ;
}
EXPORT_SYMBOL_GPL ( kgdb_register_io_module ) ;
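The registration rule above (at most one I/O driver at a time, and an init failure leaves nothing registered) can be sketched in user space. The struct io_ops type, the three sample drivers, and register_io()/unregister_io() are hypothetical stand-ins, not the kernel API. Locking is omitted, and unlike the source, which only warns on a mismatch, the unregister helper here ignores a driver that is not the registered one.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct kgdb_io. */
struct io_ops {
	const char *name;
	int (*init)(void);	/* optional; non-zero return aborts registration */
};

static const struct io_ops *current_ops;	/* stands in for dbg_io_ops */

static int failing_init(void)
{
	return -ENODEV;
}

/* Sample drivers used only to exercise the sketch. */
static const struct io_ops serial_driver = { "serial", NULL };
static const struct io_ops net_driver = { "net", NULL };
static const struct io_ops bad_driver = { "bad", failing_init };

static int register_io(const struct io_ops *ops)
{
	int err;

	if (current_ops)
		return -EBUSY;	/* only one driver may be active */

	if (ops->init) {
		err = ops->init();
		if (err)
			return err;	/* nothing is registered on failure */
	}

	current_ops = ops;
	return 0;
}

static void unregister_io(const struct io_ops *ops)
{
	if (current_ops == ops)
		current_ops = NULL;
}
```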
/**
* kgdb_unregister_io_module - unregister KGDB IO module
2010-04-02 20:48:03 +04:00
* @ old_dbg_io_ops : the io ops vector
2008-04-17 22:05:37 +04:00
*
* Unregister it with the KGDB core .
*/
2010-04-02 20:48:03 +04:00
void kgdb_unregister_io_module ( struct kgdb_io * old_dbg_io_ops )
2008-04-17 22:05:37 +04:00
{
BUG_ON ( kgdb_connected ) ;
/*
* KGDB is no longer able to communicate out , so
* unregister our callbacks and reset state .
*/
kgdb_unregister_callbacks ( ) ;
spin_lock ( & kgdb_registration_lock ) ;
2010-04-02 20:48:03 +04:00
WARN_ON_ONCE ( dbg_io_ops ! = old_dbg_io_ops ) ;
dbg_io_ops = NULL ;
2008-04-17 22:05:37 +04:00
spin_unlock ( & kgdb_registration_lock ) ;
2014-06-12 23:30:11 +04:00
pr_info ( " Unregistered I/O driver %s, debugger disabled \n " ,
2010-04-02 20:48:03 +04:00
old_dbg_io_ops - > name ) ;
2008-04-17 22:05:37 +04:00
}
EXPORT_SYMBOL_GPL ( kgdb_unregister_io_module ) ;
2010-05-21 06:04:21 +04:00
int dbg_io_get_char ( void )
{
int ret = dbg_io_ops - > read_char ( ) ;
2010-05-21 06:04:22 +04:00
if ( ret = = NO_POLL_CHAR )
return - 1 ;
2010-05-21 06:04:21 +04:00
if ( ! dbg_kdb_mode )
return ret ;
if ( ret = = 127 )
return 8 ;
return ret ;
}
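The character translation in dbg_io_get_char() above can be separated from the I/O driver call for illustration. In this sketch translate_char() is a hypothetical helper and NO_CHAR is an assumed sentinel standing in for NO_POLL_CHAR, whose real value is not shown in this file.

```c
#include <stdbool.h>

#define NO_CHAR (-1)	/* assumed sentinel; stands in for NO_POLL_CHAR */

/*
 * Map a raw character from the I/O driver to what the debugger sees.
 * In kdb mode, DEL (127) is reported as backspace (8).
 */
static int translate_char(int raw, bool kdb_mode)
{
	if (raw == NO_CHAR)
		return -1;	/* nothing available */
	if (!kdb_mode)
		return raw;	/* gdb stub gets the byte unchanged */
	if (raw == 127)
		return 8;
	return raw;
}
```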
2008-04-17 22:05:37 +04:00
/**
* kgdb_breakpoint - generate breakpoint exception
*
* This function will generate a breakpoint exception . It is used at the
* beginning of a program to sync up with a debugger and can be used
* otherwise as a quick means to stop program execution and " break " into
* the debugger .
*/
2014-01-28 15:20:20 +04:00
noinline void kgdb_breakpoint ( void )
2008-04-17 22:05:37 +04:00
{
2010-04-02 23:58:18 +04:00
atomic_inc ( & kgdb_setting_breakpoint ) ;
2008-04-17 22:05:37 +04:00
wmb ( ) ; /* Sync point before breakpoint */
arch_kgdb_breakpoint ( ) ;
wmb ( ) ; /* Sync point after breakpoint */
2010-04-02 23:58:18 +04:00
atomic_dec ( & kgdb_setting_breakpoint ) ;
2008-04-17 22:05:37 +04:00
}
EXPORT_SYMBOL_GPL ( kgdb_breakpoint ) ;
static int __init opt_kgdb_wait ( char * str )
{
kgdb_break_asap = 1 ;
2010-05-21 06:04:21 +04:00
kdb_init ( KDB_INIT_EARLY ) ;
2008-04-17 22:05:37 +04:00
if ( kgdb_io_module_registered )
kgdb_initial_breakpoint ( ) ;
return 0 ;
}
early_param ( " kgdbwait " , opt_kgdb_wait ) ;