/*
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *  Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs
 *
 *  1997-11-28  Modified for POSIX.1b signals by Richard Henderson
 *  2000-06-20  Pentium III FXSR, SSE support by Gareth Hughes
 *  2000-2002   x86-64 support by Andi Kleen
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/wait.h>
#include <linux/tracehook.h>
#include <linux/unistd.h>
#include <linux/stddef.h>
#include <linux/personality.h>
#include <linux/uaccess.h>
#include <linux/user-return-notifier.h>
#include <linux/uprobes.h>
#include <linux/context_tracking.h>

#include <asm/processor.h>
#include <asm/ucontext.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/signal.h>
#include <asm/vdso.h>
#include <asm/mce.h>
#include <asm/sighandling.h>
#include <asm/vm86.h>

#ifdef CONFIG_X86_64
#include <asm/proto.h>
#include <asm/ia32_unistd.h>
#endif /* CONFIG_X86_64 */

#include <asm/syscall.h>
#include <asm/syscalls.h>

#include <asm/sigframe.h>

#define COPY(x)			do {			\
	get_user_ex(regs->x, &sc->x);			\
} while (0)

#define GET_SEG(seg)		({			\
	unsigned short tmp;				\
	get_user_ex(tmp, &sc->seg);			\
	tmp;						\
})

#define COPY_SEG(seg)		do {			\
	regs->seg = GET_SEG(seg);			\
} while (0)

#define COPY_SEG_CPL3(seg)	do {			\
	regs->seg = GET_SEG(seg) | 3;			\
} while (0)

int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
{
	unsigned long buf_val;
	void __user *buf;
	unsigned int tmpflags;
	unsigned int err = 0;

	/* Always make any pending restarted system calls return -EINTR */
	current->restart_block.fn = do_no_restart_syscall;

	get_user_try {

#ifdef CONFIG_X86_32
		set_user_gs(regs, GET_SEG(gs));
		COPY_SEG(fs);
		COPY_SEG(es);
		COPY_SEG(ds);
#endif /* CONFIG_X86_32 */

		COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
		COPY(dx); COPY(cx); COPY(ip); COPY(ax);

#ifdef CONFIG_X86_64
		COPY(r8);
		COPY(r9);
		COPY(r10);
		COPY(r11);
		COPY(r12);
		COPY(r13);
		COPY(r14);
		COPY(r15);
#endif /* CONFIG_X86_64 */

#ifdef CONFIG_X86_32
		COPY_SEG_CPL3(cs);
		COPY_SEG_CPL3(ss);
#else /* !CONFIG_X86_32 */
		/* Kernel saves and restores only the CS segment register on signals,
		 * which is the bare minimum needed to allow mixed 32/64-bit code.
		 * App's signal handler can save/restore other segments if needed. */
		COPY_SEG_CPL3(cs);
#endif /* CONFIG_X86_32 */

		get_user_ex(tmpflags, &sc->flags);
		regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
		regs->orig_ax = -1;		/* disable syscall checks */

		get_user_ex(buf_val, &sc->fpstate);
		buf = (void __user *)buf_val;
	} get_user_catch(err);

	err |= fpu__restore_sig(buf, config_enabled(CONFIG_X86_32));

	force_iret();

	return err;
}

int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
		     struct pt_regs *regs, unsigned long mask)
{
	int err = 0;

	put_user_try {

#ifdef CONFIG_X86_32
		put_user_ex(get_user_gs(regs), (unsigned int __user *)&sc->gs);
		put_user_ex(regs->fs, (unsigned int __user *)&sc->fs);
		put_user_ex(regs->es, (unsigned int __user *)&sc->es);
		put_user_ex(regs->ds, (unsigned int __user *)&sc->ds);
#endif /* CONFIG_X86_32 */

		put_user_ex(regs->di, &sc->di);
		put_user_ex(regs->si, &sc->si);
		put_user_ex(regs->bp, &sc->bp);
		put_user_ex(regs->sp, &sc->sp);
		put_user_ex(regs->bx, &sc->bx);
		put_user_ex(regs->dx, &sc->dx);
		put_user_ex(regs->cx, &sc->cx);
		put_user_ex(regs->ax, &sc->ax);
#ifdef CONFIG_X86_64
		put_user_ex(regs->r8, &sc->r8);
		put_user_ex(regs->r9, &sc->r9);
		put_user_ex(regs->r10, &sc->r10);
		put_user_ex(regs->r11, &sc->r11);
		put_user_ex(regs->r12, &sc->r12);
		put_user_ex(regs->r13, &sc->r13);
		put_user_ex(regs->r14, &sc->r14);
		put_user_ex(regs->r15, &sc->r15);
#endif /* CONFIG_X86_64 */

		put_user_ex(current->thread.trap_nr, &sc->trapno);
		put_user_ex(current->thread.error_code, &sc->err);
		put_user_ex(regs->ip, &sc->ip);
#ifdef CONFIG_X86_32
		put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
		put_user_ex(regs->flags, &sc->flags);
		put_user_ex(regs->sp, &sc->sp_at_signal);
		put_user_ex(regs->ss, (unsigned int __user *)&sc->ss);
#else /* !CONFIG_X86_32 */
		put_user_ex(regs->flags, &sc->flags);
		put_user_ex(regs->cs, &sc->cs);
		put_user_ex(0, &sc->gs);
		put_user_ex(0, &sc->fs);
#endif /* CONFIG_X86_32 */

		put_user_ex(fpstate, &sc->fpstate);

		/* non-iBCS2 extensions.. */
		put_user_ex(mask, &sc->oldmask);
		put_user_ex(current->thread.cr2, &sc->cr2);
	} put_user_catch(err);

	return err;
}

/*
 * Set up a signal frame.
 */

/*
 * Determine which stack to use..
 */
static unsigned long align_sigframe(unsigned long sp)
{
#ifdef CONFIG_X86_32
	/*
	 * Align the stack pointer according to the i386 ABI,
	 * i.e. so that on function entry ((sp + 4) & 15) == 0.
	 */
	sp = ((sp + 4) & -16ul) - 4;
#else /* !CONFIG_X86_32 */
	sp = round_down(sp, 16) - 8;
#endif
	return sp;
}

static void __user *
get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
	     void __user **fpstate)
{
	/* Default to using normal stack */
	unsigned long math_size = 0;
	unsigned long sp = regs->sp;
	unsigned long buf_fx = 0;
	int onsigstack = on_sig_stack(sp);
	struct fpu *fpu = &current->thread.fpu;

	/* redzone */
	if (config_enabled(CONFIG_X86_64))
		sp -= 128;

	if (!onsigstack) {
		/* This is the X/Open sanctioned signal stack switching.  */
		if (ka->sa.sa_flags & SA_ONSTACK) {
			if (current->sas_ss_size)
				sp = current->sas_ss_sp + current->sas_ss_size;
		} else if (config_enabled(CONFIG_X86_32) &&
			   (regs->ss & 0xffff) != __USER_DS &&
			   !(ka->sa.sa_flags & SA_RESTORER) &&
			   ka->sa.sa_restorer) {
			/* This is the legacy signal stack switching. */
			sp = (unsigned long) ka->sa.sa_restorer;
		}
	}

	if (fpu->fpstate_active) {
		sp = fpu__alloc_mathframe(sp, config_enabled(CONFIG_X86_32),
					  &buf_fx, &math_size);
		*fpstate = (void __user *)sp;
	}

	sp = align_sigframe(sp - frame_size);

	/*
	 * If we are on the alternate signal stack and would overflow it, don't.
	 * Return an always-bogus address instead so we will die with SIGSEGV.
	 */
	if (onsigstack && !likely(on_sig_stack(sp)))
		return (void __user *)-1L;

	/* save i387 and extended state */
	if (fpu->fpstate_active &&
	    copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size) < 0)
		return (void __user *)-1L;

	return (void __user *)sp;
}

#ifdef CONFIG_X86_32
static const struct {
	u16 poplmovl;
	u32 val;
	u16 int80;
} __attribute__((packed)) retcode = {
	0xb858,		/* popl %eax; movl $..., %eax */
	__NR_sigreturn,
	0x80cd,		/* int $0x80 */
};

static const struct {
	u8  movl;
	u32 val;
	u16 int80;
	u8  pad;
} __attribute__((packed)) rt_retcode = {
	0xb8,		/* movl $..., %eax */
	__NR_rt_sigreturn,
	0x80cd,		/* int $0x80 */
	0
};

static int
__setup_frame(int sig, struct ksignal *ksig, sigset_t *set,
	      struct pt_regs *regs)
{
	struct sigframe __user *frame;
	void __user *restorer;
	int err = 0;
	void __user *fpstate = NULL;

	frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fpstate);

	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
		return -EFAULT;

	if (__put_user(sig, &frame->sig))
		return -EFAULT;

	if (setup_sigcontext(&frame->sc, fpstate, regs, set->sig[0]))
		return -EFAULT;

	if (_NSIG_WORDS > 1) {
		if (__copy_to_user(&frame->extramask, &set->sig[1],
				   sizeof(frame->extramask)))
			return -EFAULT;
	}

	if (current->mm->context.vdso)
		restorer = current->mm->context.vdso +
			vdso_image_32.sym___kernel_sigreturn;
	else
		restorer = &frame->retcode;
	if (ksig->ka.sa.sa_flags & SA_RESTORER)
		restorer = ksig->ka.sa.sa_restorer;

	/* Set up to return from userspace.  */
	err |= __put_user(restorer, &frame->pretcode);

	/*
	 * This is popl %eax ; movl $__NR_sigreturn, %eax ; int $0x80
	 *
	 * WE DO NOT USE IT ANY MORE! It's only left here for historical
	 * reasons and because gdb uses it as a signature to notice
	 * signal handler stack frames.
	 */
	err |= __put_user(*((u64 *)&retcode), (u64 *)frame->retcode);

	if (err)
		return -EFAULT;

	/* Set up registers for signal handler */
	regs->sp = (unsigned long)frame;
	regs->ip = (unsigned long)ksig->ka.sa.sa_handler;
	regs->ax = (unsigned long)sig;
	regs->dx = 0;
	regs->cx = 0;

	regs->ds = __USER_DS;
	regs->es = __USER_DS;
	regs->ss = __USER_DS;
	regs->cs = __USER_CS;

	return 0;
}

static int __setup_rt_frame(int sig, struct ksignal *ksig,
			    sigset_t *set, struct pt_regs *regs)
{
	struct rt_sigframe __user *frame;
	void __user *restorer;
	int err = 0;
	void __user *fpstate = NULL;

	frame = get_sigframe(&ksig->ka, regs, sizeof(*frame), &fpstate);

	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
		return -EFAULT;

	put_user_try {
		put_user_ex(sig, &frame->sig);
		put_user_ex(&frame->info, &frame->pinfo);
		put_user_ex(&frame->uc, &frame->puc);

		/* Create the ucontext.  */
		if (cpu_has_xsave)
			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
		else
			put_user_ex(0, &frame->uc.uc_flags);
		put_user_ex(0, &frame->uc.uc_link);
		save_altstack_ex(&frame->uc.uc_stack, regs->sp);

		/* Set up to return from userspace.  */
		restorer = current->mm->context.vdso +
			vdso_image_32.sym___kernel_rt_sigreturn;
		if (ksig->ka.sa.sa_flags & SA_RESTORER)
			restorer = ksig->ka.sa.sa_restorer;
		put_user_ex(restorer, &frame->pretcode);

/*
* This is movl $ __NR_rt_sigreturn , % ax ; int $ 0x80
*
* WE DO NOT USE IT ANY MORE ! It ' s only left here for historical
* reasons and because gdb uses it as a signature to notice
* signal handler stack frames .
*/
put_user_ex ( * ( ( u64 * ) & rt_retcode ) , ( u64 * ) frame - > retcode ) ;
} put_user_catch ( err ) ;
2012-09-21 17:18:44 -07:00
2012-11-09 23:51:47 -05:00
err | = copy_siginfo_to_user ( & frame - > info , & ksig - > info ) ;
2012-09-21 12:43:15 -07:00
err | = setup_sigcontext ( & frame - > uc . uc_mcontext , fpstate ,
regs , set - > sig [ 0 ] ) ;
err | = __copy_to_user ( & frame - > uc . uc_sigmask , set , sizeof ( * set ) ) ;
2005-04-16 15:20:36 -07:00
if ( err )
2008-09-12 17:01:09 -07:00
return - EFAULT ;
2005-04-16 15:20:36 -07:00
/* Set up registers for signal handler */
2008-03-06 10:33:08 +01:00
regs - > sp = ( unsigned long ) frame ;
2012-11-09 23:51:47 -05:00
regs - > ip = ( unsigned long ) ksig - > ka . sa . sa_handler ;
2008-09-05 16:28:38 -07:00
regs - > ax = ( unsigned long ) sig ;
2008-03-06 10:33:08 +01:00
regs - > dx = ( unsigned long ) & frame - > info ;
regs - > cx = ( unsigned long ) & frame - > uc ;
2005-04-16 15:20:36 -07:00
2008-01-30 13:30:56 +01:00
regs - > ds = __USER_DS ;
regs - > es = __USER_DS ;
regs - > ss = __USER_DS ;
regs - > cs = __USER_CS ;
2005-04-16 15:20:36 -07:00
2006-01-18 17:44:00 -08:00
return 0 ;
2005-04-16 15:20:36 -07:00
}
2008-11-24 18:23:12 -08:00
# else /* !CONFIG_X86_32 */
2012-11-09 23:51:47 -05:00
static int __setup_rt_frame ( int sig , struct ksignal * ksig ,
2008-11-24 18:23:12 -08:00
sigset_t * set , struct pt_regs * regs )
{
struct rt_sigframe __user * frame ;
void __user * fp = NULL ;
int err = 0 ;
2012-11-09 23:51:47 -05:00
frame = get_sigframe ( & ksig - > ka , regs , sizeof ( struct rt_sigframe ) , & fp ) ;
2008-11-24 18:23:12 -08:00
if ( ! access_ok ( VERIFY_WRITE , frame , sizeof ( * frame ) ) )
return - EFAULT ;
2012-11-09 23:51:47 -05:00
if ( ksig - > ka . sa . sa_flags & SA_SIGINFO ) {
if ( copy_siginfo_to_user ( & frame - > info , & ksig - > info ) )
2008-11-24 18:23:12 -08:00
return - EFAULT ;
}
2009-01-23 15:50:10 -08:00
put_user_try {
/* Create the ucontext. */
if ( cpu_has_xsave )
put_user_ex ( UC_FP_XSTATE , & frame - > uc . uc_flags ) ;
else
put_user_ex ( 0 , & frame - > uc . uc_flags ) ;
put_user_ex ( 0 , & frame - > uc . uc_link ) ;
2013-09-01 20:35:01 +01:00
save_altstack_ex ( & frame - > uc . uc_stack , regs - > sp ) ;
2009-01-23 15:50:10 -08:00
/* Set up to return from userspace. If provided, use a stub
already in userspace . */
/* x86-64 should always use SA_RESTORER. */
2012-11-09 23:51:47 -05:00
if ( ksig - > ka . sa . sa_flags & SA_RESTORER ) {
put_user_ex ( ksig - > ka . sa . sa_restorer , & frame - > pretcode ) ;
2009-01-23 15:50:10 -08:00
} else {
/* could use a vstub here */
err | = - EFAULT ;
}
} put_user_catch ( err ) ;
2008-11-24 18:23:12 -08:00
2012-09-21 12:43:15 -07:00
err | = setup_sigcontext ( & frame - > uc . uc_mcontext , fp , regs , set - > sig [ 0 ] ) ;
err | = __copy_to_user ( & frame - > uc . uc_sigmask , set , sizeof ( * set ) ) ;
2008-11-24 18:23:12 -08:00
if ( err )
return - EFAULT ;
/* Set up registers for signal handler */
regs - > di = sig ;
/* In case the signal handler was declared without prototypes */
regs - > ax = 0 ;
/* This also works for non SA_SIGINFO handlers because they expect the
next argument after the signal number on the stack . */
regs - > si = ( unsigned long ) & frame - > info ;
regs - > dx = ( unsigned long ) & frame - > uc ;
2012-11-09 23:51:47 -05:00
regs - > ip = ( unsigned long ) ksig - > ka . sa . sa_handler ;
2008-11-24 18:23:12 -08:00
regs - > sp = ( unsigned long ) frame ;
2015-08-13 08:25:20 -07:00
/* Set up the CS register to run signal handlers in 64-bit mode,
even if the handler happens to be interrupting 32 - bit code . */
2008-11-24 18:23:12 -08:00
regs - > cs = __USER_CS ;
return 0 ;
}
# endif /* CONFIG_X86_32 */
2012-11-09 23:51:47 -05:00
static int x32_setup_rt_frame ( struct ksignal * ksig ,
compat_sigset_t * set ,
2012-07-24 16:05:27 -07:00
struct pt_regs * regs )
{
# ifdef CONFIG_X86_X32_ABI
struct rt_sigframe_x32 __user * frame ;
void __user * restorer ;
int err = 0 ;
void __user * fpstate = NULL ;
2012-11-09 23:51:47 -05:00
frame = get_sigframe ( & ksig - > ka , regs , sizeof ( * frame ) , & fpstate ) ;
2012-07-24 16:05:27 -07:00
if ( ! access_ok ( VERIFY_WRITE , frame , sizeof ( * frame ) ) )
return - EFAULT ;
2012-11-09 23:51:47 -05:00
if ( ksig - > ka . sa . sa_flags & SA_SIGINFO ) {
if ( copy_siginfo_to_user32 ( & frame - > info , & ksig - > info ) )
2012-07-24 16:05:27 -07:00
return - EFAULT ;
}
put_user_try {
/* Create the ucontext. */
if ( cpu_has_xsave )
put_user_ex ( UC_FP_XSTATE , & frame - > uc . uc_flags ) ;
else
put_user_ex ( 0 , & frame - > uc . uc_flags ) ;
put_user_ex ( 0 , & frame - > uc . uc_link ) ;
2013-09-01 20:35:01 +01:00
compat_save_altstack_ex ( & frame - > uc . uc_stack , regs - > sp ) ;
2012-07-24 16:05:27 -07:00
put_user_ex ( 0 , & frame - > uc . uc__pad0 ) ;
2012-11-09 23:51:47 -05:00
if ( ksig - > ka . sa . sa_flags & SA_RESTORER ) {
restorer = ksig - > ka . sa . sa_restorer ;
2012-07-24 16:05:27 -07:00
} else {
/* could use a vstub here */
restorer = NULL ;
err | = - EFAULT ;
}
put_user_ex ( restorer , & frame - > pretcode ) ;
} put_user_catch ( err ) ;
2012-09-21 17:18:44 -07:00
err | = setup_sigcontext ( & frame - > uc . uc_mcontext , fpstate ,
regs , set - > sig [ 0 ] ) ;
err | = __copy_to_user ( & frame - > uc . uc_sigmask , set , sizeof ( * set ) ) ;
2012-07-24 16:05:27 -07:00
if ( err )
return - EFAULT ;
/* Set up registers for signal handler */
regs - > sp = ( unsigned long ) frame ;
2012-11-09 23:51:47 -05:00
regs - > ip = ( unsigned long ) ksig - > ka . sa . sa_handler ;
2012-07-24 16:05:27 -07:00
/* We use the x32 calling convention here... */
2012-11-09 23:51:47 -05:00
regs - > di = ksig - > sig ;
2012-07-24 16:05:27 -07:00
regs - > si = ( unsigned long ) & frame - > info ;
regs - > dx = ( unsigned long ) & frame - > uc ;
loadsegment ( ds , __USER_DS ) ;
loadsegment ( es , __USER_DS ) ;
regs - > cs = __USER_CS ;
regs - > ss = __USER_DS ;
# endif /* CONFIG_X86_X32_ABI */
return 0 ;
}
2008-11-24 18:23:12 -08:00
/*
* Do a signal return ; undo the signal stack .
*/
2008-11-24 18:24:11 -08:00
# ifdef CONFIG_X86_32
2013-08-05 15:02:40 -07:00
asmlinkage unsigned long sys_sigreturn ( void )
2008-11-24 18:23:12 -08:00
{
2012-11-12 14:32:42 -05:00
struct pt_regs * regs = current_pt_regs ( ) ;
2008-11-24 18:23:12 -08:00
struct sigframe __user * frame ;
sigset_t set ;
frame = ( struct sigframe __user * ) ( regs - > sp - 8 ) ;
if ( ! access_ok ( VERIFY_READ , frame , sizeof ( * frame ) ) )
goto badframe ;
if ( __get_user ( set . sig [ 0 ] , & frame - > sc . oldmask ) | | ( _NSIG_WORDS > 1
& & __copy_from_user ( & set . sig [ 1 ] , & frame - > extramask ,
sizeof ( frame - > extramask ) ) ) )
goto badframe ;
2011-07-10 21:27:27 +02:00
set_current_blocked ( & set ) ;
2008-11-24 18:23:12 -08:00
2015-04-04 08:58:23 -04:00
if ( restore_sigcontext ( regs , & frame - > sc ) )
2008-11-24 18:23:12 -08:00
goto badframe ;
2015-04-04 08:58:23 -04:00
return regs - > ax ;
2008-11-24 18:23:12 -08:00
badframe :
2008-12-16 14:02:16 -08:00
signal_fault ( regs , frame , " sigreturn " ) ;
2008-11-24 18:23:12 -08:00
return 0 ;
}
2008-11-24 18:24:11 -08:00
# endif /* CONFIG_X86_32 */
2008-11-24 18:23:12 -08:00
2013-08-05 15:02:40 -07:00
asmlinkage long sys_rt_sigreturn ( void )
2008-11-24 18:23:12 -08:00
{
2012-11-12 14:32:42 -05:00
struct pt_regs * regs = current_pt_regs ( ) ;
2008-11-24 18:23:12 -08:00
struct rt_sigframe __user * frame ;
sigset_t set ;
frame = ( struct rt_sigframe __user * ) ( regs - > sp - sizeof ( long ) ) ;
if ( ! access_ok ( VERIFY_READ , frame , sizeof ( * frame ) ) )
goto badframe ;
if ( __copy_from_user ( & set , & frame - > uc . uc_sigmask , sizeof ( set ) ) )
goto badframe ;
2011-04-27 21:09:39 +02:00
set_current_blocked ( & set ) ;
2008-11-24 18:23:12 -08:00
2015-04-04 08:58:23 -04:00
if ( restore_sigcontext ( regs , & frame - > uc . uc_mcontext ) )
2008-11-24 18:23:12 -08:00
goto badframe ;
2012-11-20 14:24:26 -05:00
if ( restore_altstack ( & frame - > uc . uc_stack ) )
2008-11-24 18:23:12 -08:00
goto badframe ;
2015-04-04 08:58:23 -04:00
return regs - > ax ;
2008-11-24 18:23:12 -08:00
badframe :
signal_fault ( regs , frame , " rt_sigreturn " ) ;
return 0 ;
}
2015-04-30 07:26:04 +02:00
static inline int is_ia32_compat_frame ( void )
{
return config_enabled ( CONFIG_IA32_EMULATION ) & &
test_thread_flag ( TIF_IA32 ) ;
}
static inline int is_ia32_frame ( void )
{
return config_enabled ( CONFIG_X86_32 ) | | is_ia32_compat_frame ( ) ;
}
static inline int is_x32_frame ( void )
{
return config_enabled ( CONFIG_X86_X32_ABI ) & & test_thread_flag ( TIF_X32 ) ;
}
2008-09-05 16:28:06 -07:00
static int
2012-11-09 23:51:47 -05:00
setup_rt_frame ( struct ksignal * ksig , struct pt_regs * regs )
2008-09-05 16:28:06 -07:00
{
2014-07-13 17:43:51 +02:00
int usig = ksig - > sig ;
2012-05-02 09:59:21 -04:00
sigset_t * set = sigmask_to_save ( ) ;
2012-07-24 16:05:27 -07:00
compat_sigset_t * cset = ( compat_sigset_t * ) set ;
2008-09-05 16:28:06 -07:00
/* Set up the stack frame */
2012-07-24 16:05:27 -07:00
if ( is_ia32_frame ( ) ) {
2012-11-09 23:51:47 -05:00
if ( ksig - > ka . sa . sa_flags & SA_SIGINFO )
return ia32_setup_rt_frame ( usig , ksig , cset , regs ) ;
2008-09-24 19:13:11 -07:00
else
2012-11-09 23:51:47 -05:00
return ia32_setup_frame ( usig , ksig , cset , regs ) ;
2012-07-24 16:05:27 -07:00
} else if ( is_x32_frame ( ) ) {
2012-11-09 23:51:47 -05:00
return x32_setup_rt_frame ( ksig , cset , regs ) ;
2012-02-19 09:41:09 -08:00
} else {
2012-11-09 23:51:47 -05:00
return __setup_rt_frame ( ksig - > sig , ksig , set , regs ) ;
2012-02-19 09:41:09 -08:00
}
2008-09-05 16:28:06 -07:00
}
2012-05-21 23:42:15 -04:00
static void
2012-11-09 23:51:47 -05:00
handle_signal ( struct ksignal * ksig , struct pt_regs * regs )
2005-04-16 15:20:36 -07:00
{
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT doesn't clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by the
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping. We do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
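The flag interaction described in this message can be followed with a small user-space model. This is a simplified sketch, not the kernel code: struct task_model and every model_* function are hypothetical names, and the model ignores block-stepping and everything else the real enable_single_step() handles.

```c
#include <stdbool.h>

/* Hypothetical model of the three bits discussed above. */
struct task_model {
	bool tif_singlestep;	/* models TIF_SINGLESTEP */
	bool tif_forced_tf;	/* models TIF_FORCED_TF */
	bool eflags_tf;		/* models X86_EFLAGS_TF */
};

/* Models PTRACE_SINGLESTEP: a leaked TIF_SINGLESTEP makes TF look user-set. */
static void model_enable_single_step(struct task_model *t)
{
	bool tf_was_set;

	if (t->tif_singlestep)
		t->eflags_tf = true;
	t->tif_singlestep = true;
	tf_was_set = t->eflags_tf;
	t->eflags_tf = true;
	if (!tf_was_set)
		t->tif_forced_tf = true;
}

/* Models user_disable_single_step(), as done for PTRACE_CONT. */
static void model_disable_single_step(struct task_model *t)
{
	t->tif_singlestep = false;
	if (t->tif_forced_tf) {
		t->tif_forced_tf = false;
		t->eflags_tf = false;
	}
}

/* Models signal delivery; fixed selects the behaviour after this change. */
static void model_handle_signal(struct task_model *t, bool fixed)
{
	if (fixed) {
		if (t->tif_singlestep)
			model_disable_single_step(t);
	} else if (t->eflags_tf && t->tif_forced_tf) {
		/* Old behaviour: TIF_SINGLESTEP is left set. */
		t->tif_forced_tf = false;
		t->eflags_tf = false;
	}
	t->eflags_tf = false;	/* TF is cleared after frame setup */
}

/* Step, deliver signal, step again, continue; report whether TF is stuck. */
static bool model_tf_leaks(bool fixed)
{
	struct task_model t = { false, false, false };

	model_enable_single_step(&t);
	model_handle_signal(&t, fixed);
	model_enable_single_step(&t);
	model_disable_single_step(&t);
	return t.eflags_tf;
}
```

In this model the old behaviour leaves TF set after the final continue, because the second single-step request sees the leaked TIF_SINGLESTEP and never sets TIF_FORCED_TF. With the change applied, the same sequence ends with TF clear.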
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.
This fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 00:40:25 -07:00
bool stepping , failed ;
2015-04-23 12:49:20 +02:00
struct fpu * fpu = & current - > thread . fpu ;
2015-04-16 00:40:25 -07:00
2015-07-29 01:41:19 -04:00
if ( v8086_mode ( regs ) )
save_v86_state ( ( struct kernel_vm86_regs * ) regs , VM86_SIGNAL ) ;
2005-04-16 15:20:36 -07:00
/* Are we from a system call? */
2008-09-05 16:26:55 -07:00
if ( syscall_get_nr ( current , regs ) > = 0 ) {
2005-04-16 15:20:36 -07:00
/* If so, check system call restarting.. */
2008-09-05 16:26:55 -07:00
switch ( syscall_get_error ( current , regs ) ) {
2008-02-08 12:09:58 -08:00
case - ERESTART_RESTARTBLOCK :
case - ERESTARTNOHAND :
regs - > ax = - EINTR ;
break ;
case - ERESTARTSYS :
2012-11-09 23:51:47 -05:00
if ( ! ( ksig - > ka . sa . sa_flags & SA_RESTART ) ) {
2008-01-30 13:30:56 +01:00
regs - > ax = - EINTR ;
2005-04-16 15:20:36 -07:00
break ;
2008-02-08 12:09:58 -08:00
}
/* fallthrough */
case - ERESTARTNOINTR :
regs - > ax = regs - > orig_ax ;
regs - > ip - = 2 ;
break ;
2005-04-16 15:20:36 -07:00
}
}
/*
2015-04-16 00:40:25 -07:00
* If TF is set due to a debugger ( TIF_FORCED_TF ) , clear TF now
* so that register information in the sigcontext is correct and
* then notify the tracer before entering the signal handler .
2005-04-16 15:20:36 -07:00
*/
2015-04-16 00:40:25 -07:00
stepping = test_thread_flag ( TIF_SINGLESTEP ) ;
if ( stepping )
user_disable_single_step ( current ) ;
2005-04-16 15:20:36 -07:00
2012-11-09 23:51:47 -05:00
failed = ( setup_rt_frame ( ksig , regs ) < 0 ) ;
if ( ! failed ) {
/*
* Clear the direction flag as per the ABI for function entry .
2013-05-01 17:25:43 +02:00
*
2013-05-01 17:25:42 +02:00
* Clear RF when entering the signal handler , because
* it might disable possible debug exception from the
* signal handler .
2013-05-01 17:25:43 +02:00
*
2015-04-16 00:40:25 -07:00
* Clear TF for the case when it wasn ' t set by debugger to
* avoid the recursive send_sigtrap ( ) in SIGTRAP handler .
2012-11-09 23:51:47 -05:00
*/
2013-05-01 17:25:43 +02:00
regs - > flags & = ~ ( X86_EFLAGS_DF | X86_EFLAGS_RF | X86_EFLAGS_TF ) ;
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
save_xstate_sig()->drop_init_fpu() doesn't look right. setup_rt_frame()
can fail after that; in this case the next setup_rt_frame() triggered
by SIGSEGV won't save the fpu state simply because the old state was
lost. This obviously means that the fpu state won't be restored after
sys_rt_sigreturn() from the SIGSEGV handler.
Shift drop_init_fpu() into !failed branch in handle_signal().
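The ordering problem can be shown with a toy model. This is a sketch under stated assumptions: struct fpu_model and the model_* functions are hypothetical and only capture "state is saved if still live" and "state is dropped early or only on success".

```c
#include <stdbool.h>

/* Hypothetical model: is the task's fpu state still live? */
struct fpu_model {
	bool live;
};

/*
 * Models one frame setup attempt. *saved reports whether live fpu
 * state was written to the frame. With drop_early the state is
 * dropped before a later step can fail (old behaviour); otherwise it
 * is dropped only after the whole setup succeeded (the !failed branch).
 */
static bool model_setup_frame(struct fpu_model *f, bool drop_early,
			      bool frame_write_fails, bool *saved)
{
	*saved = f->live;
	if (drop_early)
		f->live = false;
	if (frame_write_fails)
		return false;
	if (!drop_early)
		f->live = false;
	return true;
}

/* First attempt faults, then the SIGSEGV frame is set up successfully. */
static bool model_state_survives_retry(bool drop_early)
{
	struct fpu_model f = { true };
	bool saved;

	model_setup_frame(&f, drop_early, true, &saved);
	model_setup_frame(&f, drop_early, false, &saved);
	return saved;
}
```

In the model, dropping early means the second frame has nothing left to save, while dropping only on success keeps the state available for the SIGSEGV frame.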
Test-case (needs -O2):
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <pthread.h>
#include <assert.h>
volatile double D;
void test(double d)
{
int pid = getpid();
for (D = d; D == d; ) {
/* sys_tkill(pid, SIGHUP); asm to avoid save/reload
* fp regs around "C" call */
asm ("" : : "a"(200), "D"(pid), "S"(1));
asm ("syscall" : : : "ax");
}
printf("ERR!!\n");
}
void sigh(int sig)
{
}
char altstack[4096 * 10] __attribute__((aligned(4096)));
void *tfunc(void *arg)
{
for (;;) {
mprotect(altstack, sizeof(altstack), PROT_READ);
mprotect(altstack, sizeof(altstack), PROT_READ|PROT_WRITE);
}
}
int main(void)
{
stack_t st = {
.ss_sp = altstack,
.ss_size = sizeof(altstack),
.ss_flags = SS_ONSTACK,
};
struct sigaction sa = {
.sa_handler = sigh,
};
pthread_t pt;
sigaction(SIGSEGV, &sa, NULL);
sigaltstack(&st, NULL);
sa.sa_flags = SA_ONSTACK;
sigaction(SIGHUP, &sa, NULL);
pthread_create(&pt, NULL, tfunc, NULL);
test(123.456);
return 0;
}
Reported-by: Bean Anderson <bean@azulsystems.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175713.GA21646@redhat.com
Cc: <stable@kernel.org> # v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-02 19:57:13 +02:00
/*
* Ensure the signal handler starts with the new fpu state.
*/
2015-04-23 12:49:20 +02:00
if (fpu->fpstate_active)
2015-04-30 07:12:46 +02:00
fpu__clear(fpu);
2012-05-21 23:42:15 -04:00
}
x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the
subsequent PTRACE_CONT does not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
#include <unistd.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <assert.h>
#include <stddef.h>
void handler(int n)
{
asm("nop");
}
int child(void)
{
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
signal(SIGALRM, handler);
kill(getpid(), SIGALRM);
return 0x23;
}
void *getip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
int main(void)
{
int pid, status;
pid = fork();
if (!pid)
return child();
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 0);
assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert((getip(pid) - (void*)handler) == 1);
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
return 0;
}
The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if
stepping. We do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.
While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().
Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step(). Also, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext(), which can set/clear TF; this
needs another fix.
This fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:
Before:
~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[WARN] Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
Trace/breakpoint trap (core dumped)
After:
~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
[RUN] Set TF and check nop
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check int80
[OK] Survived with TF set and 9 traps
[RUN] Set TF and check a fast syscall
[OK] Survived with TF set and 39 traps
[RUN] Fast syscall with TF cleared
[OK] Nothing unexpected happened
Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 00:40:25 -07:00
signal_setup_done(failed, ksig, stepping);
2005-04-16 15:20:36 -07:00
}
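The TIF_FORCED_TF commit message above describes a small state machine, and it can help to replay it outside the kernel. The following is a minimal user-space sketch, not kernel code: `struct model_task` and every `model_*` function are hypothetical names, and the behaviour of `enable_single_step()` for an already-stepping task is simplified from the description in the commit message. It replays PTRACE_SINGLESTEP, signal delivery, PTRACE_SINGLESTEP, PTRACE_CONT and reports whether TF is still set at the end.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-task state; not the kernel's layout. */
struct model_task {
	bool eflags_tf;       /* X86_EFLAGS_TF in the saved user flags */
	bool tif_singlestep;  /* TIF_SINGLESTEP */
	bool tif_forced_tf;   /* TIF_FORCED_TF: the debugger, not the task, set TF */
};

/* Simplified model of enable_single_step() (PTRACE_SINGLESTEP). */
static void model_enable_single_step(struct model_task *t)
{
	/* Assumption: a task already marked as stepping is treated as having TF set. */
	if (t->tif_singlestep)
		t->eflags_tf = true;
	t->tif_singlestep = true;

	/* TF was already set: TIF_FORCED_TF is left exactly as it was. */
	if (t->eflags_tf)
		return;
	t->eflags_tf = true;
	t->tif_forced_tf = true;
}

/* Model of user_disable_single_step() (as used by PTRACE_CONT). */
static void model_disable_single_step(struct model_task *t)
{
	t->tif_singlestep = false;
	if (t->tif_forced_tf) {
		t->tif_forced_tf = false;
		t->eflags_tf = false;
	}
}

/* Signal delivery before the fix: TIF_SINGLESTEP is left behind. */
static void model_handle_signal_old(struct model_task *t)
{
	t->tif_forced_tf = false;
	t->eflags_tf = false;
}

/* Signal delivery after the fix: drop all stepping state, then clear TF. */
static void model_handle_signal_new(struct model_task *t)
{
	model_disable_single_step(t);
	t->eflags_tf = false;
}

/* Replays step -> signal -> step -> cont; returns true if TF leaked. */
static bool model_tf_leaks(bool fixed)
{
	struct model_task t = { false, false, false };

	model_enable_single_step(&t);
	if (fixed)
		model_handle_signal_new(&t);
	else
		model_handle_signal_old(&t);
	model_enable_single_step(&t);
	model_disable_single_step(&t);
	return t.eflags_tf;
}
```

In this model the old delivery path leaves TF set after the final PTRACE_CONT, which corresponds to the "wrong SIGTRAP" in the test case, while the new path ends with all three bits clear.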
2015-12-01 00:54:36 +03:00
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
# if defined(CONFIG_X86_32) || !defined(CONFIG_X86_64)
return __NR_restart_syscall;
# else /* !CONFIG_X86_32 && CONFIG_X86_64 */
return test_thread_flag(TIF_IA32) ? __NR_ia32_restart_syscall :
__NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT);
# endif /* CONFIG_X86_32 || !CONFIG_X86_64 */
# endif /* CONFIG_X86_32 || !CONFIG_X86_64 */
}
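To see what the 64-bit branch of get_nr_restart_syscall() computes, here is a small user-space sketch. The function name `model_nr_restart_syscall` and the `is_ia32` parameter (standing in for `test_thread_flag(TIF_IA32)`) are hypothetical; the three constants are the values the x86 headers use for `__NR_restart_syscall` on x86-64, `__NR_ia32_restart_syscall`, and `__X32_SYSCALL_BIT`, hard-coded here so the example builds without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the x86 headers; plain constants for this sketch. */
#define NR_RESTART_SYSCALL_64   219u         /* __NR_restart_syscall (x86-64) */
#define NR_RESTART_SYSCALL_IA32 0u           /* __NR_ia32_restart_syscall */
#define X32_SYSCALL_BIT         0x40000000u  /* __X32_SYSCALL_BIT */

/*
 * Hypothetical stand-in for the 64-bit kernel branch of
 * get_nr_restart_syscall(). `is_ia32` models test_thread_flag(TIF_IA32).
 */
static uint64_t model_nr_restart_syscall(bool is_ia32, uint64_t orig_ax)
{
	if (is_ia32)
		return NR_RESTART_SYSCALL_IA32;
	/* Carry the x32 marker over so the restart stays in the caller's ABI. */
	return NR_RESTART_SYSCALL_64 | (orig_ax & X32_SYSCALL_BIT);
}
```

A native 64-bit caller gets 219 back, an x32 caller gets 219 with bit 30 set, and a compat task gets the ia32 number regardless of `orig_ax`.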
2008-10-29 18:46:40 -07:00
2005-04-16 15:20:36 -07:00
/*
* Note that 'init' is a special process: it doesn't get signals it doesn't
* want to handle. Thus you cannot kill init even with a SIGKILL even by
* mistake.
*/
2015-07-03 12:44:23 -07:00
void do_signal(struct pt_regs *regs)
2005-04-16 15:20:36 -07:00
{
2012-11-09 23:51:47 -05:00
struct ksignal ksig;
2005-04-16 15:20:36 -07:00
2012-11-09 23:51:47 -05:00
if (get_signal(&ksig)) {
2008-03-06 10:33:08 +01:00
/* Whee! Actually deliver the signal. */
2012-11-09 23:51:47 -05:00
handle_signal(&ksig, regs);
2006-01-18 17:44:00 -08:00
return ;
2005-04-16 15:20:36 -07:00
}
/* Did we come from a system call? */
2008-09-05 16:26:55 -07:00
if (syscall_get_nr(current, regs) >= 0) {
2005-04-16 15:20:36 -07:00
/* Restart the system call - no handlers present */
2008-09-05 16:26:55 -07:00
switch (syscall_get_error(current, regs)) {
2006-01-18 17:44:00 -08:00
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:
2008-01-30 13:30:56 +01:00
regs->ax = regs->orig_ax;
regs->ip -= 2;
2006-01-18 17:44:00 -08:00
break;
case -ERESTART_RESTARTBLOCK:
2015-12-01 00:54:36 +03:00
regs->ax = get_nr_restart_syscall(regs);
2008-01-30 13:30:56 +01:00
regs->ip -= 2;
2006-01-18 17:44:00 -08:00
break;
2005-04-16 15:20:36 -07:00
}
}
2006-01-18 17:44:00 -08:00
2008-02-08 12:09:58 -08:00
/*
* If there's no signal to deliver, we just put the saved sigmask
* back.
*/
2012-05-21 23:33:55 -04:00
restore_saved_sigmask();
2005-04-16 15:20:36 -07:00
}
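The restart branch of do_signal() above can be exercised in isolation. The sketch below is a user-space model under stated assumptions: `struct model_regs` and `model_restart` are hypothetical names, the restart codes are the kernel-internal values from include/linux/errno.h, and the switch reads `ax` directly where the kernel goes through syscall_get_error(). The `ip -= 2` step rewinds over the syscall instruction; `syscall`, `sysenter` and `int $0x80` are each two bytes long, so the same adjustment appears to cover all three entry paths.

```c
#include <stdint.h>

/* Kernel-internal restart codes (include/linux/errno.h). */
#define ERESTARTSYS           512
#define ERESTARTNOINTR        513
#define ERESTARTNOHAND        514
#define ERESTART_RESTARTBLOCK 516

/* Hypothetical, trimmed-down register frame used only for this sketch. */
struct model_regs {
	int64_t  ax;       /* syscall return value */
	int64_t  orig_ax;  /* syscall number at entry, negative if not a syscall */
	uint64_t ip;       /* user ip, pointing just past the syscall instruction */
};

/* Models the "no handler ran" restart path; nr_restart is the restart_syscall number. */
static void model_restart(struct model_regs *regs, int64_t nr_restart)
{
	if (regs->orig_ax < 0)
		return;                       /* did not come from a system call */

	switch (regs->ax) {
	case -ERESTARTNOHAND:
	case -ERESTARTSYS:
	case -ERESTARTNOINTR:
		regs->ax = regs->orig_ax;     /* reissue the same syscall */
		regs->ip -= 2;                /* back up over the 2-byte instruction */
		break;
	case -ERESTART_RESTARTBLOCK:
		regs->ax = nr_restart;        /* reissue via restart_syscall instead */
		regs->ip -= 2;
		break;
	}
}
```

Any other return value, including ordinary errors such as -EINTR, leaves the frame untouched, so user space sees the result as is.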
2008-09-05 16:27:11 -07:00
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
{
struct task_struct *me = current;
if (show_unhandled_signals && printk_ratelimit()) {
2008-12-16 14:02:16 -08:00
printk("%s"
2008-09-05 16:27:11 -07:00
"%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx",
2008-12-16 14:02:16 -08:00
task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
2008-09-05 16:27:11 -07:00
me->comm, me->pid, where, frame,
regs->ip, regs->sp, regs->orig_ax);
print_vma_addr(" in ", regs->ip);
2012-05-21 19:50:07 -07:00
pr_cont("\n");
2008-09-05 16:27:11 -07:00
}
force_sig(SIGSEGV, me);
}
2012-02-19 09:41:09 -08:00
# ifdef CONFIG_X86_X32_ABI
2012-11-12 14:32:42 -05:00
asmlinkage long sys32_x32_rt_sigreturn(void)
2012-02-19 09:41:09 -08:00
{
2012-11-12 14:32:42 -05:00
struct pt_regs *regs = current_pt_regs();
2012-02-19 09:41:09 -08:00
struct rt_sigframe_x32 __user *frame;
sigset_t set;
frame = (struct rt_sigframe_x32 __user *)(regs->sp - 8);
if (!access_ok(VERIFY_READ, frame, sizeof(*frame)))
goto badframe;
if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
goto badframe;
set_current_blocked(&set);
2015-04-04 08:58:23 -04:00
if (restore_sigcontext(regs, &frame->uc.uc_mcontext))
2012-02-19 09:41:09 -08:00
goto badframe;
2012-12-14 14:47:53 -05:00
if (compat_restore_altstack(&frame->uc.uc_stack))
2012-02-19 09:41:09 -08:00
goto badframe;
2015-04-04 08:58:23 -04:00
return regs->ax;
2012-02-19 09:41:09 -08:00
badframe:
signal_fault(regs, frame, "x32 rt_sigreturn");
return 0;
}
# endif