2006-02-04 00:10:01 -08:00
/* winfixup.S: Handle cases where user stack pointer is found to be bogus.
2005-04-16 15:20:36 -07:00
*
2006-02-04 00:10:01 -08:00
* Copyright (C) 1997, 2006 David S. Miller (davem@davemloft.net)
2005-04-16 15:20:36 -07:00
*/
#include <asm/asi.h>
#include <asm/head.h>
#include <asm/page.h>
#include <asm/ptrace.h>
#include <asm/processor.h>
#include <asm/spitfire.h>
#include <asm/thread_info.h>
.text
2006-02-04 00:10:01 -08:00
/* It used to be the case that these register window fault
* handlers could run via the save and restore instructions
* done by the trap entry and exit code. They now do the
* window spill/fill by hand, so that case no longer can occur.
*/
2005-04-16 15:20:36 -07:00
.align 32
fill_fixup:
2006-02-02 21:55:10 -08:00
TRAP_LOAD_THREAD_REG(%g6, %g1)
2006-02-04 00:10:01 -08:00
rdpr	%tstate, %g1
and	%g1, TSTATE_CWP, %g1
or	%g4, FAULT_CODE_WINFIXUP, %g4
stb	%g4, [%g6 + TI_FAULT_CODE]
stx	%g5, [%g6 + TI_FAULT_ADDR]
wrpr	%g1, %cwp
ba,pt	%xcc, etrap
rd	%pc, %g7
call	do_sparc64_fault
add	%sp, PTREGS_OFF, %o0
2016-04-27 17:27:37 -04:00
ba,a,pt	%xcc, rtrap
2005-04-16 15:20:36 -07:00
2006-02-04 00:10:01 -08:00
/* Be very careful about usage of the trap globals here.
* You cannot touch %g5 as that has the fault information.
2005-04-16 15:20:36 -07:00
*/
spill_fixup:
2006-02-04 00:10:01 -08:00
spill_fixup_mna:
spill_fixup_dax:
2006-02-02 21:55:10 -08:00
TRAP_LOAD_THREAD_REG(%g6, %g1)
2006-02-04 00:10:01 -08:00
ldx	[%g6 + TI_FLAGS], %g1
sparc64: Make montmul/montsqr/mpmul usable in 32-bit threads.
The Montgomery Multiply, Montgomery Square, and Multiple-Precision
Multiply instructions work by loading a combination of the floating
point and multiple register windows worth of integer registers
with the inputs.
These values are 64-bit. But for 32-bit userland processes we only
save the low 32-bits of each integer register during a register spill.
This is because the register window save area is in the user stack and
has a fixed layout.
Therefore, the only way to use these instructions in 32-bit mode is to
perform the following sequence:
1) Load the top 32 bits of a chosen integer register with a sentinel,
say "-1". This will be in the outer-most register window.
The idea is that we're trying to see if the outer-most register
window gets spilled, and thus the 64-bit values were truncated.
2) Load all the inputs for the montmul/montsqr/mpmul instruction,
down to the inner-most register window.
3) Execute the opcode.
4) Traverse back up to the outer-most register window.
5) Check the sentinel, if it's still "-1" store the results.
Otherwise retry the entire sequence.
This retry is extremely troublesome. If you're just unlucky and an
interrupt or other trap happens, it'll push that outer-most window to
the stack and clear the sentinel when we restore it.
We could retry forever and never make forward progress if interrupts
arrive at a fast enough rate (consider perf events as one example).
So we have to do limited retries and fall back to software, which is
extremely non-deterministic.
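The sentinel-and-retry sequence described above can be modeled in plain C. This is only a sketch under stated assumptions: `run_with_sentinel`, `attempt_fn`, `spill_sim` and `simulated_attempt` are hypothetical names invented for illustration, and the callback stands in for steps 2 to 4 (loading inputs, executing the opcode, and walking back to the outer-most window), none of which can be expressed in portable C.

```c
#include <stdbool.h>
#include <stdint.h>

#define SENTINEL_HI 0xffffffffu	/* "-1" in the upper 32 bits of the chosen register */

/* Hypothetical stand-in for steps 2-4: it receives the value of the chosen
 * outer-most register and returns that register's value as seen after
 * traversing back up the register windows.
 */
typedef uint64_t (*attempt_fn)(uint64_t outer_reg, void *ctx);

/* Returns true if one attempt completed without the outer-most window
 * being spilled; false means the caller must fall back to software.
 */
static bool run_with_sentinel(attempt_fn attempt, void *ctx, int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		uint64_t outer_reg = (uint64_t)SENTINEL_HI << 32;	/* step 1 */

		outer_reg = attempt(outer_reg, ctx);			/* steps 2-4 */
		if ((uint32_t)(outer_reg >> 32) == SENTINEL_HI)	/* step 5 */
			return true;
	}
	return false;
}

/* Toy model of an interrupt spilling the window: a 32-bit spill followed by
 * a fill loses the upper half of every register, including the sentinel.
 */
struct spill_sim {
	int spills_left;
};

static uint64_t simulated_attempt(uint64_t outer_reg, void *ctx)
{
	struct spill_sim *sim = ctx;

	if (sim->spills_left > 0) {
		sim->spills_left--;
		return outer_reg & 0xffffffffULL;
	}
	return outer_reg;
}
```

With `spills_left` larger than `max_retries` the loop never succeeds, which is the forward-progress problem the commit message describes.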
Luckily it's very straightforward to provide a mechanism to let
32-bit applications use a 64-bit stack. Stacks in 64-bit mode are
biased by 2047 bytes, which means that the lowest bit is set in the
actual %sp register value.
So if we see bit zero set in a 32-bit application's stack we treat
it like a 64-bit stack.
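The bit-zero rule above can be sketched as a small C model. The helper names `uses_64bit_window_layout` and `frame_address` are assumptions made for illustration and are not kernel functions; the only facts taken from the text are the 2047-byte bias and the test of bit zero of %sp.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_BIAS 2047ULL	/* 64-bit stack bias, as stated in the text */

/* Hypothetical helper: decide whether a register window should be saved as
 * sixteen 64-bit values (true) or sixteen 32-bit values (false).
 */
static bool uses_64bit_window_layout(uint64_t sp, bool thread_is_32bit)
{
	if (!thread_is_32bit)
		return true;
	/* A 32-bit thread with bit zero set in %sp is using a biased,
	 * 64-bit style stack.
	 */
	return (sp & 1ULL) != 0;
}

/* Hypothetical helper: the address of the window save area for a given %sp. */
static uint64_t frame_address(uint64_t sp, bool thread_is_32bit)
{
	if (uses_64bit_window_layout(sp, thread_is_32bit))
		return sp + STACK_BIAS;
	return (uint32_t)sp;	/* unbiased stack: only the low 32 bits matter */
}
```

Because stack frames are aligned and 2047 is odd, a biased %sp always has bit zero set, so the check costs a single AND on the spill path.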
Runtime detection of such a facility is tricky, and cumbersome at
best. For example, just trying to use a biased stack and seeing if it
works is hard to recover from (the signal handler will need to use an
alt stack, plus something along the lines of longjmp). Therefore, we
add a system call to report a bitmask of arch specific features like
this in a cheap and less hairy way.
With help from Andy Polyakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 15:18:37 -07:00
andcc	%sp, 0x1, %g0
movne	%icc, 0, %g1
2006-02-04 00:10:01 -08:00
andcc	%g1, _TIF_32BIT, %g0
ldub	[%g6 + TI_WSAVED], %g1
sll	%g1, 3, %g3
add	%g6, %g3, %g3
stx	%sp, [%g3 + TI_RWIN_SPTRS]
sll	%g1, 7, %g3
bne,pt	%xcc, 1f
add	%g6, %g3, %g3
stx	%l0, [%g3 + TI_REG_WINDOW + 0x00]
stx	%l1, [%g3 + TI_REG_WINDOW + 0x08]
stx	%l2, [%g3 + TI_REG_WINDOW + 0x10]
stx	%l3, [%g3 + TI_REG_WINDOW + 0x18]
stx	%l4, [%g3 + TI_REG_WINDOW + 0x20]
stx	%l5, [%g3 + TI_REG_WINDOW + 0x28]
stx	%l6, [%g3 + TI_REG_WINDOW + 0x30]
stx	%l7, [%g3 + TI_REG_WINDOW + 0x38]
stx	%i0, [%g3 + TI_REG_WINDOW + 0x40]
stx	%i1, [%g3 + TI_REG_WINDOW + 0x48]
stx	%i2, [%g3 + TI_REG_WINDOW + 0x50]
stx	%i3, [%g3 + TI_REG_WINDOW + 0x58]
stx	%i4, [%g3 + TI_REG_WINDOW + 0x60]
stx	%i5, [%g3 + TI_REG_WINDOW + 0x68]
stx	%i6, [%g3 + TI_REG_WINDOW + 0x70]
ba,pt	%xcc, 2f
stx	%i7, [%g3 + TI_REG_WINDOW + 0x78]
1:	stw	%l0, [%g3 + TI_REG_WINDOW + 0x00]
stw	%l1, [%g3 + TI_REG_WINDOW + 0x04]
stw	%l2, [%g3 + TI_REG_WINDOW + 0x08]
stw	%l3, [%g3 + TI_REG_WINDOW + 0x0c]
stw	%l4, [%g3 + TI_REG_WINDOW + 0x10]
stw	%l5, [%g3 + TI_REG_WINDOW + 0x14]
stw	%l6, [%g3 + TI_REG_WINDOW + 0x18]
stw	%l7, [%g3 + TI_REG_WINDOW + 0x1c]
stw	%i0, [%g3 + TI_REG_WINDOW + 0x20]
stw	%i1, [%g3 + TI_REG_WINDOW + 0x24]
stw	%i2, [%g3 + TI_REG_WINDOW + 0x28]
stw	%i3, [%g3 + TI_REG_WINDOW + 0x2c]
stw	%i4, [%g3 + TI_REG_WINDOW + 0x30]
stw	%i5, [%g3 + TI_REG_WINDOW + 0x34]
stw	%i6, [%g3 + TI_REG_WINDOW + 0x38]
stw	%i7, [%g3 + TI_REG_WINDOW + 0x3c]
2:	add	%g1, 1, %g1
stb	%g1, [%g6 + TI_WSAVED]
rdpr	%tstate, %g1
andcc	%g1, TSTATE_PRIV, %g0
2005-04-16 15:20:36 -07:00
saved
2006-02-04 00:10:01 -08:00
be,pn	%xcc, 1f
and	%g1, TSTATE_CWP, %g1
2005-04-16 15:20:36 -07:00
retry
2006-02-04 00:10:01 -08:00
1:	mov	FAULT_CODE_WRITE | FAULT_CODE_DTLB | FAULT_CODE_WINFIXUP, %g4
stb	%g4, [%g6 + TI_FAULT_CODE]
stx	%g5, [%g6 + TI_FAULT_ADDR]
wrpr	%g1, %cwp
ba,pt	%xcc, etrap
rd	%pc, %g7
call	do_sparc64_fault
add	%sp, PTREGS_OFF, %o0
2008-04-24 03:15:22 -07:00
ba,a,pt	%xcc, rtrap
2005-04-16 15:20:36 -07:00
winfix_mna:
2006-02-04 00:10:01 -08:00
andn	%g3, 0x7f, %g3
add	%g3, 0x78, %g3
wrpr	%g3, %tnpc
2005-04-16 15:20:36 -07:00
done
2006-02-04 00:10:01 -08:00
fill_fixup_mna:
rdpr	%tstate, %g1
and	%g1, TSTATE_CWP, %g1
wrpr	%g1, %cwp
ba,pt	%xcc, etrap
rd	%pc, %g7
2006-02-09 20:20:34 -08:00
sethi	%hi(tlb_type), %g1
lduw	[%g1 + %lo(tlb_type)], %g1
cmp	%g1, 3
bne,pt	%icc, 1f
2006-02-04 00:10:01 -08:00
add	%sp, PTREGS_OFF, %o0
2006-02-18 16:39:39 -08:00
mov	%l4, %o2
2006-02-16 01:45:49 -08:00
call	sun4v_do_mna
2006-02-18 16:39:39 -08:00
mov	%l5, %o1
2008-04-24 03:15:22 -07:00
ba,a,pt	%xcc, rtrap
2006-02-18 16:39:39 -08:00
1:	mov	%l4, %o1
mov	%l5, %o2
call	mem_address_unaligned
2006-02-09 20:20:34 -08:00
nop
2008-04-24 03:15:22 -07:00
ba,a,pt	%xcc, rtrap
2005-04-16 15:20:36 -07:00
winfix_dax:
2006-02-04 00:10:01 -08:00
andn	%g3, 0x7f, %g3
add	%g3, 0x74, %g3
wrpr	%g3, %tnpc
2005-04-16 15:20:36 -07:00
done
2006-02-04 00:10:01 -08:00
fill_fixup_dax:
rdpr	%tstate, %g1
and	%g1, TSTATE_CWP, %g1
wrpr	%g1, %cwp
ba,pt	%xcc, etrap
rd	%pc, %g7
2006-02-09 20:20:34 -08:00
sethi	%hi(tlb_type), %g1
2006-02-04 00:10:01 -08:00
mov	%l4, %o1
2006-02-09 20:20:34 -08:00
lduw	[%g1 + %lo(tlb_type)], %g1
2006-02-04 00:10:01 -08:00
mov	%l5, %o2
2006-02-09 20:20:34 -08:00
cmp	%g1, 3
bne,pt	%icc, 1f
2006-02-04 00:10:01 -08:00
add	%sp, PTREGS_OFF, %o0
2006-02-09 20:20:34 -08:00
call	sun4v_data_access_exception
nop
2008-04-24 03:15:22 -07:00
ba,a,pt	%xcc, rtrap
arch/sparc: Avoid DCTI Couples
Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated.
Also address the "Programming Note" for optimal performance.
Here is the complete text from Oracle SPARC Architecture Specs.
6.3.4.7 DCTI Couples
"A delayed control transfer instruction (DCTI) in the delay slot of
another DCTI is referred to as a “DCTI couple”. The use of DCTI couples
is deprecated in the Oracle SPARC Architecture; no new software should
place a DCTI in the delay slot of another DCTI, because on future Oracle
SPARC Architecture implementations DCTI couples may execute either
slowly or differently than the programmer assumes it will.
SPARC V8 and SPARC V9 Compatibility Note
The SPARC V8 architecture left behavior undefined for a DCTI couple. The
SPARC V9 architecture defined behavior in that case, but as of
UltraSPARC Architecture 2005, use of DCTI couples was deprecated.
Software should not expect high performance from DCTI couples, and
performance of DCTI couples should be expected to decline further in
future processors.
Programming Note
As noted in TABLE 6-5 on page 115, an annulled branch-always
(branch-always with a = 1) instruction is not architecturally a DCTI.
However, since not all implementations make that distinction, for
optimal performance, a DCTI should not be placed in the instruction word
immediately following an annulled branch-always instruction (BA,A or
BPA,A)."
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-17 14:52:21 -06:00
nop
2006-02-09 20:20:34 -08:00
1:	call	spitfire_data_access_exception
nop
2008-04-24 03:15:22 -07:00
ba,a,pt	%xcc, rtrap
2017-03-17 14:52:21 -06:00
nop