// SPDX-License-Identifier: GPL-2.0-or-later
/* Kerberos-based RxRPC security
 *
 * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <crypto/skcipher.h>
#include <linux/module.h>
#include <linux/net.h>
#include <linux/skbuff.h>
#include <linux/udp.h>
#include <linux/scatterlist.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include <keys/rxrpc-type.h>
#include "ar-internal.h"

#define RXKAD_VERSION			2
#define MAXKRB5TICKETLEN		1024
#define RXKAD_TKT_TYPE_KERBEROS_V5	256
#define ANAME_SZ			40	/* size of authentication name */
#define INST_SZ				40	/* size of principal's instance */
#define REALM_SZ			40	/* size of principal's auth domain */
#define SNAME_SZ			40	/* size of service name */

struct rxkad_level1_hdr {
	__be32	data_size;	/* true data size (excluding padding) */
};

struct rxkad_level2_hdr {
	__be32	data_size;	/* true data size (excluding padding) */
	__be32	checksum;	/* decrypted data checksum */
};

/*
 * this holds a pinned cipher so that keventd doesn't get called by the cipher
 * alloc routine, but since we have it to hand, we use it to decrypt RESPONSE
 * packets
 */
static struct crypto_sync_skcipher *rxkad_ci;
static struct skcipher_request *rxkad_ci_req;
static DEFINE_MUTEX(rxkad_ci_mutex);

/*
 * initialise connection security
 */
static int rxkad_init_connection_security(struct rxrpc_connection *conn)
{
	struct crypto_sync_skcipher *ci;
	struct rxrpc_key_token *token;
	int ret;

	_enter("{%d},{%x}", conn->debug_id, key_serial(conn->params.key));

	token = conn->params.key->payload.data[0];
	conn->security_ix = token->security_index;

	ci = crypto_alloc_sync_skcipher("pcbc(fcrypt)", 0, 0);
	if (IS_ERR(ci)) {
		_debug("no cipher");
		ret = PTR_ERR(ci);
		goto error;
	}

	if (crypto_sync_skcipher_setkey(ci, token->kad->session_key,
					sizeof(token->kad->session_key)) < 0)
		BUG();

	switch (conn->params.security_level) {
	case RXRPC_SECURITY_PLAIN:
		break;
	case RXRPC_SECURITY_AUTH:
		conn->size_align = 8;
		conn->security_size = sizeof(struct rxkad_level1_hdr);
		break;
	case RXRPC_SECURITY_ENCRYPT:
		conn->size_align = 8;
		conn->security_size = sizeof(struct rxkad_level2_hdr);
		break;
	default:
		/* don't leak the cipher allocated above */
		crypto_free_sync_skcipher(ci);
		ret = -EKEYREJECTED;
		goto error;
	}

	conn->cipher = ci;
	ret = 0;
error:
	_leave(" = %d", ret);
	return ret;
}
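rxkad_init_connection_security() asks the crypto API for "pcbc(fcrypt)", which is the fcrypt block cipher in propagating cipher-block chaining (PCBC) mode. The userspace sketch below shows only the PCBC chaining rule over 8-byte blocks. Before encryption, each plaintext block is XORed with a running value, and after encryption that value becomes P_i XOR C_i. The block cipher used here (toy_encrypt_block()/toy_decrypt_block()) is a hypothetical, insecure stand-in and is not fcrypt. Every name in the block is illustrative and none of them is a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCBC_BLOCK 8	/* fcrypt also works on 8-byte blocks */

/* Hypothetical toy block cipher, NOT fcrypt and NOT secure: XOR each byte
 * with the key and move it one position along (cyclically).  It is
 * invertible, which is all the chaining demonstration needs. */
static void toy_encrypt_block(const uint8_t key[PCBC_BLOCK], uint8_t b[PCBC_BLOCK])
{
	uint8_t t[PCBC_BLOCK];

	for (int i = 0; i < PCBC_BLOCK; i++)
		t[(i + 1) % PCBC_BLOCK] = b[i] ^ key[i];
	memcpy(b, t, PCBC_BLOCK);
}

static void toy_decrypt_block(const uint8_t key[PCBC_BLOCK], uint8_t b[PCBC_BLOCK])
{
	uint8_t t[PCBC_BLOCK];

	for (int i = 0; i < PCBC_BLOCK; i++)
		t[i] = b[(i + 1) % PCBC_BLOCK] ^ key[i];
	memcpy(b, t, PCBC_BLOCK);
}

/* PCBC encryption in place: C_i = E(P_i ^ v), then v = P_i ^ C_i,
 * with v initialised to the IV.  len must be a multiple of the block size. */
static void pcbc_encrypt(const uint8_t key[PCBC_BLOCK], const uint8_t iv[PCBC_BLOCK],
			 uint8_t *buf, size_t len)
{
	uint8_t v[PCBC_BLOCK], p[PCBC_BLOCK];

	assert(len % PCBC_BLOCK == 0);
	memcpy(v, iv, PCBC_BLOCK);
	for (size_t off = 0; off < len; off += PCBC_BLOCK) {
		uint8_t *blk = buf + off;

		memcpy(p, blk, PCBC_BLOCK);	/* keep P_i for the chaining value */
		for (int i = 0; i < PCBC_BLOCK; i++)
			blk[i] ^= v[i];
		toy_encrypt_block(key, blk);
		for (int i = 0; i < PCBC_BLOCK; i++)
			v[i] = p[i] ^ blk[i];
	}
}

/* PCBC decryption in place: P_i = D(C_i) ^ v, then v = P_i ^ C_i. */
static void pcbc_decrypt(const uint8_t key[PCBC_BLOCK], const uint8_t iv[PCBC_BLOCK],
			 uint8_t *buf, size_t len)
{
	uint8_t v[PCBC_BLOCK], c[PCBC_BLOCK];

	assert(len % PCBC_BLOCK == 0);
	memcpy(v, iv, PCBC_BLOCK);
	for (size_t off = 0; off < len; off += PCBC_BLOCK) {
		uint8_t *blk = buf + off;

		memcpy(c, blk, PCBC_BLOCK);	/* keep C_i for the chaining value */
		toy_decrypt_block(key, blk);
		for (int i = 0; i < PCBC_BLOCK; i++) {
			blk[i] ^= v[i];
			v[i] = blk[i] ^ c[i];
		}
	}
}
```

Because every block feeds both its plaintext and its ciphertext forward, a decryption has to start from the same IV and walk the blocks in order, just as the encryption did.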

/*
 * prime the encryption state with the invariant parts of a connection's
 * description
 */
static int rxkad_prime_packet_security(struct rxrpc_connection *conn)
{
	struct skcipher_request *req;
	struct rxrpc_key_token *token;
	struct scatterlist sg;
	struct rxrpc_crypt iv;
	__be32 *tmpbuf;
	size_t tmpsize = 4 * sizeof(__be32);

	_enter("");

	if (!conn->params.key)
		return 0;

	tmpbuf = kmalloc(tmpsize, GFP_KERNEL);
	if (!tmpbuf)
		return -ENOMEM;

	req = skcipher_request_alloc(&conn->cipher->base, GFP_NOFS);
	if (!req) {
		kfree(tmpbuf);
		return -ENOMEM;
	}

	token = conn->params.key->payload.data[0];
	memcpy(&iv, token->kad->session_key, sizeof(iv));

	tmpbuf[0] = htonl(conn->proto.epoch);
	tmpbuf[1] = htonl(conn->proto.cid);
	tmpbuf[2] = 0;
	tmpbuf[3] = htonl(conn->security_ix);

	sg_init_one(&sg, tmpbuf, tmpsize);
	skcipher_request_set_sync_tfm(req, conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg, &sg, tmpsize, iv.x);
	crypto_skcipher_encrypt(req);
	skcipher_request_free(req);

	memcpy(&conn->csum_iv, tmpbuf + 2, sizeof(conn->csum_iv));
	kfree(tmpbuf);
	_leave(" = 0");
	return 0;
}
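rxkad_prime_packet_security() encrypts four big-endian words (epoch, connection ID, zero, security index), using the session key as the IV, and keeps the last 8 bytes of the ciphertext as conn->csum_iv. The minimal userspace sketch below reproduces only the buffer layout and the IV extraction, assuming POSIX htonl(). It leaves out the encryption step. build_prime_block() and extract_csum_iv() are hypothetical helper names.

```c
#include <arpa/inet.h>	/* htonl(): POSIX host-to-network byte order */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out the four big-endian words that rxkad
 * encrypts to prime the checksum IV. */
static void build_prime_block(uint32_t epoch, uint32_t cid,
			      uint32_t security_ix, uint32_t out[4])
{
	out[0] = htonl(epoch);
	out[1] = htonl(cid);
	out[2] = 0;
	out[3] = htonl(security_ix);
}

/* Hypothetical helper: after the 16-byte block has been encrypted in place,
 * its last 8 bytes (words 2 and 3) become the checksum IV. */
static void extract_csum_iv(const uint32_t block[4], uint8_t csum_iv[8])
{
	memcpy(csum_iv, block + 2, 8);
}
```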

/*
 * Allocate and prepare the crypto request on a call.  For any particular call,
 * this is called serially for the packets, so no lock should be necessary.
 */
static struct skcipher_request *rxkad_get_call_crypto(struct rxrpc_call *call)
{
	struct crypto_skcipher *tfm = &call->conn->cipher->base;
	struct skcipher_request *cipher_req = call->cipher_req;

	if (!cipher_req) {
		cipher_req = skcipher_request_alloc(tfm, GFP_NOFS);
		if (!cipher_req)
			return NULL;
		call->cipher_req = cipher_req;
	}

	return cipher_req;
}

/*
 * Clean up the crypto on a call.
 */
static void rxkad_free_call_crypto(struct rxrpc_call *call)
{
	if (call->cipher_req)
		skcipher_request_free(call->cipher_req);
	call->cipher_req = NULL;
}
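rxkad_get_call_crypto() allocates the skcipher request the first time it is needed and caches it on the call. rxkad_free_call_crypto() releases the request and clears the pointer. The sketch below shows the same lazy-allocation pattern in userspace with calloc()/free(). struct demo_call and struct demo_request are hypothetical stand-ins for struct rxrpc_call and struct skcipher_request. As the comment on rxkad_get_call_crypto() notes, no lock is taken because the packets of any one call are handled serially.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct skcipher_request and struct rxrpc_call. */
struct demo_request { unsigned char scratch[64]; };
struct demo_call { struct demo_request *cipher_req; };

/* Allocate on first use, then return the same cached request every time. */
static struct demo_request *demo_get_call_crypto(struct demo_call *call)
{
	struct demo_request *req = call->cipher_req;

	if (!req) {
		req = calloc(1, sizeof(*req));
		if (!req)
			return NULL;
		call->cipher_req = req;
	}
	return req;
}

/* Release the cached request; clearing the pointer makes a repeat call safe. */
static void demo_free_call_crypto(struct demo_call *call)
{
	free(call->cipher_req);	/* free(NULL) is a no-op */
	call->cipher_req = NULL;
}
```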

/*
 * partially encrypt a packet (level 1 security)
 */
static int rxkad_secure_packet_auth(const struct rxrpc_call *call,
				    struct sk_buff *skb,
				    u32 data_size,
				    void *sechdr,
				    struct skcipher_request *req)
{
	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
	struct rxkad_level1_hdr hdr;
	struct rxrpc_crypt iv;
	struct scatterlist sg;
	u16 check;

	_enter("");

	check = sp->hdr.seq ^ call->call_id;
	data_size |= (u32)check << 16;

	hdr.data_size = htonl(data_size);
	memcpy(sechdr, &hdr, sizeof(hdr));

	/* start the encryption afresh */
	memset(&iv, 0, sizeof(iv));

	sg_init_one(&sg, sechdr, 8);
	skcipher_request_set_sync_tfm(req, call->conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
	crypto_skcipher_encrypt(req);
	skcipher_request_zero(req);

	_leave(" = 0");
	return 0;
}
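rxkad_secure_packet_auth() stores a 16-bit check value, the low 16 bits of seq ^ call_id, in the top half of the big-endian data_size word and leaves the true data size in the bottom half. The sketch below reproduces that packing in userspace. It also adds a hypothetical inverse, rxkad_l1_unpack(), of the kind a receiver would need. Both helper names are assumptions. The sketch also assumes that data_size fits in 16 bits, because a larger value would collide with the check bits.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stdint.h>

/* Pack the level-1 header word: low 16 bits = data size (assumed to fit),
 * high 16 bits = (seq ^ call_id) truncated to 16 bits, big-endian on the wire. */
static uint32_t rxkad_l1_word(uint32_t data_size, uint32_t seq, uint32_t call_id)
{
	uint16_t check = (uint16_t)(seq ^ call_id);

	return htonl(data_size | (uint32_t)check << 16);
}

/* Hypothetical inverse: recover the data size and return 0 if the check
 * half matches the expected seq ^ call_id, -1 otherwise. */
static int rxkad_l1_unpack(uint32_t wire, uint32_t seq, uint32_t call_id,
			   uint32_t *data_size)
{
	uint32_t buf = ntohl(wire);
	uint16_t check = (uint16_t)((buf >> 16) ^ seq ^ call_id);

	*data_size = buf & 0xffff;
	return check == 0 ? 0 : -1;
}
```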

/*
 * wholly encrypt a packet (level 2 security)
 */
static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
				       struct sk_buff *skb,
				       u32 data_size,
				       void *sechdr,
				       struct skcipher_request *req)
{
	const struct rxrpc_key_token *token;
	struct rxkad_level2_hdr rxkhdr;
	struct rxrpc_skb_priv *sp;
	struct rxrpc_crypt iv;
	struct scatterlist sg[16];
	unsigned int len;
	u16 check;
	int err;

	sp = rxrpc_skb(skb);

	_enter("");

	check = sp->hdr.seq ^ call->call_id;

	rxkhdr.data_size = htonl(data_size | (u32)check << 16);
	rxkhdr.checksum = 0;
	memcpy(sechdr, &rxkhdr, sizeof(rxkhdr));

	/* encrypt from the session key */
	token = call->conn->params.key->payload.data[0];
	memcpy(&iv, token->kad->session_key, sizeof(iv));

	sg_init_one(&sg[0], sechdr, sizeof(rxkhdr));
	skcipher_request_set_sync_tfm(req, call->conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg[0], &sg[0], sizeof(rxkhdr), iv.x);
	crypto_skcipher_encrypt(req);

	/* we want to encrypt the skbuff in-place */
	err = -EMSGSIZE;
	if (skb_shinfo(skb)->nr_frags > 16)
		goto out;

	len = data_size + call->conn->size_align - 1;
	len &= ~(call->conn->size_align - 1);

	sg_init_table(sg, ARRAY_SIZE(sg));
	err = skb_to_sgvec(skb, sg, 0, len);
	if (unlikely(err < 0))
		goto out;
	skcipher_request_set_crypt(req, sg, sg, len, iv.x);
	crypto_skcipher_encrypt(req);

	_leave(" = 0");
	err = 0;

out:
	skcipher_request_zero(req);
	return err;
}
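Before rxkad_secure_packet_encrypt() encrypts the skbuff in place, it rounds the payload length up to conn->size_align with the usual power-of-two mask trick. conn->size_align is 8 at both secured levels, which matches the 8-byte fcrypt block. The sketch below shows just that rounding. round_up_pow2() is a hypothetical name, and its assert() states the power-of-two assumption that the mask depends on.

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of align, which must be a power of two:
 * adding align - 1 and then clearing the low bits cannot overshoot. */
static uint32_t round_up_pow2(uint32_t len, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}
```

For example, a 13-byte payload is padded to 16 bytes, so whole cipher blocks are encrypted, while a 16-byte payload is left at 16.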

/*
 * checksum an RxRPC packet header, then secure the packet according to the
 * connection's security level
 */
static int rxkad_secure_packet(struct rxrpc_call *call,
			       struct sk_buff *skb,
			       size_t data_size,
			       void *sechdr)
{
	struct rxrpc_skb_priv *sp;
	struct skcipher_request *req;
	struct rxrpc_crypt iv;
	struct scatterlist sg;
	u32 x, y;
	int ret;

	sp = rxrpc_skb(skb);

	_enter("{%d{%x}},{#%u},%zu,",
	       call->debug_id, key_serial(call->conn->params.key),
	       sp->hdr.seq, data_size);

	if (!call->conn->cipher)
		return 0;

	ret = key_validate(call->conn->params.key);
	if (ret < 0)
		return ret;

	req = rxkad_get_call_crypto(call);
	if (!req)
		return -ENOMEM;

	/* continue encrypting from where we left off */
	memcpy(&iv, call->conn->csum_iv.x, sizeof(iv));

	/* calculate the security checksum */
	x = (call->cid & RXRPC_CHANNELMASK) << (32 - RXRPC_CIDSHIFT);
	x |= sp->hdr.seq & 0x3fffffff;
	call->crypto_buf[0] = htonl(call->call_id);
	call->crypto_buf[1] = htonl(x);

	sg_init_one(&sg, call->crypto_buf, 8);
	skcipher_request_set_sync_tfm(req, call->conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
	crypto_skcipher_encrypt(req);
	skcipher_request_zero(req);

	y = ntohl(call->crypto_buf[1]);
	y = (y >> 16) & 0xffff;
	if (y == 0)
		y = 1; /* zero checksums are not permitted */
	sp->hdr.cksum = y;

	switch (call->conn->params.security_level) {
	case RXRPC_SECURITY_PLAIN:
		ret = 0;
		break;
	case RXRPC_SECURITY_AUTH:
		ret = rxkad_secure_packet_auth(call, skb, data_size, sechdr,
					       req);
		break;
	case RXRPC_SECURITY_ENCRYPT:
		ret = rxkad_secure_packet_encrypt(call, skb, data_size,
						  sechdr, req);
		break;
	default:
		ret = -EPERM;
		break;
	}

	_leave(" = %d [set %hx]", ret, y);
	return ret;
}
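rxkad_secure_packet() derives the 16-bit header checksum in three steps. It encrypts two words: the call ID, and a word that puts the channel number above the low 30 bits of the sequence number. It then takes the top 16 bits of the second encrypted word. Finally, it substitutes 1 if that result is 0. The sketch below covers only the arithmetic before and after the encryption and leaves the cipher step out. It assumes that RXRPC_CHANNELMASK is 3 and RXRPC_CIDSHIFT is 2, which corresponds to four channels per connection. Those assumed values appear in the hypothetical DEMO_* macros.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <stdint.h>

/* Assumed values (four call channels per connection); not taken from the source. */
#define DEMO_CHANNELMASK	3u	/* assumed RXRPC_CHANNELMASK */
#define DEMO_CIDSHIFT		2u	/* assumed RXRPC_CIDSHIFT */

/* Second plaintext word fed to the cipher: channel number in the top two
 * bits, low 30 bits of the sequence number below it. */
static uint32_t cksum_input_word(uint32_t cid, uint32_t seq)
{
	uint32_t x = (cid & DEMO_CHANNELMASK) << (32 - DEMO_CIDSHIFT);

	return x | (seq & 0x3fffffff);
}

/* Given the second word *after* encryption (network byte order), take its
 * top 16 bits as the checksum, remapping 0 to 1 as rxkad does. */
static uint16_t cksum_from_cipher_word(uint32_t enc_word_be)
{
	uint32_t y = (ntohl(enc_word_be) >> 16) & 0xffff;

	return y == 0 ? 1 : (uint16_t)y;
}
```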

/*
 * decrypt partial encryption on a packet (level 1 security)
 */
static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
				 unsigned int offset, unsigned int len,
				 rxrpc_seq_t seq,
				 struct skcipher_request *req)
{
	struct rxkad_level1_hdr sechdr;
	struct rxrpc_crypt iv;
	struct scatterlist sg[16];
	bool aborted;
	u32 data_size, buf;
	u16 check;
	int ret;

	_enter("");

	if (len < 8) {
		aborted = rxrpc_abort_eproto(call, skb, "rxkad_1_hdr", "V1H",
					     RXKADSEALEDINCON);
		goto protocol_error;
	}

rxrpc: Rewrite the data and ack handling code
Rewrite the data and ack handling code such that:
(1) Parsing of received ACK and ABORT packets and the distribution and the
filing of DATA packets happens entirely within the data_ready context
called from the UDP socket. This allows us to process and discard ACK
and ABORT packets much more quickly (they're no longer stashed on a
queue for a background thread to process).
(2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
keep track of the offset and length of the content of each packet in
the sk_buff metadata. This means we don't do any allocation in the
receive path.
(3) Jumbo DATA packet parsing is now done in data_ready context. Rather
than cloning the packet once for each subpacket and pulling/trimming
it, we file the packet multiple times with an annotation for each
indicating which subpacket is there. From that we can directly
calculate the offset and length.
(4) A call's receive queue can be accessed without taking locks (memory
barriers do have to be used, though).
(5) Incoming calls are set up from preallocated resources and immediately
made live. They can than have packets queued upon them and ACKs
generated. If insufficient resources exist, DATA packet #1 is given a
BUSY reply and other DATA packets are discarded).
(6) sk_buffs no longer take a ref on their parent call.
To make this work, the following changes are made:
(1) Each call's receive buffer is now a circular buffer of sk_buff
pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
between the call and the socket. This permits each sk_buff to be in
the buffer multiple times. The receive buffer is reused for the
transmit buffer.
(2) A circular buffer of annotations (rxtx_annotations) is kept parallel
to the data buffer. Transmission phase annotations indicate whether a
buffered packet has been ACK'd or not and whether it needs
retransmission.
Receive phase annotations indicate whether a slot holds a whole packet
or a jumbo subpacket and, if the latter, which subpacket. They also
note whether the packet has been decrypted in place.
(3) DATA packet window tracking is much simplified. Each phase has just
two numbers representing the window (rx_hard_ack/rx_top and
tx_hard_ack/tx_top).
The hard_ack number is the sequence number before base of the window,
representing the last packet the other side says it has consumed.
hard_ack starts from 0 and the first packet is sequence number 1.
The top number is the sequence number of the highest-numbered packet
residing in the buffer. Packets between hard_ack+1 and top are
soft-ACK'd to indicate they've been received, but not yet consumed.
Four macros, before(), before_eq(), after() and after_eq() are added
to compare sequence numbers within the window. This allows for the
top of the window to wrap when the hard-ack sequence number gets close
to the limit.
Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
to indicate when rx_top and tx_top point at the packets with the
LAST_PACKET bit set, indicating the end of the phase.
(4) Calls are queued on the socket 'receive queue' rather than packets.
This means that we don't need have to invent dummy packets to queue to
indicate abnormal/terminal states and we don't have to keep metadata
packets (such as ABORTs) around
(5) The offset and length of a (sub)packet's content are now passed to
the verify_packet security op. This is currently expected to decrypt
the packet in place and validate it.
However, there's now nowhere to store the revised offset and length of
the actual data within the decrypted blob (there may be a header and
padding to skip) because an sk_buff may represent multiple packets, so
a locate_data security op is added to retrieve these details from the
sk_buff content when needed.
(6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
individually secured and needs to be individually decrypted. The code
to do this is broken out into rxrpc_recvmsg_data() and shared with the
kernel API. It now iterates over the call's receive buffer rather
than walking the socket receive queue.
Additional changes:
(1) The timers are condensed to a single timer that is set for the soonest
of three timeouts (delayed ACK generation, DATA retransmission and
call lifespan).
(2) Transmission of ACK and ABORT packets is effected immediately from
process-context socket ops/kernel API calls that cause them instead of
them being punted off to a background work item. The data_ready
handler still has to defer to the background, though.
(3) A shutdown op is added to the AF_RXRPC socket so that the AFS
filesystem can shut down the socket and flush its own work items
before closing the socket to deal with any in-progress service calls.
Future additional changes that will need to be considered:
(1) Make sure that a call doesn't hog the front of the queue by receiving
data from the network as fast as userspace is consuming it to the
exclusion of other calls.
(2) Transmit delayed ACKs from within recvmsg() when we've consumed
sufficiently more packets to avoid the background work item needing to
run.
Signed-off-by: David Howells <dhowells@redhat.com>
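The window rules in item (3) above can be tried out in userspace C. This is a minimal sketch, assuming 32-bit sequence numbers compared with the signed-difference idiom that TCP's before()/after() helpers also use; the seq_* helpers, struct rx_window and RING_SIZE are hypothetical names chosen for illustration, not the definitions this commit adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size; must be a power of two so masking works. */
#define RING_SIZE 64u

/* Wrap-safe comparisons: the difference, reinterpreted as signed, is
 * negative when a precedes b even after the counter wraps past
 * UINT32_MAX. This relies on two's-complement conversion, which gcc
 * and clang provide. */
static bool seq_before(uint32_t a, uint32_t b)    { return (int32_t)(a - b) < 0; }
static bool seq_before_eq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }
static bool seq_after(uint32_t a, uint32_t b)     { return (int32_t)(a - b) > 0; }
static bool seq_after_eq(uint32_t a, uint32_t b)  { return (int32_t)(a - b) >= 0; }

/* Hypothetical window state: hard_ack is the last packet consumed,
 * top is the highest-numbered packet held in the ring. */
struct rx_window {
	uint32_t hard_ack;
	uint32_t top;
};

/* A packet is buffered but not yet consumed (soft-ACK'd) if its
 * sequence number lies in hard_ack+1 .. top inclusive. */
static bool seq_is_soft_acked(const struct rx_window *w, uint32_t seq)
{
	return seq_after(seq, w->hard_ack) && seq_before_eq(seq, w->top);
}

/* Map a sequence number onto a slot in the circular buffer. */
static unsigned int seq_to_slot(uint32_t seq)
{
	return seq & (RING_SIZE - 1);
}
```

Because only the signed difference is examined, a window whose hard_ack sits just below UINT32_MAX and whose top has wrapped to a small value still classifies its packets correctly.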
2016-09-08 11:10:12 +01:00
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
2019-08-27 10:13:46 +01:00
sg_init_table(sg, ARRAY_SIZE(sg));
2017-06-04 04:16:24 +02:00
ret = skb_to_sgvec(skb, sg, offset, 8);
if (unlikely(ret < 0))
return ret;
2007-04-26 15:48:28 -07:00
/* start the decryption afresh */
memset(&iv, 0, sizeof(iv));
2018-09-18 19:10:47 -07:00
skcipher_request_set_sync_tfm(req, call->conn->cipher);
2016-01-24 21:19:01 +08:00
skcipher_request_set_callback(req, 0, NULL, NULL);
skcipher_request_set_crypt(req, sg, sg, 8, iv.x);
crypto_skcipher_decrypt(req);
skcipher_request_zero(req);
2007-04-26 15:48:28 -07:00
2016-09-06 22:19:51 +01:00
/* Extract the decrypted packet length */
rxrpc: Rewrite the data and ack handling code
2016-09-08 11:10:12 +01:00
if (skb_copy_bits(skb, offset, &sechdr, sizeof(sechdr)) < 0) {
2017-04-06 10:12:00 +01:00
aborted = rxrpc_abort_eproto(call, skb, "rxkad_1_len", "XV1",
RXKADDATALEN);
2016-09-06 22:19:51 +01:00
goto protocol_error;
}
rxrpc: Rewrite the data and ack handling code
2016-09-08 11:10:12 +01:00
offset += sizeof(sechdr);
len -= sizeof(sechdr);
2007-04-26 15:48:28 -07:00
buf = ntohl(sechdr.data_size);
data_size = buf & 0xffff;
check = buf >> 16;
2016-09-06 22:19:51 +01:00
check ^= seq ^ call->call_id;
2007-04-26 15:48:28 -07:00
check &= 0xffff;
if (check != 0) {
2017-04-06 10:12:00 +01:00
aborted = rxrpc_abort_eproto(call, skb, "rxkad_1_check", "V1C",
RXKADSEALEDINCON);
2007-04-26 15:48:28 -07:00
goto protocol_error;
}
rxrpc: Rewrite the data and ack handling code
2016-09-08 11:10:12 +01:00
if (data_size > len) {
2017-04-06 10:12:00 +01:00
aborted = rxrpc_abort_eproto(call, skb, "rxkad_1_datalen", "V1L",
RXKADDATALEN);
2016-09-06 22:19:51 +01:00
goto protocol_error;
}
2007-04-26 15:48:28 -07:00
_leave(" = 0 [dlen=%x]", data_size);
return 0;
protocol_error:
2017-04-06 10:12:00 +01:00
if (aborted)
rxrpc_send_abort_packet(call);
2007-04-26 15:48:28 -07:00
return -EPROTO;
}
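The level-1 length and checksum tests in the function above can be exercised outside the kernel. This is a minimal userspace sketch, assuming the 32-bit data_size word has already been converted to host order (the ntohl() step) and that the sender packed it as the receiver expects, with (seq ^ call_id) truncated to 16 bits in the upper half and the payload length in the lower half; l1_pack(), l1_unpack() and enum l1_result are hypothetical names, and the result codes merely stand in for the RXKADSEALEDINCON and RXKADDATALEN aborts.

```c
#include <stdint.h>

/* Hypothetical result codes standing in for the rxkad abort reasons. */
enum l1_result {
	L1_OK = 0,
	L1_BAD_CHECK,	/* stands in for the RXKADSEALEDINCON abort */
	L1_BAD_LEN,	/* stands in for the RXKADDATALEN abort */
};

/* Build a header word the receive-side check will accept: the low 16
 * bits carry the payload length, the high 16 bits carry
 * (seq ^ call_id) truncated to 16 bits. */
static uint32_t l1_pack(uint32_t seq, uint32_t call_id, uint16_t data_size)
{
	uint32_t check = (seq ^ call_id) & 0xffff;

	return (check << 16) | data_size;
}

/* Mirror of the receive-side tests: split the word, verify the check
 * half against seq and call_id, then make sure the claimed payload
 * fits in the len bytes remaining after the header. */
static enum l1_result l1_unpack(uint32_t buf, uint32_t seq, uint32_t call_id,
				uint32_t len, uint32_t *data_size)
{
	uint32_t check = buf >> 16;

	*data_size = buf & 0xffff;
	check ^= seq ^ call_id;
	check &= 0xffff;
	if (check != 0)
		return L1_BAD_CHECK;
	if (*data_size > len)
		return L1_BAD_LEN;
	return L1_OK;
}
```

For example, l1_pack(1, 0x12345678, 100) yields 0x56790064; unpacking that word with a different sequence number fails the check, and unpacking it with only 99 bytes remaining fails the length test.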
/*
* wholly decrypt a packet (level 2 security)
*/
2016-09-06 22:19:51 +01:00
static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
rxrpc: Rewrite the data and ack handling code
2016-09-08 11:10:12 +01:00
unsigned int offset, unsigned int len,
2018-08-03 10:15:25 +01:00
rxrpc_seq_t seq,
struct skcipher_request *req)
2007-04-26 15:48:28 -07:00
{
2009-09-14 01:17:35 +00:00
const struct rxrpc_key_token *token;
2007-04-26 15:48:28 -07:00
struct rxkad_level2_hdr sechdr;
struct rxrpc_crypt iv;
struct scatterlist _sg[4], *sg;
2017-04-06 10:12:00 +01:00
bool aborted;
2007-04-26 15:48:28 -07:00
u32 data_size, buf;
u16 check;
2017-06-04 04:16:24 +02:00
int nsg, ret;
2007-04-26 15:48:28 -07:00
_enter(",{%d}", skb->len);
rxrpc: Rewrite the data and ack handling code
2016-09-08 11:10:12 +01:00
if (len < 8) {
2017-04-06 10:12:00 +01:00
aborted = rxrpc_abort_eproto(call, skb, "rxkad_2_hdr", "V2H",
RXKADSEALEDINCON);
2016-09-06 22:19:51 +01:00
goto protocol_error;
}
	/* Decrypt the skbuff in-place.  TODO: We really want to decrypt
	 * directly into the target buffer.
	 */
	sg = _sg;
	nsg = skb_shinfo(skb)->nr_frags;
	if (nsg <= 4) {
		nsg = 4;
	} else {
		sg = kmalloc_array(nsg, sizeof(*sg), GFP_NOIO);
		if (!sg)
			goto nomem;
	}
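	/*
	 * Sketch (hypothetical, userspace, not part of rxkad): the allocation
	 * above uses a small caller-side array, _sg, for the common case and
	 * only falls back to kmalloc_array() when the skb has more fragments
	 * than that array holds (four, judging by the nsg <= 4 test). The
	 * matching free is guarded by sg != _sg. Below, calloc() and free()
	 * stand in for kmalloc_array() and kfree(), and a hypothetical
	 * struct seg stands in for struct scatterlist.
	 *
```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical stand-in for struct scatterlist.
struct seg {
	const uint8_t *base;
	size_t len;
};

#define ONSTACK_SEGS 4	// assumed size of the caller's on-stack array

// Return an array with room for at least `need` entries: the caller's
// small array when it is big enough, otherwise a heap array. *nseg gets
// the usable entry count. Returns NULL if the heap allocation fails.
static struct seg *seg_array_get(struct seg *onstack, size_t need, size_t *nseg)
{
	if (need <= ONSTACK_SEGS) {
		*nseg = ONSTACK_SEGS;	// mirrors "nsg = 4"
		return onstack;
	}
	*nseg = need;
	return calloc(need, sizeof(struct seg));
}

// Free the array only if it did not come from the caller's stack.
static void seg_array_put(struct seg *sg, struct seg *onstack)
{
	if (sg != onstack)
		free(sg);
}
```
	 *
	 * Comparing the pointer against the on-stack array at free time
	 * avoids carrying a separate "was allocated" flag through the error
	 * paths.
	 */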

	sg_init_table(sg, nsg);
	ret = skb_to_sgvec(skb, sg, offset, len);
	if (unlikely(ret < 0)) {
		if (sg != _sg)
			kfree(sg);
		return ret;
	}
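	/*
	 * Sketch (hypothetical, userspace, not the kernel implementation):
	 * skb_to_sgvec() describes the byte range [offset, offset + len) of a
	 * possibly fragmented skb as scatterlist entries and returns a
	 * negative error if it cannot, which the code above propagates after
	 * freeing any heap-allocated sg. The helper below shows only the
	 * range-to-segments mapping over a hypothetical array of fragments;
	 * it ignores page mapping and frag_list handling, and returns -1
	 * where the kernel returns an error code.
	 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for skb data fragments and scatterlist entries.
struct frag {
	const uint8_t *base;
	size_t len;
};

struct seg {
	const uint8_t *base;
	size_t len;
};

// Describe bytes [offset, offset + len) of data split across `nfrags`
// consecutive fragments using at most `maxseg` segments. Returns the
// number of segments used, or -1 if the range runs past the data or
// needs more than `maxseg` segments.
static int range_to_segs(const struct frag *frags, size_t nfrags,
			 size_t offset, size_t len,
			 struct seg *segs, size_t maxseg)
{
	size_t used = 0;

	for (size_t i = 0; i < nfrags && len > 0; i++) {
		if (offset >= frags[i].len) {
			offset -= frags[i].len;	// range starts in a later fragment
			continue;
		}
		size_t take = frags[i].len - offset;
		if (take > len)
			take = len;
		if (used == maxseg)
			return -1;		// caller's array is too small
		segs[used].base = frags[i].base + offset;
		segs[used].len = take;
		used++;
		len -= take;
		offset = 0;			// later fragments are used from their start
	}
	return len == 0 ? (int)used : -1;	// -1: range ran past the data
}
```
	 */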

	/* decrypt from the session key */
	token = call->conn->params.key->payload.data[0];
	memcpy(&iv, token->kad->session_key, sizeof(iv));

	skcipher_request_set_sync_tfm(req, call->conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, sg, sg, len, iv.x);
	crypto_skcipher_decrypt(req);
	skcipher_request_zero(req);
	if (sg != _sg)
		kfree(sg);
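	/*
	 * Sketch (illustrative only): the request above decrypts len bytes in
	 * place, since source and destination are both sg, starting from an
	 * IV copied out of the token's session key. To our understanding the
	 * connection cipher set up elsewhere in rxkad is pcbc(fcrypt), an
	 * 8-byte-block cipher in propagating CBC mode. The code below shows
	 * only the PCBC chaining. toy_block() is a deliberately trivial XOR
	 * placeholder, not fcrypt and not secure, and len is assumed to be a
	 * multiple of the 8-byte block.
	 *
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 8

// Toy 8-byte "block cipher": XOR with the key. NOT fcrypt and NOT secure;
// it exists only so the chaining below can be exercised. It is its own
// inverse, so the same function serves as E() and D().
static void toy_block(uint8_t out[BLK], const uint8_t in[BLK], const uint8_t key[BLK])
{
	for (int i = 0; i < BLK; i++)
		out[i] = in[i] ^ key[i];
}

// PCBC encrypt in place: C_i = E(P_i ^ V_i), V_1 = IV, V_{i+1} = P_i ^ C_i.
// Any trailing partial block is left untouched.
static void pcbc_encrypt(uint8_t *buf, size_t len, const uint8_t key[BLK], const uint8_t iv[BLK])
{
	uint8_t v[BLK], p[BLK], t[BLK];

	memcpy(v, iv, BLK);
	for (size_t off = 0; off + BLK <= len; off += BLK) {
		memcpy(p, buf + off, BLK);
		for (int i = 0; i < BLK; i++)
			t[i] = p[i] ^ v[i];
		toy_block(buf + off, t, key);
		for (int i = 0; i < BLK; i++)
			v[i] = p[i] ^ buf[off + i];
	}
}

// PCBC decrypt in place: P_i = D(C_i) ^ V_i, V_{i+1} = P_i ^ C_i.
// Any trailing partial block is left untouched.
static void pcbc_decrypt(uint8_t *buf, size_t len, const uint8_t key[BLK], const uint8_t iv[BLK])
{
	uint8_t v[BLK], c[BLK], t[BLK];

	memcpy(v, iv, BLK);
	for (size_t off = 0; off + BLK <= len; off += BLK) {
		memcpy(c, buf + off, BLK);
		toy_block(t, c, key);
		for (int i = 0; i < BLK; i++) {
			buf[off + i] = t[i] ^ v[i];
			v[i] = buf[off + i] ^ c[i];
		}
	}
}
```
	 *
	 * In PCBC, unlike plain CBC, a damaged ciphertext block garbles every
	 * following plaintext block, not just the next one.
	 */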

	/* Extract the decrypted packet length */
	if (skb_copy_bits(skb, offset, &sechdr, sizeof(sechdr)) < 0) {
		aborted = rxrpc_abort_eproto(call, skb, "rxkad_2_len", "XV2",
					     RXKADDATALEN);
		goto protocol_error;
	}
	offset += sizeof(sechdr);
	len -= sizeof(sechdr);

	buf = ntohl(sechdr.data_size);
	data_size = buf & 0xffff;

	check = buf >> 16;
	check ^= seq ^ call->call_id;
	check &= 0xffff;
	if (check != 0) {
		aborted = rxrpc_abort_eproto(call, skb, "rxkad_2_check", "V2C",
					     RXKADSEALEDINCON);
		goto protocol_error;
	}
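	/*
	 * Sketch (hypothetical helper name and byte-array interface): the
	 * first 32-bit word of the decrypted header (sechdr.data_size) packs
	 * two 16-bit values. The low half is the payload length. The high
	 * half, XORed with seq and call_id and masked to 16 bits, must be
	 * zero; equivalently, it must equal (seq ^ call_id) & 0xffff. The
	 * function below reproduces those steps on a raw big-endian buffer,
	 * assembling the word by hand in place of ntohl().
	 *
```c
#include <stdbool.h>
#include <stdint.h>

// Parse the first big-endian word of a decrypted level-2 header.
// Returns true and stores the payload length on success, or false if
// the check half does not match (seq ^ call_id) & 0xffff.
static bool l2_parse_word(const uint8_t hdr[4], uint32_t seq, uint32_t call_id,
			  uint16_t *data_size)
{
	uint32_t buf = (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16 |
		       (uint32_t)hdr[2] << 8 | (uint32_t)hdr[3];	// ntohl()
	uint32_t check = buf >> 16;

	check ^= seq ^ call_id;
	check &= 0xffff;
	if (check != 0)
		return false;	// the kernel aborts with RXKADSEALEDINCON here
	*data_size = (uint16_t)(buf & 0xffff);
	return true;
}
```
	 */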
	if (data_size > len) {
		aborted = rxrpc_abort_eproto(call, skb, "rxkad_2_datalen", "V2L",
					     RXKADDATALEN);
		goto protocol_error;
	}

	_leave(" = 0 [dlen=%x]", data_size);
	return 0;

protocol_error:
	if (aborted)
		rxrpc_send_abort_packet(call);
	return -EPROTO;

nomem:
	_leave(" = -ENOMEM");
	return -ENOMEM;
}
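The commit message above describes tracking each phase's window as a hard_ack/top pair compared through before(), before_eq(), after() and after_eq(), so that the window keeps working when sequence numbers wrap. A minimal userspace sketch of that style of wrap-safe comparison follows, assuming a 32-bit unsigned sequence type. The `seq_*` names and `sketch_seq_t` are hypothetical stand-ins, not the rxrpc macros themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rxrpc_seq_t, assumed to be 32-bit unsigned. */
typedef uint32_t sketch_seq_t;

/* The unsigned difference is reinterpreted as signed, so the ordering
 * stays correct across a wrap as long as the two numbers are less than
 * 2^31 apart. The cast relies on two's-complement conversion, which
 * GCC and Clang provide. */
static inline bool seq_before(sketch_seq_t a, sketch_seq_t b)
{
	return (int32_t)(a - b) < 0;
}

static inline bool seq_before_eq(sketch_seq_t a, sketch_seq_t b)
{
	return (int32_t)(a - b) <= 0;
}

static inline bool seq_after(sketch_seq_t a, sketch_seq_t b)
{
	return (int32_t)(a - b) > 0;
}

static inline bool seq_after_eq(sketch_seq_t a, sketch_seq_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* True if seq lies in the soft-ACK region hard_ack+1 .. top. */
static inline bool seq_in_window(sketch_seq_t hard_ack, sketch_seq_t top,
				 sketch_seq_t seq)
{
	return seq_after(seq, hard_ack) && seq_before_eq(seq, top);
}
```

With this scheme a window with hard_ack = 0xfffffffe and top = 2 still contains 0xffffffff, 0, 1 and 2, which is the wrap behaviour the commit message asks for.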
/*
 * Verify the security on a received packet or subpacket (if part of a
 * jumbo packet).
 */
static int rxkad_verify_packet(struct rxrpc_call *call, struct sk_buff *skb,
			       unsigned int offset, unsigned int len,
			       rxrpc_seq_t seq, u16 expected_cksum)
{
	struct skcipher_request *req;
	struct rxrpc_crypt iv;
	struct scatterlist sg;
	bool aborted;
	u16 cksum;
	u32 x, y;

	_enter("{%d{%x}},{#%u}",
	       call->debug_id, key_serial(call->conn->params.key), seq);

	if (!call->conn->cipher)
		return 0;

	req = rxkad_get_call_crypto(call);
	if (!req)
		return -ENOMEM;

	/* continue encrypting from where we left off */
	memcpy(&iv, call->conn->csum_iv.x, sizeof(iv));

	/* validate the security checksum */
	x = (call->cid & RXRPC_CHANNELMASK) << (32 - RXRPC_CIDSHIFT);
	x |= seq & 0x3fffffff;
	call->crypto_buf[0] = htonl(call->call_id);
	call->crypto_buf[1] = htonl(x);

	sg_init_one(&sg, call->crypto_buf, 8);
	skcipher_request_set_sync_tfm(req, call->conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
	crypto_skcipher_encrypt(req);
	skcipher_request_zero(req);

	y = ntohl(call->crypto_buf[1]);
	cksum = (y >> 16) & 0xffff;
	if (cksum == 0)
		cksum = 1; /* zero checksums are not permitted */

	if (cksum != expected_cksum) {
		aborted = rxrpc_abort_eproto(call, skb, "rxkad_csum", "VCK",
					     RXKADSEALEDINCON);
		goto protocol_error;
	}

	switch (call->conn->params.security_level) {
	case RXRPC_SECURITY_PLAIN:
		return 0;
	case RXRPC_SECURITY_AUTH:
		return rxkad_verify_packet_1(call, skb, offset, len, seq, req);
	case RXRPC_SECURITY_ENCRYPT:
		return rxkad_verify_packet_2(call, skb, offset, len, seq, req);
	default:
		return -ENOANO;
	}

protocol_error:
	if (aborted)
		rxrpc_send_abort_packet(call);
	return -EPROTO;
}
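rxkad_verify_packet() builds the second word of an 8-byte block from the channel bits of the connection ID and the low 30 bits of the sequence number, encrypts the block with the connection cipher, and then takes the top 16 bits of the second word as the checksum, replacing zero with one. The userspace sketch below reproduces only the word packing and the checksum extraction; the encryption step is deliberately left out, so the second function takes the already-encrypted word in host byte order. The values RXRPC_CHANNELMASK = 3 and RXRPC_CIDSHIFT = 2 (four channels per connection) are assumptions here, and the `sketch_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror RXRPC_CHANNELMASK and RXRPC_CIDSHIFT for a
 * connection with four call channels. */
#define SKETCH_CHANNELMASK 3u
#define SKETCH_CIDSHIFT    2u

/* Pack channel number and sequence into the word that gets encrypted:
 * the channel occupies the top two bits, the sequence the low 30. */
static uint32_t sketch_pack_cksum_word(uint32_t cid, uint32_t seq)
{
	uint32_t x = (cid & SKETCH_CHANNELMASK) << (32 - SKETCH_CIDSHIFT);

	x |= seq & 0x3fffffff;
	return x;
}

/* Derive the 16-bit checksum from the encrypted word (host order). */
static uint16_t sketch_cksum_from_word(uint32_t y)
{
	uint16_t cksum = (y >> 16) & 0xffff;

	if (cksum == 0)
		cksum = 1; /* zero checksums are not permitted */
	return cksum;
}
```

Because zero is remapped to one, a receiver comparing against the header's checksum field never sees zero as a valid computed value.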
/*
 * Locate the data contained in a packet that was partially encrypted.
 */
static void rxkad_locate_data_1(struct rxrpc_call *call, struct sk_buff *skb,
				unsigned int *_offset, unsigned int *_len)
{
	struct rxkad_level1_hdr sechdr;

	if (skb_copy_bits(skb, *_offset, &sechdr, sizeof(sechdr)) < 0)
		BUG();
	*_offset += sizeof(sechdr);
	*_len = ntohl(sechdr.data_size) & 0xffff;
}

/*
 * Locate the data contained in a packet that was completely encrypted.
 */
static void rxkad_locate_data_2(struct rxrpc_call *call, struct sk_buff *skb,
				unsigned int *_offset, unsigned int *_len)
{
	struct rxkad_level2_hdr sechdr;

	if (skb_copy_bits(skb, *_offset, &sechdr, sizeof(sechdr)) < 0)
		BUG();
	*_offset += sizeof(sechdr);
	*_len = ntohl(sechdr.data_size) & 0xffff;
}

/*
 * Locate the data contained in an already decrypted packet.
 */
static void rxkad_locate_data(struct rxrpc_call *call, struct sk_buff *skb,
			      unsigned int *_offset, unsigned int *_len)
{
	switch (call->conn->params.security_level) {
	case RXRPC_SECURITY_AUTH:
		rxkad_locate_data_1(call, skb, _offset, _len);
		return;
	case RXRPC_SECURITY_ENCRYPT:
		rxkad_locate_data_2(call, skb, _offset, _len);
		return;
	default:
		return;
	}
}
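The locate_data helpers read the security header at the current offset, advance the offset past it, and take the payload length from the low 16 bits of the big-endian data_size field. A userspace sketch of the level-1 case over a flat byte buffer follows. It assumes the level-1 header is a single 4-byte big-endian data_size word; `struct sketch_level1_hdr` and `sketch_locate_data_1()` are hypothetical, and where the kernel calls BUG() on a short copy the sketch returns -1 instead.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct rxkad_level1_hdr,
 * assumed to hold a single big-endian 32-bit data_size word. */
struct sketch_level1_hdr {
	uint32_t data_size;
};

/* Skip the security header at *offset and report the payload length.
 * Returns 0 on success or -1 if the buffer is too short (the kernel
 * version BUG()s in that case instead). */
static int sketch_locate_data_1(const uint8_t *buf, size_t buflen,
				size_t *offset, size_t *len)
{
	struct sketch_level1_hdr hdr;

	if (*offset > buflen || buflen - *offset < sizeof(hdr))
		return -1;

	memcpy(&hdr, buf + *offset, sizeof(hdr));
	*offset += sizeof(hdr);
	/* only the low 16 bits are treated as the length */
	*len = ntohl(hdr.data_size) & 0xffff;
	return 0;
}
```

Recomputing offset and length on demand like this is what lets one sk_buff be filed several times for jumbo subpackets without storing per-subpacket metadata.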
/*
 * issue a challenge
 */
static int rxkad_issue_challenge(struct rxrpc_connection *conn)
{
	struct rxkad_challenge challenge;
	struct rxrpc_wire_header whdr;
	struct msghdr msg;
	struct kvec iov[2];
	size_t len;
	u32 serial;
	int ret;

	_enter("{%d,%x}", conn->debug_id, key_serial(conn->server_key));

	ret = key_validate(conn->server_key);
	if (ret < 0)
		return ret;

	get_random_bytes(&conn->security_nonce, sizeof(conn->security_nonce));

	challenge.version = htonl(2);
	challenge.nonce = htonl(conn->security_nonce);
	challenge.min_level = htonl(0);
	challenge.__padding = 0;

	msg.msg_name = &conn->params.peer->srx.transport;
	msg.msg_namelen = conn->params.peer->srx.transport_len;
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	whdr.epoch = htonl(conn->proto.epoch);
	whdr.cid = htonl(conn->proto.cid);
	whdr.callNumber = 0;
	whdr.seq = 0;
	whdr.type = RXRPC_PACKET_TYPE_CHALLENGE;
	whdr.flags = conn->out_clientflag;
	whdr.userStatus = 0;
	whdr.securityIndex = conn->security_ix;
	whdr._rsvd = 0;
	whdr.serviceId = htons(conn->service_id);

	iov[0].iov_base = &whdr;
	iov[0].iov_len = sizeof(whdr);
	iov[1].iov_base = &challenge;
	iov[1].iov_len = sizeof(challenge);

	len = iov[0].iov_len + iov[1].iov_len;

	serial = atomic_inc_return(&conn->serial);
	whdr.serial = htonl(serial);
	_proto("Tx CHALLENGE %%%u", serial);

	ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 2, len);
	if (ret < 0) {
		trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
				    rxrpc_tx_point_rxkad_challenge);
		return -EAGAIN;
	}

	conn->params.peer->last_tx_at = ktime_get_seconds();
	trace_rxrpc_tx_packet(conn->debug_id, &whdr,
			      rxrpc_tx_point_rxkad_challenge);
	_leave(" = 0");
	return 0;
}
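rxkad_issue_challenge() fills the challenge body with four 32-bit fields in network byte order: version 2, a random nonce, a minimum security level of 0 and zero padding. The sketch below shows that layout in userspace so the resulting wire bytes can be inspected. `struct sketch_challenge` is a hypothetical mirror of `struct rxkad_challenge`, assumed to consist of exactly those four 32-bit words with no other members.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct rxkad_challenge, assumed to be four
 * big-endian 32-bit words in this order. */
struct sketch_challenge {
	uint32_t version;
	uint32_t nonce;
	uint32_t min_level;
	uint32_t padding;
};

/* Fill the body the same way rxkad_issue_challenge() does; the caller
 * supplies the nonce (the kernel uses get_random_bytes()). */
static void sketch_fill_challenge(struct sketch_challenge *c, uint32_t nonce)
{
	c->version = htonl(2);
	c->nonce = htonl(nonce);
	c->min_level = htonl(0);
	c->padding = 0;
}
```

Because every field goes through htonl(), copying the struct into a byte array yields the same 16 bytes on any host.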
/*
 * send a Kerberos security response
 */
static int rxkad_send_response(struct rxrpc_connection *conn,
			       struct rxrpc_host_header *hdr,
			       struct rxkad_response *resp,
			       const struct rxkad_key *s2)
{
	struct rxrpc_wire_header whdr;
	struct msghdr msg;
	struct kvec iov[3];
	size_t len;
	u32 serial;
	int ret;

	_enter("");

	msg.msg_name = &conn->params.peer->srx.transport;
	msg.msg_namelen = conn->params.peer->srx.transport_len;
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	memset(&whdr, 0, sizeof(whdr));
	whdr.epoch = htonl(hdr->epoch);
	whdr.cid = htonl(hdr->cid);
	whdr.type = RXRPC_PACKET_TYPE_RESPONSE;
	whdr.flags = conn->out_clientflag;
	whdr.securityIndex = hdr->securityIndex;
	whdr.serviceId = htons(hdr->serviceId);

	iov[0].iov_base = &whdr;
	iov[0].iov_len = sizeof(whdr);
	iov[1].iov_base = resp;
	iov[1].iov_len = sizeof(*resp);
	iov[2].iov_base = (void *)s2->ticket;
	iov[2].iov_len = s2->ticket_len;

	len = iov[0].iov_len + iov[1].iov_len + iov[2].iov_len;

	serial = atomic_inc_return(&conn->serial);
	whdr.serial = htonl(serial);
	_proto("Tx RESPONSE %%%u", serial);

	ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 3, len);
	if (ret < 0) {
		trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
				    rxrpc_tx_point_rxkad_response);
		return -EAGAIN;
	}

	conn->params.peer->last_tx_at = ktime_get_seconds();
	_leave(" = 0");
	return 0;
}
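rxkad_send_response() never copies the header, response and ticket into one buffer; it hands kernel_sendmsg() a three-element kvec and lets the socket layer gather them into a single datagram. The userspace analogue below does the same with POSIX sendmsg() and a `struct iovec` array. It is only an illustration of the gather pattern; `sketch_send_three_part()` is hypothetical and is exercised over an AF_UNIX datagram socketpair rather than UDP.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send three separate buffers as one datagram using a gather list,
 * analogous to the kvec array passed to kernel_sendmsg(). The socket
 * is assumed to be connected, so no destination address is set. */
static ssize_t sketch_send_three_part(int fd,
				      const void *hdr, size_t hdr_len,
				      const void *body, size_t body_len,
				      const void *ticket, size_t ticket_len)
{
	struct iovec iov[3];
	struct msghdr msg;

	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = hdr_len;
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = body_len;
	iov[2].iov_base = (void *)ticket;
	iov[2].iov_len = ticket_len;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = 3;

	return sendmsg(fd, &msg, 0);
}
```

The receiver sees the three parts concatenated in iov order, which is why the kernel code only needs to sum the three iov_len values to get the datagram length.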
/*
 * calculate the response checksum
 */
static void rxkad_calc_response_checksum(struct rxkad_response *response)
{
	u32 csum = 1000003;
	int loop;
	u8 *p = (u8 *) response;

	for (loop = sizeof(*response); loop > 0; loop--)
		csum = csum * 0x10204081 + *p++;

	response->encrypted.checksum = htonl(csum);
}
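The response checksum starts at 1000003 and, for every byte of the response structure, multiplies the running value by 0x10204081 and adds the byte, with u32 arithmetic wrapping modulo 2^32. In rxkad_respond_to_challenge() it runs over a kzalloc()'d response, so the checksum field itself is still zero while it is being summed. A standalone version of the same arithmetic over an arbitrary buffer, with a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as rxkad_calc_response_checksum(), applied to an
 * arbitrary byte buffer; uint32_t arithmetic wraps modulo 2^32 just
 * like the kernel's u32. The result is in host byte order. */
static uint32_t sketch_response_checksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t csum = 1000003;

	while (len-- > 0)
		csum = csum * 0x10204081u + *p++;
	return csum;
}
```

The kernel then stores the result with htonl() before the encrypted part of the response is encrypted.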
/*
 * encrypt the response packet
 */
static int rxkad_encrypt_response(struct rxrpc_connection *conn,
				  struct rxkad_response *resp,
				  const struct rxkad_key *s2)
{
	struct skcipher_request *req;
	struct rxrpc_crypt iv;
	struct scatterlist sg[1];

	req = skcipher_request_alloc(&conn->cipher->base, GFP_NOFS);
	if (!req)
		return -ENOMEM;

	/* continue encrypting from where we left off */
	memcpy(&iv, s2->session_key, sizeof(iv));

	sg_init_table(sg, 1);
	sg_set_buf(sg, &resp->encrypted, sizeof(resp->encrypted));
	skcipher_request_set_sync_tfm(req, conn->cipher);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, sg, sg, sizeof(resp->encrypted), iv.x);
	crypto_skcipher_encrypt(req);
	skcipher_request_free(req);
	return 0;
}
/*
 * respond to a challenge packet
 */
static int rxkad_respond_to_challenge(struct rxrpc_connection *conn,
				      struct sk_buff *skb,
				      u32 *_abort_code)
{
	const struct rxrpc_key_token *token;
	struct rxkad_challenge challenge;
	struct rxkad_response *resp;
	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
	const char *eproto;
	u32 version, nonce, min_level, abort_code;
	int ret;

	_enter("{%d,%x}", conn->debug_id, key_serial(conn->params.key));

	eproto = tracepoint_string("chall_no_key");
	abort_code = RX_PROTOCOL_ERROR;
	if (!conn->params.key)
		goto protocol_error;

	abort_code = RXKADEXPIRED;
	ret = key_validate(conn->params.key);
	if (ret < 0)
		goto other_error;

	eproto = tracepoint_string("chall_short");
	abort_code = RXKADPACKETSHORT;
	if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
			  &challenge, sizeof(challenge)) < 0)
		goto protocol_error;

	version = ntohl(challenge.version);
	nonce = ntohl(challenge.nonce);
	min_level = ntohl(challenge.min_level);

	_proto("Rx CHALLENGE %%%u { v=%u n=%u ml=%u }",
	       sp->hdr.serial, version, nonce, min_level);

	eproto = tracepoint_string("chall_ver");
	abort_code = RXKADINCONSISTENCY;
	if (version != RXKAD_VERSION)
		goto protocol_error;

	abort_code = RXKADLEVELFAIL;
	ret = -EACCES;
	if (conn->params.security_level < min_level)
		goto other_error;

	token = conn->params.key->payload.data[0];

	/* build the response packet */
	resp = kzalloc(sizeof(struct rxkad_response), GFP_NOFS);
	if (!resp)
		return -ENOMEM;

	resp->version = htonl(RXKAD_VERSION);
	resp->encrypted.epoch = htonl(conn->proto.epoch);
	resp->encrypted.cid = htonl(conn->proto.cid);
	resp->encrypted.securityIndex = htonl(conn->security_ix);
	resp->encrypted.inc_nonce = htonl(nonce + 1);
	resp->encrypted.level = htonl(conn->params.security_level);
	resp->kvno = htonl(token->kad->kvno);
	resp->ticket_len = htonl(token->kad->ticket_len);
	resp->encrypted.call_id[0] = htonl(conn->channels[0].call_counter);
	resp->encrypted.call_id[1] = htonl(conn->channels[1].call_counter);
	resp->encrypted.call_id[2] = htonl(conn->channels[2].call_counter);
	resp->encrypted.call_id[3] = htonl(conn->channels[3].call_counter);

	/* calculate the response checksum and then do the encryption */
	rxkad_calc_response_checksum(resp);
	ret = rxkad_encrypt_response(conn, resp, token->kad);
	if (ret == 0)
		ret = rxkad_send_response(conn, &sp->hdr, resp, token->kad);
	kfree(resp);
	return ret;

protocol_error:
	trace_rxrpc_rx_eproto(NULL, sp->hdr.serial, eproto);
	ret = -EPROTO;
other_error:
	*_abort_code = abort_code;
	return ret;
}
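Before building a response, rxkad_respond_to_challenge() checks that the challenge body is long enough, that its version equals RXKAD_VERSION, and that the connection's own security level is not below the peer's min_level; the reply then carries nonce + 1 as inc_nonce. The sketch below reproduces those checks over a raw 16-byte body. It assumes RXKAD_VERSION is 2 (matching the version rxkad_issue_challenge() sends), and the enum values are hypothetical stand-ins for the RXKADPACKETSHORT, RXKADINCONSISTENCY and RXKADLEVELFAIL abort codes.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RXKAD_VERSION 2u /* assumed to match RXKAD_VERSION */

/* Hypothetical results standing in for the rxkad abort codes. */
enum sketch_chall_result {
	SKETCH_OK = 0,
	SKETCH_SHORT,         /* cf. RXKADPACKETSHORT */
	SKETCH_BAD_VERSION,   /* cf. RXKADINCONSISTENCY */
	SKETCH_LEVEL_TOO_LOW  /* cf. RXKADLEVELFAIL */
};

/* Validate a challenge body (version, nonce, min_level, padding, all
 * big-endian) and compute the nonce to echo back in the response. */
static enum sketch_chall_result
sketch_check_challenge(const uint8_t *body, size_t body_len,
		       uint32_t our_level, uint32_t *inc_nonce)
{
	uint32_t words[4];

	if (body_len < sizeof(words))
		return SKETCH_SHORT;
	memcpy(words, body, sizeof(words));

	if (ntohl(words[0]) != SKETCH_RXKAD_VERSION)
		return SKETCH_BAD_VERSION;
	if (our_level < ntohl(words[2]))
		return SKETCH_LEVEL_TOO_LOW;

	*inc_nonce = ntohl(words[1]) + 1; /* wraps modulo 2^32 */
	return SKETCH_OK;
}
```

Echoing nonce + 1 inside the encrypted part of the response is what ties the response to this particular challenge.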

/*
 * decrypt the kerberos IV ticket in the response
 */
static int rxkad_decrypt_ticket(struct rxrpc_connection *conn,
				struct sk_buff *skb,
				void *ticket, size_t ticket_len,
				struct rxrpc_crypt *_session_key,
				time64_t *_expiry,
				u32 *_abort_code)
{
	struct skcipher_request *req;
	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
	struct rxrpc_crypt iv, key;
	struct scatterlist sg[1];
	struct in_addr addr;
	unsigned int life;
	const char *eproto;
	time64_t issue, now;
	bool little_endian;
	int ret;
	u32 abort_code;
	u8 *p, *q, *name, *end;

	_enter("{%d},{%x}", conn->debug_id, key_serial(conn->server_key));

	*_expiry = 0;

	ret = key_validate(conn->server_key);
	if (ret < 0) {
		switch (ret) {
		case -EKEYEXPIRED:
			abort_code = RXKADEXPIRED;
			goto other_error;
		default:
			abort_code = RXKADNOAUTH;
			goto other_error;
		}
	}

	ASSERT(conn->server_key->payload.data[0] != NULL);
	ASSERTCMP((unsigned long) ticket & 7UL, ==, 0);

	memcpy(&iv, &conn->server_key->payload.data[2], sizeof(iv));

	ret = -ENOMEM;
	req = skcipher_request_alloc(conn->server_key->payload.data[0],
				     GFP_NOFS);
	if (!req)
		goto temporary_error;

	sg_init_one(&sg[0], ticket, ticket_len);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, sg, sg, ticket_len, iv.x);
	crypto_skcipher_decrypt(req);
	skcipher_request_free(req);

	p = ticket;
	end = p + ticket_len;

#define Z(field) \
	({ \
		u8 *__str = p; \
		eproto = tracepoint_string("rxkad_bad_"#field); \
		q = memchr(p, 0, end - p); \
		if (!q || q - p > (field##_SZ)) \
			goto bad_ticket; \
		for (; p < q; p++) \
			if (!isprint(*p)) \
				goto bad_ticket; \
		p++; \
		__str; \
	})

	/* extract the ticket flags */
	_debug("KIV FLAGS: %x", *p);
	little_endian = *p & 1;
	p++;

	/* extract the authentication name */
	name = Z(ANAME);
	_debug("KIV ANAME: %s", name);

	/* extract the principal's instance */
	name = Z(INST);
	_debug("KIV INST : %s", name);

	/* extract the principal's authentication domain */
	name = Z(REALM);
	_debug("KIV REALM: %s", name);

	eproto = tracepoint_string("rxkad_bad_len");
	if (end - p < 4 + 8 + 4 + 2)
		goto bad_ticket;

	/* get the IPv4 address of the entity that requested the ticket */
	memcpy(&addr, p, sizeof(addr));
	p += 4;
	_debug("KIV ADDR : %pI4", &addr);

	/* get the session key from the ticket */
	memcpy(&key, p, sizeof(key));
	p += 8;
	_debug("KIV KEY : %08x %08x", ntohl(key.n[0]), ntohl(key.n[1]));
	memcpy(_session_key, &key, sizeof(key));

	/* get the ticket's lifetime */
	life = *p++ * 5 * 60;
	_debug("KIV LIFE : %u", life);

	/* get the issue time of the ticket */
	if (little_endian) {
		__le32 stamp;
		memcpy(&stamp, p, 4);
		issue = rxrpc_u32_to_time64(le32_to_cpu(stamp));
	} else {
		__be32 stamp;
		memcpy(&stamp, p, 4);
		issue = rxrpc_u32_to_time64(be32_to_cpu(stamp));
	}
	p += 4;
	now = ktime_get_real_seconds();
	_debug("KIV ISSUE: %llx [%llx]", issue, now);

	/* check the ticket is in date */
	if (issue > now) {
		abort_code = RXKADNOAUTH;
		ret = -EKEYREJECTED;
		goto other_error;
	}

	if (issue < now - life) {
		abort_code = RXKADEXPIRED;
		ret = -EKEYEXPIRED;
		goto other_error;
	}

	*_expiry = issue + life;

	/* get the service name */
	name = Z(SNAME);
	_debug("KIV SNAME: %s", name);

	/* get the service instance name */
	name = Z(INST);
	_debug("KIV SINST: %s", name);
	return 0;

bad_ticket:
	trace_rxrpc_rx_eproto(NULL, sp->hdr.serial, eproto);
	abort_code = RXKADBADTICKET;
	ret = -EPROTO;
other_error:
	*_abort_code = abort_code;
	return ret;
temporary_error:
	return ret;
}
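The ticket layout that rxkad_decrypt_ticket() walks can be sketched in ordinary userspace C. This is a minimal illustration under stated assumptions, not the kernel's code. The names take_str(), parse_ticket(), struct ticket_info and FIELD_MAX are hypothetical. The 40-byte field limit is an assumed stand-in for the `*_SZ` constants used by Z(). The input is taken to be already decrypted, so there is no DES/PCBC step. The 32-bit issue stamp is also widened directly instead of going through rxrpc_u32_to_time64().

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-field limit; assumed to match the *_SZ constants used by Z(). */
#define FIELD_MAX 40

/* Mirror of Z(): return the NUL-terminated printable string at *pp and move
 * *pp past its terminator, or NULL if it is unterminated, too long or contains
 * a non-printable byte. */
static const char *take_str(const uint8_t **pp, const uint8_t *end, size_t max_len)
{
	const uint8_t *p = *pp;
	const uint8_t *q = memchr(p, 0, (size_t)(end - p));

	if (!q || (size_t)(q - p) > max_len)
		return NULL;
	for (; p < q; p++)
		if (!isprint(*p))
			return NULL;
	const char *s = (const char *)*pp;
	*pp = q + 1;
	return s;
}

/* Hypothetical container for the decoded ticket fields. */
struct ticket_info {
	const char *aname, *inst, *realm, *sname, *sinst;
	uint8_t addr[4], key[8];
	uint32_t life;            /* seconds */
	int64_t issue, expiry;    /* seconds since the epoch (simplified widening) */
};

/* Returns 0 on success, -1 for a malformed ticket, -2 if the issue time lies
 * in the future, -3 if the ticket has expired. */
static int parse_ticket(const uint8_t *buf, size_t len, int64_t now,
			struct ticket_info *t)
{
	const uint8_t *p = buf, *end = buf + len;
	int little_endian;

	if (len < 1)
		return -1;
	little_endian = *p++ & 1;               /* flags byte, bit 0 = byte order */
	if (!(t->aname = take_str(&p, end, FIELD_MAX)) ||
	    !(t->inst = take_str(&p, end, FIELD_MAX)) ||
	    !(t->realm = take_str(&p, end, FIELD_MAX)))
		return -1;
	if (end - p < 4 + 8 + 4 + 2)            /* same minimum-remaining check */
		return -1;
	memcpy(t->addr, p, 4);
	p += 4;
	memcpy(t->key, p, 8);
	p += 8;
	t->life = (uint32_t)*p++ * 5 * 60;      /* lifetime is in 5-minute units */
	if (little_endian)
		t->issue = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
			   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	else
		t->issue = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
			   (uint32_t)p[2] << 8 | (uint32_t)p[3];
	p += 4;
	if (t->issue > now)
		return -2;
	if (t->issue < now - t->life)
		return -3;
	t->expiry = t->issue + t->life;
	if (!(t->sname = take_str(&p, end, FIELD_MAX)) ||
	    !(t->sinst = take_str(&p, end, FIELD_MAX)))
		return -1;
	return 0;
}
```

As in the kernel function, the validity window is checked before the service name and instance are read. A ticket is accepted only if `issue <= now` and `issue >= now - life`. In this sketch the return codes -1, -2 and -3 loosely correspond to the RXKADBADTICKET, RXKADNOAUTH and RXKADEXPIRED abort codes chosen above.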

/*
 * decrypt the response packet
 */
static void rxkad_decrypt_response(struct rxrpc_connection *conn,
				   struct rxkad_response *resp,
				   const struct rxrpc_crypt *session_key)
{
	struct skcipher_request *req = rxkad_ci_req;
	struct scatterlist sg[1];
	struct rxrpc_crypt iv;

	_enter(",,%08x%08x",
	       ntohl(session_key->n[0]), ntohl(session_key->n[1]));

	mutex_lock(&rxkad_ci_mutex);
	if (crypto_sync_skcipher_setkey(rxkad_ci, session_key->x,
					sizeof(*session_key)) < 0)
		BUG();

	memcpy(&iv, session_key, sizeof(iv));

	sg_init_table(sg, 1);
	sg_set_buf(sg, &resp->encrypted, sizeof(resp->encrypted));
	skcipher_request_set_sync_tfm(req, rxkad_ci);
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, sg, sg, sizeof(resp->encrypted), iv.x);
	crypto_skcipher_decrypt(req);
	skcipher_request_zero(req);

	mutex_unlock(&rxkad_ci_mutex);

	_leave("");
}

/*
 * verify a response
 */
static int rxkad_verify_response(struct rxrpc_connection *conn,
				 struct sk_buff *skb,
				 u32 *_abort_code)
{
	struct rxkad_response *response;
	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
	struct rxrpc_crypt session_key;
	const char *eproto;
	time64_t expiry;
	void *ticket;
	u32 abort_code, version, kvno, ticket_len, level;
	__be32 csum;
	int ret, i;

	_enter("{%d,%x}", conn->debug_id, key_serial(conn->server_key));

	ret = -ENOMEM;
	response = kzalloc(sizeof(struct rxkad_response), GFP_NOFS);
	if (!response)
		goto temporary_error;

	eproto = tracepoint_string("rxkad_rsp_short");
	abort_code = RXKADPACKETSHORT;
	if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
			  response, sizeof(*response)) < 0)
		goto protocol_error;
	if (!pskb_pull(skb, sizeof(*response)))
		BUG();

	version = ntohl(response->version);
	ticket_len = ntohl(response->ticket_len);
	kvno = ntohl(response->kvno);
	_proto("Rx RESPONSE %%%u { v=%u kv=%u tl=%u }",
	       sp->hdr.serial, version, kvno, ticket_len);

	eproto = tracepoint_string("rxkad_rsp_ver");
	abort_code = RXKADINCONSISTENCY;
	if (version != RXKAD_VERSION)
		goto protocol_error;

	eproto = tracepoint_string("rxkad_rsp_tktlen");
	abort_code = RXKADTICKETLEN;
	if (ticket_len < 4 || ticket_len > MAXKRB5TICKETLEN)
		goto protocol_error;

	eproto = tracepoint_string("rxkad_rsp_unkkey");
	abort_code = RXKADUNKNOWNKEY;
	if (kvno >= RXKAD_TKT_TYPE_KERBEROS_V5)
		goto protocol_error;

	/* extract the kerberos ticket and decrypt and decode it */
	ret = -ENOMEM;
	ticket = kmalloc(ticket_len, GFP_NOFS);
	if (!ticket)
		goto temporary_error;

	eproto = tracepoint_string("rxkad_tkt_short");
	abort_code = RXKADPACKETSHORT;
	if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
			  ticket, ticket_len) < 0)
		goto protocol_error_free;

	ret = rxkad_decrypt_ticket(conn, skb, ticket, ticket_len, &session_key,
				   &expiry, _abort_code);
	if (ret < 0)
		goto temporary_error_free_ticket;

	/* use the session key from inside the ticket to decrypt the
	 * response */
	rxkad_decrypt_response(conn, response, &session_key);

	eproto = tracepoint_string("rxkad_rsp_param");
	abort_code = RXKADSEALEDINCON;
	if (ntohl(response->encrypted.epoch) != conn->proto.epoch)
		goto protocol_error_free;
	if (ntohl(response->encrypted.cid) != conn->proto.cid)
		goto protocol_error_free;
	if (ntohl(response->encrypted.securityIndex) != conn->security_ix)
		goto protocol_error_free;
	csum = response->encrypted.checksum;
	response->encrypted.checksum = 0;
	rxkad_calc_response_checksum(response);
	eproto = tracepoint_string("rxkad_rsp_csum");
	if (response->encrypted.checksum != csum)
		goto protocol_error_free;

	spin_lock(&conn->channel_lock);
	for (i = 0; i < RXRPC_MAXCALLS; i++) {
		struct rxrpc_call *call;
		u32 call_id = ntohl(response->encrypted.call_id[i]);

		eproto = tracepoint_string("rxkad_rsp_callid");
		if (call_id > INT_MAX)
			goto protocol_error_unlock;

		eproto = tracepoint_string("rxkad_rsp_callctr");
		if (call_id < conn->channels[i].call_counter)
			goto protocol_error_unlock;

		eproto = tracepoint_string("rxkad_rsp_callst");
2016-06-27 14:39:44 +01:00
		if (call_id > conn->channels[i].call_counter) {
			call = rcu_dereference_protected(
				conn->channels[i].call,
				lockdep_is_held(&conn->channel_lock));
			if (call && call->state < RXRPC_CALL_COMPLETE)
				goto protocol_error_unlock;
			conn->channels[i].call_counter = call_id;
		}
	}
	spin_unlock(&conn->channel_lock);
2007-04-26 15:48:28 -07:00
2017-04-06 10:12:00 +01:00
	eproto = tracepoint_string("rxkad_rsp_seq");
2007-04-26 15:48:28 -07:00
	abort_code = RXKADOUTOFSEQUENCE;
2018-02-08 15:59:07 +00:00
	if (ntohl(response->encrypted.inc_nonce) != conn->security_nonce + 1)
2007-04-26 15:48:28 -07:00
		goto protocol_error_free;
2017-04-06 10:12:00 +01:00
	eproto = tracepoint_string("rxkad_rsp_level");
2007-04-26 15:48:28 -07:00
	abort_code = RXKADLEVELFAIL;
2018-02-08 15:59:07 +00:00
	level = ntohl(response->encrypted.level);
2007-04-26 15:48:28 -07:00
	if (level > RXRPC_SECURITY_ENCRYPT)
		goto protocol_error_free;
2016-04-04 14:00:36 +01:00
	conn->params.security_level = level;
2007-04-26 15:48:28 -07:00
	/* create a key to hold the security data and expiration time - after
	 * this the connection security can be handled in exactly the same way
	 * as for a client connection */
	ret = rxrpc_get_server_data_key(conn, &session_key, expiry, kvno);
2017-04-06 10:11:59 +01:00
	if (ret < 0)
2018-02-08 15:59:07 +00:00
		goto temporary_error_free_ticket;
2007-04-26 15:48:28 -07:00
	kfree(ticket);
2018-02-08 15:59:07 +00:00
	kfree(response);
2007-04-26 15:48:28 -07:00
	_leave(" = 0");
	return 0;
2016-06-27 14:39:44 +01:00
protocol_error_unlock:
	spin_unlock(&conn->channel_lock);
2007-04-26 15:48:28 -07:00
protocol_error_free:
	kfree(ticket);
protocol_error:
2018-02-08 15:59:07 +00:00
	kfree(response);
2017-04-06 10:12:00 +01:00
	trace_rxrpc_rx_eproto(NULL, sp->hdr.serial, eproto);
2007-04-26 15:48:28 -07:00
	*_abort_code = abort_code;
	return -EPROTO;
2017-04-06 10:11:59 +01:00
2018-02-08 15:59:07 +00:00
temporary_error_free_ticket:
2017-04-06 10:11:59 +01:00
	kfree(ticket);
2018-02-08 15:59:07 +00:00
	kfree(response);
2017-04-06 10:11:59 +01:00
temporary_error:
	/* Ignore the response packet if we got a temporary error such as
	 * ENOMEM.  We just want to send the challenge again.  Note that we
	 * also come out this way if the ticket decryption fails.
	 */
	return ret;
2007-04-26 15:48:28 -07:00
}
/*
 * clear the connection security
 */
static void rxkad_clear(struct rxrpc_connection *conn)
{
	_enter("");
	if (conn->cipher)
2018-09-18 19:10:47 -07:00
		crypto_free_sync_skcipher(conn->cipher);
2007-04-26 15:48:28 -07:00
}
2016-04-07 17:23:51 +01:00
/*
 * Initialise the rxkad security service.
 */
static int rxkad_init(void)
{
2019-07-30 15:56:57 +01:00
	struct crypto_sync_skcipher *tfm;
	struct skcipher_request *req;
2016-04-07 17:23:51 +01:00
	/* pin the cipher we need so that the crypto layer doesn't invoke
	 * keventd to go get it */
2019-07-30 15:56:57 +01:00
	tfm = crypto_alloc_sync_skcipher("pcbc(fcrypt)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);
	req = skcipher_request_alloc(&tfm->base, GFP_KERNEL);
	if (!req)
		goto nomem_tfm;
	rxkad_ci_req = req;
	rxkad_ci = tfm;
	return 0;
nomem_tfm:
	crypto_free_sync_skcipher(tfm);
	return -ENOMEM;
2016-04-07 17:23:51 +01:00
}
/*
 * Clean up the rxkad security service.
 */
static void rxkad_exit(void)
{
2019-07-30 15:56:57 +01:00
	crypto_free_sync_skcipher(rxkad_ci);
	skcipher_request_free(rxkad_ci_req);
2016-04-07 17:23:51 +01:00
}
2007-04-26 15:48:28 -07:00
/*
 * RxRPC Kerberos-based security
 */
2016-04-07 17:23:51 +01:00
const struct rxrpc_security rxkad = {
2007-04-26 15:48:28 -07:00
	.name = "rxkad",
2009-09-14 01:17:30 +00:00
	.security_index = RXRPC_SECURITY_RXKAD,
2019-12-20 16:17:16 +00:00
	.no_key_abort = RXKADUNKNOWNKEY,
2016-04-07 17:23:51 +01:00
	.init = rxkad_init,
	.exit = rxkad_exit,
2007-04-26 15:48:28 -07:00
	.init_connection_security = rxkad_init_connection_security,
	.prime_packet_security = rxkad_prime_packet_security,
	.secure_packet = rxkad_secure_packet,
	.verify_packet = rxkad_verify_packet,
2019-07-30 15:56:57 +01:00
	.free_call_crypto = rxkad_free_call_crypto,
rxrpc: Rewrite the data and ack handling code
Rewrite the data and ack handling code such that:
(1) Parsing of received ACK and ABORT packets and the distribution and the
filing of DATA packets happens entirely within the data_ready context
called from the UDP socket. This allows us to process and discard ACK
and ABORT packets much more quickly (they're no longer stashed on a
queue for a background thread to process).
(2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
keep track of the offset and length of the content of each packet in
the sk_buff metadata. This means we don't do any allocation in the
receive path.
(3) Jumbo DATA packet parsing is now done in data_ready context. Rather
than cloning the packet once for each subpacket and pulling/trimming
it, we file the packet multiple times with an annotation for each
indicating which subpacket is there. From that we can directly
calculate the offset and length.
(4) A call's receive queue can be accessed without taking locks (memory
barriers do have to be used, though).
(5) Incoming calls are set up from preallocated resources and immediately
made live. They can then have packets queued upon them and ACKs
generated. If insufficient resources exist, DATA packet #1 is given a
BUSY reply and other DATA packets are discarded.
(6) sk_buffs no longer take a ref on their parent call.
To make this work, the following changes are made:
(1) Each call's receive buffer is now a circular buffer of sk_buff
pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
between the call and the socket. This permits each sk_buff to be in
the buffer multiple times. The receive buffer is reused for the
transmit buffer.
(2) A circular buffer of annotations (rxtx_annotations) is kept parallel
to the data buffer. Transmission phase annotations indicate whether a
buffered packet has been ACK'd or not and whether it needs
retransmission.
Receive phase annotations indicate whether a slot holds a whole packet
or a jumbo subpacket and, if the latter, which subpacket. They also
note whether the packet has been decrypted in place.
(3) DATA packet window tracking is much simplified. Each phase has just
two numbers representing the window (rx_hard_ack/rx_top and
tx_hard_ack/tx_top).
The hard_ack number is the sequence number before the base of the window,
representing the last packet the other side says it has consumed.
hard_ack starts from 0 and the first packet is sequence number 1.
The top number is the sequence number of the highest-numbered packet
residing in the buffer. Packets between hard_ack+1 and top are
soft-ACK'd to indicate they've been received, but not yet consumed.
Four macros, before(), before_eq(), after() and after_eq() are added
to compare sequence numbers within the window. This allows for the
top of the window to wrap when the hard-ack sequence number gets close
to the limit.
Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
to indicate when rx_top and tx_top point at the packets with the
LAST_PACKET bit set, indicating the end of the phase.
(4) Calls are queued on the socket 'receive queue' rather than packets.
This means that we don't have to invent dummy packets to queue to
indicate abnormal/terminal states and we don't have to keep metadata
packets (such as ABORTs) around.
(5) The offset and length of a (sub)packet's content are now passed to
the verify_packet security op. This is currently expected to decrypt
the packet in place and validate it.
However, there's now nowhere to store the revised offset and length of
the actual data within the decrypted blob (there may be a header and
padding to skip) because an sk_buff may represent multiple packets, so
a locate_data security op is added to retrieve these details from the
sk_buff content when needed.
(6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
individually secured and needs to be individually decrypted. The code
to do this is broken out into rxrpc_recvmsg_data() and shared with the
kernel API. It now iterates over the call's receive buffer rather
than walking the socket receive queue.
Additional changes:
(1) The timers are condensed to a single timer that is set for the soonest
of three timeouts (delayed ACK generation, DATA retransmission and
call lifespan).
(2) Transmission of ACK and ABORT packets is effected immediately from
process-context socket ops/kernel API calls that cause them instead of
them being punted off to a background work item. The data_ready
handler still has to defer to the background, though.
(3) A shutdown op is added to the AF_RXRPC socket so that the AFS
filesystem can shut down the socket and flush its own work items
before closing the socket to deal with any in-progress service calls.
Future additional changes that will need to be considered:
(1) Make sure that a call doesn't hog the front of the queue by receiving
data from the network as fast as userspace is consuming it to the
exclusion of other calls.
(2) Transmit delayed ACKs from within recvmsg() when we've consumed
sufficiently more packets to avoid the background work item needing to
run.
Signed-off-by: David Howells <dhowells@redhat.com>
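The window bookkeeping in item (3) of the second list above (hard_ack, top, and the before()/after() family) can be illustrated with serial-number arithmetic in user-space C. This is a sketch that assumes 32-bit sequence numbers. The helper names (`seq_before()`, `seq_in_window()`, `seq_window_used()` and the rest) are hypothetical. They are not the kernel's macro definitions, only comparisons of the kind those macros are described as providing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* seq1 is "before" seq2 when the forward distance from seq1 to seq2,
 * computed modulo 2^32, is non-zero and less than 2^31. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	uint32_t diff = seq2 - seq1;	/* unsigned subtraction wraps */

	return diff != 0 && diff < 0x80000000u;
}

static bool seq_before_eq(uint32_t seq1, uint32_t seq2)
{
	return seq1 == seq2 || seq_before(seq1, seq2);
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

static bool seq_after_eq(uint32_t seq1, uint32_t seq2)
{
	return seq_before_eq(seq2, seq1);
}

/* A packet is held in the buffer if hard_ack < seq <= top. */
static bool seq_in_window(uint32_t seq, uint32_t hard_ack, uint32_t top)
{
	assert(seq_before_eq(hard_ack, top));	/* window must be well formed */
	return seq_after(seq, hard_ack) && seq_before_eq(seq, top);
}

/* Number of packets from hard_ack + 1 to top (received, not consumed). */
static uint32_t seq_window_used(uint32_t hard_ack, uint32_t top)
{
	return top - hard_ack;
}
```

The code compares the unsigned forward distance against 2^31 rather than casting the difference to a signed type. This avoids relying on implementation-defined signed conversion. Either way, the ordering is only meaningful while the two numbers are less than 2^31 apart, which is what lets the top of the window wrap past the 32-bit limit.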
2016-09-08 11:10:12 +01:00
	.locate_data = rxkad_locate_data,
2007-04-26 15:48:28 -07:00
	.issue_challenge = rxkad_issue_challenge,
	.respond_to_challenge = rxkad_respond_to_challenge,
	.verify_response = rxkad_verify_response,
	.clear = rxkad_clear,
};