/*
 * inet_diag.c	Module for monitoring INET transport protocols sockets.
 *
 * Authors:	Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
 *
 *	This program is free software; you can redistribute it and/or
 *	modify it under the terms of the GNU General Public License
 *	as published by the Free Software Foundation; either version
 *	2 of the License, or (at your option) any later version.
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/fcntl.h>
#include <linux/random.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 11:04:11 +03:00
#include <linux/slab.h>
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/time.h>

#include <net/icmp.h>
#include <net/tcp.h>
#include <net/ipv6.h>
#include <net/inet_common.h>
#include <net/inet_connection_sock.h>
#include <net/inet_hashtables.h>
#include <net/inet_timewait_sock.h>
#include <net/inet6_hashtables.h>
#include <net/netlink.h>

#include <linux/inet.h>
#include <linux/stddef.h>

#include <linux/inet_diag.h>

static const struct inet_diag_handler **inet_diag_table;

struct inet_diag_entry {
	__be32 *saddr;
	__be32 *daddr;
	u16 sport;
	u16 dport;
	u16 family;
	u16 userlocks;
};

static struct sock *idiagnl;

#define INET_DIAG_PUT(skb, attrtype, attrlen) \
	RTA_DATA(__RTA_PUT(skb, attrtype, attrlen))

static DEFINE_MUTEX(inet_diag_table_mutex);

/*
 * Returns with inet_diag_table_mutex held, even when an ERR_PTR is
 * returned; callers must always call inet_diag_unlock_handler().
 */
static const struct inet_diag_handler *inet_diag_lock_handler(int type)
{
	if (!inet_diag_table[type])
		request_module("net-pf-%d-proto-%d-type-%d", PF_NETLINK,
			       NETLINK_INET_DIAG, type);

	mutex_lock(&inet_diag_table_mutex);
	if (!inet_diag_table[type])
		return ERR_PTR(-ENOENT);

	return inet_diag_table[type];
}

static inline void inet_diag_unlock_handler(
	const struct inet_diag_handler *handler)
{
	mutex_unlock(&inet_diag_table_mutex);
}

static int inet_csk_diag_fill(struct sock *sk,
			      struct sk_buff *skb,
			      int ext, u32 pid, u32 seq, u16 nlmsg_flags,
			      const struct nlmsghdr *unlh)
{
	const struct inet_sock *inet = inet_sk(sk);
	const struct inet_connection_sock *icsk = inet_csk(sk);
	struct inet_diag_msg *r;
	struct nlmsghdr *nlh;
	void *info = NULL;
	struct inet_diag_meminfo *minfo = NULL;
	unsigned char *b = skb_tail_pointer(skb);
	const struct inet_diag_handler *handler;

	handler = inet_diag_table[unlh->nlmsg_type];
	BUG_ON(handler == NULL);

	nlh = NLMSG_PUT(skb, pid, seq, unlh->nlmsg_type, sizeof(*r));
	nlh->nlmsg_flags = nlmsg_flags;

	r = NLMSG_DATA(nlh);
	BUG_ON(sk->sk_state == TCP_TIME_WAIT);

	if (ext & (1 << (INET_DIAG_MEMINFO - 1)))
		minfo = INET_DIAG_PUT(skb, INET_DIAG_MEMINFO, sizeof(*minfo));

	if (ext & (1 << (INET_DIAG_INFO - 1)))
		info = INET_DIAG_PUT(skb, INET_DIAG_INFO,
				     handler->idiag_info_size);

	if ((ext & (1 << (INET_DIAG_CONG - 1))) && icsk->icsk_ca_ops) {
		const size_t len = strlen(icsk->icsk_ca_ops->name);

		strcpy(INET_DIAG_PUT(skb, INET_DIAG_CONG, len + 1),
		       icsk->icsk_ca_ops->name);
	}

	if ((ext & (1 << (INET_DIAG_TOS - 1))) && (sk->sk_family != AF_INET6))
		RTA_PUT_U8(skb, INET_DIAG_TOS, inet->tos);

	r->idiag_family = sk->sk_family;
	r->idiag_state = sk->sk_state;
	r->idiag_timer = 0;
	r->idiag_retrans = 0;

	r->id.idiag_if = sk->sk_bound_dev_if;
	r->id.idiag_cookie[0] = (u32)(unsigned long)sk;
	r->id.idiag_cookie[1] = (u32)(((unsigned long)sk >> 31) >> 1);

	r->id.idiag_sport = inet->inet_sport;
	r->id.idiag_dport = inet->inet_dport;
	r->id.idiag_src[0] = inet->inet_rcv_saddr;
	r->id.idiag_dst[0] = inet->inet_daddr;

#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
	if (r->idiag_family == AF_INET6) {
		const struct ipv6_pinfo *np = inet6_sk(sk);

		ipv6_addr_copy((struct in6_addr *)r->id.idiag_src,
			       &np->rcv_saddr);
		ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst,
			       &np->daddr);
		if (ext & (1 << (INET_DIAG_TOS - 1)))
			RTA_PUT_U8(skb, INET_DIAG_TOS, np->tclass);
	}
#endif

#define EXPIRES_IN_MS(tmo)  DIV_ROUND_UP((tmo - jiffies) * 1000, HZ)

	if (icsk->icsk_pending == ICSK_TIME_RETRANS) {
		r->idiag_timer = 1;
		r->idiag_retrans = icsk->icsk_retransmits;
		r->idiag_expires = EXPIRES_IN_MS(icsk->icsk_timeout);
	} else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
		r->idiag_timer = 4;
		r->idiag_retrans = icsk->icsk_probes_out;
		r->idiag_expires = EXPIRES_IN_MS(icsk->icsk_timeout);
	} else if (timer_pending(&sk->sk_timer)) {
		r->idiag_timer = 2;
		r->idiag_retrans = icsk->icsk_probes_out;
		r->idiag_expires = EXPIRES_IN_MS(sk->sk_timer.expires);
	} else {
		r->idiag_timer = 0;
		r->idiag_expires = 0;
	}
#undef EXPIRES_IN_MS

	r->idiag_uid = sock_i_uid(sk);
	r->idiag_inode = sock_i_ino(sk);

	if (minfo) {
		minfo->idiag_rmem = sk_rmem_alloc_get(sk);
		minfo->idiag_wmem = sk->sk_wmem_queued;
		minfo->idiag_fmem = sk->sk_forward_alloc;
		minfo->idiag_tmem = sk_wmem_alloc_get(sk);
	}

	handler->idiag_get_info(sk, r, info);

	if (sk->sk_state < TCP_TIME_WAIT &&
	    icsk->icsk_ca_ops && icsk->icsk_ca_ops->get_info)
		icsk->icsk_ca_ops->get_info(sk, ext, skb);

	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
	return skb->len;

rtattr_failure:
nlmsg_failure:
	nlmsg_trim(skb, b);
	return -EMSGSIZE;
}
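
The two-word socket cookie filled in above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel's code: `make_cookie` and `cookie_matches` are hypothetical names, and `uintptr_t` stands in for the kernel's `unsigned long`. The shift is written as `>> 31` followed by `>> 1`, presumably so that it stays well defined when the pointer type is only 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Split a pointer-sized value into two 32-bit cookie halves. */
static void make_cookie(uintptr_t v, uint32_t cookie[2])
{
	assert(cookie);
	cookie[0] = (uint32_t)v;
	/*
	 * Shifting by 31 and then by 1 is well defined even when uintptr_t
	 * is 32 bits wide, where a single ">> 32" would be undefined
	 * behaviour; the high half is then simply 0.
	 */
	cookie[1] = (uint32_t)((v >> 31) >> 1);
}

/* Return 1 if `cookie` is the cookie that make_cookie() derives from v. */
static int cookie_matches(uintptr_t v, const uint32_t cookie[2])
{
	uint32_t c[2];

	make_cookie(v, c);
	return c[0] == cookie[0] && c[1] == cookie[1];
}
```

A caller would build the cookie once with `make_cookie()` and later compare a stored cookie against a candidate object with `cookie_matches()`, much as the exact-lookup path rejects a stale request.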

static int inet_twsk_diag_fill(struct inet_timewait_sock *tw,
			       struct sk_buff *skb, int ext, u32 pid,
			       u32 seq, u16 nlmsg_flags,
			       const struct nlmsghdr *unlh)
{
	long tmo;
	struct inet_diag_msg *r;
	const unsigned char *previous_tail = skb_tail_pointer(skb);
	struct nlmsghdr *nlh = NLMSG_PUT(skb, pid, seq,
					 unlh->nlmsg_type, sizeof(*r));

	r = NLMSG_DATA(nlh);
	BUG_ON(tw->tw_state != TCP_TIME_WAIT);

	nlh->nlmsg_flags = nlmsg_flags;

	tmo = tw->tw_ttd - jiffies;
	if (tmo < 0)
		tmo = 0;

	r->idiag_family = tw->tw_family;
	r->idiag_retrans = 0;
	r->id.idiag_if = tw->tw_bound_dev_if;
	r->id.idiag_cookie[0] = (u32)(unsigned long)tw;
	r->id.idiag_cookie[1] = (u32)(((unsigned long)tw >> 31) >> 1);
	r->id.idiag_sport = tw->tw_sport;
	r->id.idiag_dport = tw->tw_dport;
	r->id.idiag_src[0] = tw->tw_rcv_saddr;
	r->id.idiag_dst[0] = tw->tw_daddr;
	r->idiag_state = tw->tw_substate;
	r->idiag_timer = 3;
	r->idiag_expires = DIV_ROUND_UP(tmo * 1000, HZ);
	r->idiag_rqueue = 0;
	r->idiag_wqueue = 0;
	r->idiag_uid = 0;
	r->idiag_inode = 0;
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
	if (tw->tw_family == AF_INET6) {
		const struct inet6_timewait_sock *tw6 =
			inet6_twsk((struct sock *)tw);

		ipv6_addr_copy((struct in6_addr *)r->id.idiag_src,
			       &tw6->tw_v6_rcv_saddr);
		ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst,
			       &tw6->tw_v6_daddr);
	}
#endif
	nlh->nlmsg_len = skb_tail_pointer(skb) - previous_tail;
	return skb->len;

nlmsg_failure:
	nlmsg_trim(skb, previous_tail);
	return -EMSGSIZE;
}
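
The expiry fields above are tick counts converted to milliseconds with a round-up division, after clamping an already-passed deadline to zero. A minimal userspace sketch of that arithmetic follows; `ticks_to_ms` and its `hz` parameter are hypothetical stand-ins for the kernel's `jiffies` arithmetic and `HZ`, and the macro is redefined locally so the block compiles on its own.

```c
#include <assert.h>

/* Local copy of the usual round-up integer division idiom. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Convert a tick count to milliseconds, rounding up; hz is ticks per second. */
static unsigned long ticks_to_ms(long ticks, unsigned long hz)
{
	assert(hz > 0);
	if (ticks < 0)		/* deadline already passed */
		ticks = 0;
	return DIV_ROUND_UP((unsigned long)ticks * 1000UL, hz);
}
```

Rounding up means a timer that still has a fraction of a millisecond to run is never reported as expiring in 0 ms.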

static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
			int ext, u32 pid, u32 seq, u16 nlmsg_flags,
			const struct nlmsghdr *unlh)
{
	if (sk->sk_state == TCP_TIME_WAIT)
		return inet_twsk_diag_fill((struct inet_timewait_sock *)sk,
					   skb, ext, pid, seq, nlmsg_flags,
					   unlh);
	return inet_csk_diag_fill(sk, skb, ext, pid, seq, nlmsg_flags, unlh);
}

static int inet_diag_get_exact(struct sk_buff *in_skb,
			       const struct nlmsghdr *nlh)
{
	int err;
	struct sock *sk;
	struct inet_diag_req *req = NLMSG_DATA(nlh);
	struct sk_buff *rep;
	struct inet_hashinfo *hashinfo;
	const struct inet_diag_handler *handler;

	handler = inet_diag_lock_handler(nlh->nlmsg_type);
	if (IS_ERR(handler)) {
		err = PTR_ERR(handler);
		goto unlock;
	}

	hashinfo = handler->idiag_hashinfo;
	err = -EINVAL;

	if (req->idiag_family == AF_INET) {
		sk = inet_lookup(&init_net, hashinfo, req->id.idiag_dst[0],
				 req->id.idiag_dport, req->id.idiag_src[0],
				 req->id.idiag_sport, req->id.idiag_if);
	}
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
	else if (req->idiag_family == AF_INET6) {
		sk = inet6_lookup(&init_net, hashinfo,
				  (struct in6_addr *)req->id.idiag_dst,
				  req->id.idiag_dport,
				  (struct in6_addr *)req->id.idiag_src,
				  req->id.idiag_sport,
				  req->id.idiag_if);
	}
#endif
	else {
		goto unlock;
	}

	err = -ENOENT;
	if (sk == NULL)
		goto unlock;

	err = -ESTALE;
	if ((req->id.idiag_cookie[0] != INET_DIAG_NOCOOKIE ||
	     req->id.idiag_cookie[1] != INET_DIAG_NOCOOKIE) &&
	    ((u32)(unsigned long)sk != req->id.idiag_cookie[0] ||
	     (u32)((((unsigned long)sk) >> 31) >> 1) != req->id.idiag_cookie[1]))
		goto out;

	err = -ENOMEM;
	rep = alloc_skb(NLMSG_SPACE((sizeof(struct inet_diag_msg) +
				     sizeof(struct inet_diag_meminfo) +
				     handler->idiag_info_size + 64)),
			GFP_KERNEL);
	if (!rep)
		goto out;

	err = sk_diag_fill(sk, rep, req->idiag_ext,
			   NETLINK_CB(in_skb).pid,
			   nlh->nlmsg_seq, 0, nlh);
	if (err < 0) {
		WARN_ON(err == -EMSGSIZE);
		kfree_skb(rep);
		goto out;
	}
	err = netlink_unicast(idiagnl, rep, NETLINK_CB(in_skb).pid,
			      MSG_DONTWAIT);
	if (err > 0)
		err = 0;

out:
	if (sk) {
		if (sk->sk_state == TCP_TIME_WAIT)
			inet_twsk_put((struct inet_timewait_sock *)sk);
		else
			sock_put(sk);
	}
unlock:
	inet_diag_unlock_handler(handler);
	return err;
}

static int bitstring_match(const __be32 *a1, const __be32 *a2, int bits)
{
	int words = bits >> 5;

	bits &= 0x1f;

	if (words) {
		if (memcmp(a1, a2, words << 2))
			return 0;
	}
	if (bits) {
		__be32 w1, w2;
		__be32 mask;

		w1 = a1[words];
		w2 = a2[words];

		mask = htonl((0xffffffff) << (32 - bits));

		if ((w1 ^ w2) & mask)
			return 0;
	}

	return 1;
}
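
The prefix comparison above can be tried in userspace. The following is a minimal sketch under stated assumptions: `prefix_match` is a hypothetical name, `uint32_t` replaces the kernel's `__be32`, and `htonl` comes from the POSIX header `<arpa/inet.h>`. Addresses are expected in network byte order, as in the kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl */

/* Return 1 if the first `bits` bits of a1 and a2 (network order) agree. */
static int prefix_match(const uint32_t *a1, const uint32_t *a2, int bits)
{
	int words = bits >> 5;	/* number of complete 32-bit words */

	assert(bits >= 0);
	bits &= 0x1f;		/* bits left over in the last, partial word */

	if (words && memcmp(a1, a2, (size_t)words << 2))
		return 0;
	if (bits) {
		/* Keep only the top `bits` bits, expressed in network order. */
		uint32_t mask = htonl(0xffffffffu << (32 - bits));

		if ((a1[words] ^ a2[words]) & mask)
			return 0;
	}
	return 1;
}
```

For example, 192.168.1.1 and 192.168.1.254 share a /24 prefix but differ under a /25 or /32 comparison.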

static int inet_diag_bc_run(const void *bc, int len,
			    const struct inet_diag_entry *entry)
{
	while (len > 0) {
		int yes = 1;
		const struct inet_diag_bc_op *op = bc;

		switch (op->code) {
		case INET_DIAG_BC_NOP:
			break;
		case INET_DIAG_BC_JMP:
			yes = 0;
			break;
		case INET_DIAG_BC_S_GE:
			yes = entry->sport >= op[1].no;
			break;
		case INET_DIAG_BC_S_LE:
			yes = entry->sport <= op[1].no;
			break;
		case INET_DIAG_BC_D_GE:
			yes = entry->dport >= op[1].no;
			break;
		case INET_DIAG_BC_D_LE:
			yes = entry->dport <= op[1].no;
			break;
		case INET_DIAG_BC_AUTO:
			yes = !(entry->userlocks & SOCK_BINDPORT_LOCK);
			break;
		case INET_DIAG_BC_S_COND:
		case INET_DIAG_BC_D_COND: {
			struct inet_diag_hostcond *cond;
			__be32 *addr;

			cond = (struct inet_diag_hostcond *)(op + 1);
			if (cond->port != -1 &&
			    cond->port != (op->code == INET_DIAG_BC_S_COND ?
					     entry->sport : entry->dport)) {
				yes = 0;
				break;
			}

			if (cond->prefix_len == 0)
				break;

			if (op->code == INET_DIAG_BC_S_COND)
				addr = entry->saddr;
			else
				addr = entry->daddr;

			if (bitstring_match(addr, cond->addr,
					    cond->prefix_len))
				break;
			if (entry->family == AF_INET6 &&
			    cond->family == AF_INET) {
				if (addr[0] == 0 && addr[1] == 0 &&
				    addr[2] == htonl(0xffff) &&
				    bitstring_match(addr + 3, cond->addr,
						    cond->prefix_len))
					break;
			}
			yes = 0;
			break;
		}
		}

		if (yes) {
			len -= op->yes;
			bc += op->yes;
		} else {
			len -= op->no;
			bc += op->no;
		}
	}
	return len == 0;
}
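
The interpreter above advances by `op->yes` bytes when a test passes and by `op->no` bytes when it fails, and accepts only when it lands exactly on the end of the program. A reduced userspace sketch of that control flow is shown below. Everything in it is an assumption made for illustration: the `struct op` layout, the `OP_*` opcodes, and `run_filter` are hypothetical, only source-port comparisons are modelled, and the program is assumed to have been validated already (non-zero offsets that are multiples of 4).

```c
#include <assert.h>
#include <stdint.h>

enum { OP_NOP, OP_JMP, OP_S_GE, OP_S_LE };	/* hypothetical opcodes */

struct op {
	uint8_t  code;
	uint8_t  yes;	/* bytes to advance when the test passes */
	uint16_t no;	/* bytes to advance when it fails; in an operand
			 * slot this field carries the value to compare */
};

/* Return 1 when execution lands exactly on the end of the program. */
static int run_filter(const struct op *prog, int len, uint16_t sport)
{
	int off = 0;

	assert(len >= 0 && len % (int)sizeof(*prog) == 0);
	while (len > 0) {
		const struct op *op = &prog[off / (int)sizeof(*prog)];
		int yes = 1;
		int step;

		switch (op->code) {
		case OP_NOP:
			break;
		case OP_JMP:
			yes = 0;
			break;
		case OP_S_GE:
			yes = sport >= op[1].no;
			break;
		case OP_S_LE:
			yes = sport <= op[1].no;
			break;
		}
		step = yes ? op->yes : op->no;
		len -= step;	/* overshooting the end leaves len < 0: reject */
		off += step;
	}
	return len == 0;
}
```

A program that accepts source ports 1024 to 2047 would be four slots long: `{OP_S_GE, 8, 20}`, an operand slot holding 1024, `{OP_S_LE, 8, 12}`, and an operand slot holding 2047. Each failing test jumps 4 bytes past the end, which is how this encoding expresses "reject".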

static int valid_cc(const void *bc, int len, int cc)
{
	while (len >= 0) {
		const struct inet_diag_bc_op *op = bc;

		if (cc > len)
			return 0;
		if (cc == len)
			return 1;
		if (op->yes < 4 || op->yes & 3)
			return 0;
		len -= op->yes;
		bc += op->yes;
	}
	return 0;
}

static int inet_diag_bc_audit(const void *bytecode, int bytecode_len)
{
	const void *bc = bytecode;
	int len = bytecode_len;

	while (len > 0) {
		const struct inet_diag_bc_op *op = bc;

//printk("BC: %d %d %d {%d} / %d\n", op->code, op->yes, op->no, op[1].no, len);
		switch (op->code) {
		case INET_DIAG_BC_AUTO:
		case INET_DIAG_BC_S_COND:
		case INET_DIAG_BC_D_COND:
		case INET_DIAG_BC_S_GE:
		case INET_DIAG_BC_S_LE:
		case INET_DIAG_BC_D_GE:
		case INET_DIAG_BC_D_LE:
		case INET_DIAG_BC_JMP:
			if (op->no < 4 || op->no > len + 4 || op->no & 3)
				return -EINVAL;
			if (op->no < len &&
			    !valid_cc(bytecode, bytecode_len, len - op->no))
				return -EINVAL;
			break;
		case INET_DIAG_BC_NOP:
			break;
		default:
			return -EINVAL;
		}
		if (op->yes < 4 || op->yes > len + 4 || op->yes & 3)
			return -EINVAL;
		bc += op->yes;
		len -= op->yes;
	}
	return len == 0 ? 0 : -EINVAL;
}
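
The audit above rejects a "no" jump unless its target is the start of a real instruction, which it checks by walking the "yes" chain from the beginning of the program. A minimal userspace sketch of that reachability check follows; `struct op` and `lands_on_instruction` are hypothetical names, and the 4-byte instruction layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct op {		/* hypothetical 4-byte instruction header */
	uint8_t  code;
	uint8_t  yes;
	uint16_t no;
};

/*
 * Return 1 if following the "yes" offsets from the start of the program
 * reaches a point where exactly `cc` bytes remain, i.e. `cc` identifies
 * the start of an instruction rather than the middle of an operand.
 */
static int lands_on_instruction(const struct op *prog, int len, int cc)
{
	int off = 0;

	assert(cc >= 0 && len >= 0);
	while (len >= 0) {
		const struct op *op;

		if (cc > len)
			return 0;
		if (cc == len)
			return 1;
		op = &prog[off / (int)sizeof(*prog)];
		/* Offsets must be at least one slot and 4-byte aligned. */
		if (op->yes < 4 || (op->yes & 3))
			return 0;
		len -= op->yes;
		off += op->yes;
	}
	return 0;
}
```

With a 16-byte program whose instructions sit at byte offsets 0 and 8, the remaining-length values 16, 8 and 0 are valid targets, while 12 and 4 fall inside operand slots and are refused.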

static int inet_csk_diag_dump(struct sock *sk,
			      struct sk_buff *skb,
			      struct netlink_callback *cb)
{
	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);

	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
		struct inet_diag_entry entry;
		const struct nlattr *bc = nlmsg_find_attr(cb->nlh,
							  sizeof(*r),
							  INET_DIAG_REQ_BYTECODE);
		struct inet_sock *inet = inet_sk(sk);

		entry.family = sk->sk_family;
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
		if (entry.family == AF_INET6) {
			struct ipv6_pinfo *np = inet6_sk(sk);

			entry.saddr = np->rcv_saddr.s6_addr32;
			entry.daddr = np->daddr.s6_addr32;
		} else
#endif
		{
			entry.saddr = &inet->inet_rcv_saddr;
			entry.daddr = &inet->inet_daddr;
		}
		entry.sport = inet->inet_num;
		entry.dport = ntohs(inet->inet_dport);
		entry.userlocks = sk->sk_userlocks;

		if (!inet_diag_bc_run(nla_data(bc), nla_len(bc), &entry))
			return 0;
	}

	return inet_csk_diag_fill(sk, skb, r->idiag_ext,
				  NETLINK_CB(cb->skb).pid,
				  cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh);
}

static int inet_twsk_diag_dump(struct inet_timewait_sock *tw,
			       struct sk_buff *skb,
			       struct netlink_callback *cb)
{
	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);

	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
		struct inet_diag_entry entry;
		const struct nlattr *bc = nlmsg_find_attr(cb->nlh,
							  sizeof(*r),
							  INET_DIAG_REQ_BYTECODE);

		entry.family = tw->tw_family;
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
		if (tw->tw_family == AF_INET6) {
			struct inet6_timewait_sock *tw6 =
				inet6_twsk((struct sock *)tw);

			entry.saddr = tw6->tw_v6_rcv_saddr.s6_addr32;
			entry.daddr = tw6->tw_v6_daddr.s6_addr32;
		} else
#endif
		{
			entry.saddr = &tw->tw_rcv_saddr;
			entry.daddr = &tw->tw_daddr;
		}
		entry.sport = tw->tw_num;
		entry.dport = ntohs(tw->tw_dport);
		entry.userlocks = 0;

		if (!inet_diag_bc_run(nla_data(bc), nla_len(bc), &entry))
			return 0;
	}

	return inet_twsk_diag_fill(tw, skb, r->idiag_ext,
				   NETLINK_CB(cb->skb).pid,
				   cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh);
}

static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
			      struct request_sock *req, u32 pid, u32 seq,
			      const struct nlmsghdr *unlh)
{
[NET] Generalise TCP's struct open_request minisock infrastructure
Kept this first changeset minimal, without changing existing names to
ease peer review.
Basically tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:
->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
a specific protocol
The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.
I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.
Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)
Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-19 09:46:52 +04:00
	const struct inet_request_sock *ireq = inet_rsk(req);
	struct inet_sock *inet = inet_sk(sk);
	unsigned char *b = skb_tail_pointer(skb);
	struct inet_diag_msg *r;
	struct nlmsghdr *nlh;
	long tmo;

	nlh = NLMSG_PUT(skb, pid, seq, unlh->nlmsg_type, sizeof(*r));
	nlh->nlmsg_flags = NLM_F_MULTI;
	r = NLMSG_DATA(nlh);

	r->idiag_family = sk->sk_family;
	r->idiag_state = TCP_SYN_RECV;
	r->idiag_timer = 1;
	r->idiag_retrans = req->retrans;

	r->id.idiag_if = sk->sk_bound_dev_if;
	r->id.idiag_cookie[0] = (u32)(unsigned long)req;
	r->id.idiag_cookie[1] = (u32)(((unsigned long)req >> 31) >> 1);

	tmo = req->expires - jiffies;
	if (tmo < 0)
		tmo = 0;

	r->id.idiag_sport = inet->inet_sport;
	r->id.idiag_dport = ireq->rmt_port;
	r->id.idiag_src[0] = ireq->loc_addr;
	r->id.idiag_dst[0] = ireq->rmt_addr;
	r->idiag_expires = jiffies_to_msecs(tmo);
	r->idiag_rqueue = 0;
	r->idiag_wqueue = 0;
	r->idiag_uid = sock_i_uid(sk);
	r->idiag_inode = 0;
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
	if (r->idiag_family == AF_INET6) {
		ipv6_addr_copy((struct in6_addr *)r->id.idiag_src,
			       &inet6_rsk(req)->loc_addr);
		ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst,
			       &inet6_rsk(req)->rmt_addr);
	}
#endif
	nlh->nlmsg_len = skb_tail_pointer(skb) - b;

	return skb->len;

nlmsg_failure:
	nlmsg_trim(skb, b);
	return -1;
}

static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
			       struct netlink_callback *cb)
{
	struct inet_diag_entry entry;
	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);
	struct inet_connection_sock *icsk = inet_csk(sk);
	struct listen_sock *lopt;
	const struct nlattr *bc = NULL;
	struct inet_sock *inet = inet_sk(sk);
	int j, s_j;
	int reqnum, s_reqnum;
	int err = 0;

	s_j = cb->args[3];
	s_reqnum = cb->args[4];

	if (s_j > 0)
		s_j--;

	entry.family = sk->sk_family;

	read_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);

	lopt = icsk->icsk_accept_queue.listen_opt;
	if (!lopt || !lopt->qlen)
		goto out;

	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
		bc = nlmsg_find_attr(cb->nlh, sizeof(*r),
				     INET_DIAG_REQ_BYTECODE);
		entry.sport = inet->inet_num;
		entry.userlocks = sk->sk_userlocks;
	}

	for (j = s_j; j < lopt->nr_table_entries; j++) {
		struct request_sock *req, *head = lopt->syn_table[j];

		reqnum = 0;
		for (req = head; req; reqnum++, req = req->dl_next) {
			struct inet_request_sock *ireq = inet_rsk(req);

			if (reqnum < s_reqnum)
				continue;
			if (r->id.idiag_dport != ireq->rmt_port &&
			    r->id.idiag_dport)
				continue;

			if (bc) {
				entry.saddr =
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
					(entry.family == AF_INET6) ?
					inet6_rsk(req)->loc_addr.s6_addr32 :
#endif
2005-06-19 09:46:52 +04:00
					&ireq->loc_addr;
2006-01-10 01:56:19 +03:00
				entry.daddr =
2005-08-12 16:19:38 +04:00
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
2005-04-17 02:20:36 +04:00
					(entry.family == AF_INET6) ?
2005-12-14 10:15:40 +03:00
					inet6_rsk(req)->rmt_addr.s6_addr32 :
2005-04-17 02:20:36 +04:00
#endif
2005-06-19 09:46:52 +04:00
					&ireq->rmt_addr;
				entry.dport = ntohs(ireq->rmt_port);
2005-04-17 02:20:36 +04:00
2010-11-03 19:35:41 +03:00
				if (!inet_diag_bc_run(nla_data(bc),
						      nla_len(bc), &entry))
2005-04-17 02:20:36 +04:00
					continue;
			}
2005-08-12 19:51:49 +04:00
			err = inet_diag_fill_req(skb, sk, req,
2005-04-17 02:20:36 +04:00
						 NETLINK_CB(cb->skb).pid,
2005-08-10 12:54:28 +04:00
						 cb->nlh->nlmsg_seq, cb->nlh);
2005-04-17 02:20:36 +04:00
			if (err < 0) {
				cb->args[3] = j + 1;
				cb->args[4] = reqnum;
				goto out;
			}
		}
		s_reqnum = 0;
	}
out:
2005-08-10 07:10:42 +04:00
	read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
2005-04-17 02:20:36 +04:00
	return err;
}
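The loop above stops as soon as the output buffer is full and records where it stopped in cb->args[3] and cb->args[4], so that the next dump call can skip what was already reported. The following is a minimal userspace sketch of that resumable-cursor idea, not kernel code. The names struct cursor, dump_some and NBUCKETS are hypothetical, plain int arrays stand in for the SYN table chains, and the sketch stores the bucket index directly rather than the kernel's "j + 1" encoding.

```c
#include <stddef.h>

#define NBUCKETS 3	/* hypothetical table size for the sketch */

/* Hypothetical stand-in for the position saved in cb->args[3]/cb->args[4]. */
struct cursor {
	int bucket;	/* bucket to resume in */
	int entry;	/* first entry in that bucket not yet reported */
};

/*
 * Copy entries into out[] until cap entries have been written.
 * Returns the number of entries written. When the buffer fills up,
 * the resume position is stored in *cur and *done is set to 0.
 * When the whole table has been walked, *done is set to 1.
 */
static int dump_some(const int *const buckets[NBUCKETS],
		     const int lens[NBUCKETS],
		     struct cursor *cur, int *out, int cap, int *done)
{
	int written = 0;
	int s_entry = cur->entry;	/* applies to the first bucket only */
	int j, e;

	for (j = cur->bucket; j < NBUCKETS; j++) {
		for (e = 0; e < lens[j]; e++) {
			if (e < s_entry)
				continue;	/* reported by an earlier call */
			if (written == cap) {
				/* Buffer full: remember where to resume. */
				cur->bucket = j;
				cur->entry = e;
				*done = 0;
				return written;
			}
			out[written++] = buckets[j][e];
		}
		s_entry = 0;	/* later buckets start from their head */
	}
	cur->bucket = NBUCKETS;
	cur->entry = 0;
	*done = 1;
	return written;
}
```

A caller would invoke dump_some() repeatedly with the same cursor until *done becomes 1, much as netlink invokes the dump callback again for each new skb. Like the kernel loop, the sketch assumes the table does not change between calls; entries added or removed in between can be skipped or repeated.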
2005-08-12 19:51:49 +04:00
static int inet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
2005-04-17 02:20:36 +04:00
{
	int i, num;
	int s_i, s_num;
2005-08-12 19:51:49 +04:00
	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);
2005-08-12 16:27:49 +04:00
	const struct inet_diag_handler *handler;
2005-08-10 12:54:28 +04:00
	struct inet_hashinfo *hashinfo;
2005-04-17 02:20:36 +04:00
2007-12-03 07:51:25 +03:00
	handler = inet_diag_lock_handler(cb->nlh->nlmsg_type);
2008-01-29 07:52:12 +03:00
	if (IS_ERR(handler))
		goto unlock;
2007-12-03 07:51:25 +03:00
2005-08-12 16:27:49 +04:00
	hashinfo = handler->idiag_hashinfo;
2006-01-10 01:56:19 +03:00
2005-04-17 02:20:36 +04:00
	s_i = cb->args[1];
	s_num = num = cb->args[2];
2005-08-12 16:27:49 +04:00
2005-04-17 02:20:36 +04:00
	if (cb->args[0] == 0) {
2005-08-12 19:51:49 +04:00
		if (!(r->idiag_states & (TCPF_LISTEN | TCPF_SYN_RECV)))
2005-04-17 02:20:36 +04:00
			goto skip_listen_ht;
2005-08-10 12:54:28 +04:00
2005-08-10 06:59:44 +04:00
		for (i = s_i; i < INET_LHTABLE_SIZE; i++) {
2005-04-17 02:20:36 +04:00
			struct sock *sk;
2008-11-24 04:22:55 +03:00
			struct hlist_nulls_node *node;
2008-11-20 11:40:07 +03:00
			struct inet_listen_hashbucket *ilb;
2005-04-17 02:20:36 +04:00
			num = 0;
2008-11-20 11:40:07 +03:00
			ilb = &hashinfo->listening_hash[i];
			spin_lock_bh(&ilb->lock);
2008-11-24 04:22:55 +03:00
			sk_nulls_for_each(sk, node, &ilb->head) {
2005-04-17 02:20:36 +04:00
				struct inet_sock *inet = inet_sk(sk);
				if (num < s_num) {
					num++;
					continue;
				}
2009-10-15 10:30:45 +04:00
				if (r->id.idiag_sport != inet->inet_sport &&
2005-08-12 19:51:49 +04:00
				    r->id.idiag_sport)
2005-04-17 02:20:36 +04:00
					goto next_listen;
2005-08-12 19:51:49 +04:00
				if (!(r->idiag_states & TCPF_LISTEN) ||
				    r->id.idiag_dport ||
2005-04-17 02:20:36 +04:00
				    cb->args[3] > 0)
					goto syn_recv;
2006-01-10 01:56:56 +03:00
				if (inet_csk_diag_dump(sk, skb, cb) < 0) {
2008-11-20 11:40:07 +03:00
					spin_unlock_bh(&ilb->lock);
2005-04-17 02:20:36 +04:00
					goto done;
				}
syn_recv:
2005-08-12 19:51:49 +04:00
				if (!(r->idiag_states & TCPF_SYN_RECV))
2005-04-17 02:20:36 +04:00
					goto next_listen;
2005-08-12 19:51:49 +04:00
				if (inet_diag_dump_reqs(skb, sk, cb) < 0) {
2008-11-20 11:40:07 +03:00
					spin_unlock_bh(&ilb->lock);
2005-04-17 02:20:36 +04:00
					goto done;
				}
next_listen:
				cb->args[3] = 0;
				cb->args[4] = 0;
				++num;
			}
2008-11-20 11:40:07 +03:00
			spin_unlock_bh(&ilb->lock);
2005-04-17 02:20:36 +04:00
			s_num = 0;
			cb->args[3] = 0;
			cb->args[4] = 0;
		}
skip_listen_ht:
		cb->args[0] = 1;
		s_i = num = s_num = 0;
	}
2005-08-12 19:51:49 +04:00
	if (!(r->idiag_states & ~(TCPF_LISTEN | TCPF_SYN_RECV)))
2007-12-03 07:51:25 +03:00
		goto unlock;
2005-04-17 02:20:36 +04:00
2009-10-09 04:16:19 +04:00
	for (i = s_i; i <= hashinfo->ehash_mask; i++) {
2005-08-10 12:54:28 +04:00
		struct inet_ehash_bucket *head = &hashinfo->ehash[i];
2008-11-22 03:39:19 +03:00
		spinlock_t *lock = inet_ehash_lockp(hashinfo, i);
2005-04-17 02:20:36 +04:00
		struct sock *sk;
2008-11-17 06:40:17 +03:00
		struct hlist_nulls_node *node;
2005-04-17 02:20:36 +04:00
2008-08-28 12:09:54 +04:00
		num = 0;
2008-11-17 06:40:17 +03:00
		if (hlist_nulls_empty(&head->chain) &&
		    hlist_nulls_empty(&head->twchain))
2008-08-28 12:09:54 +04:00
			continue;
2005-04-17 02:20:36 +04:00
		if (i > s_i)
			s_num = 0;
2008-11-22 03:39:19 +03:00
		spin_lock_bh(lock);
2008-11-17 06:40:17 +03:00
		sk_nulls_for_each(sk, node, &head->chain) {
2005-04-17 02:20:36 +04:00
			struct inet_sock *inet = inet_sk(sk);
			if (num < s_num)
				goto next_normal;
2005-08-12 19:51:49 +04:00
			if (!(r->idiag_states & (1 << sk->sk_state)))
2005-04-17 02:20:36 +04:00
				goto next_normal;
2009-10-15 10:30:45 +04:00
			if (r->id.idiag_sport != inet->inet_sport &&
2005-08-12 19:51:49 +04:00
			    r->id.idiag_sport)
2005-04-17 02:20:36 +04:00
				goto next_normal;
2009-10-15 10:30:45 +04:00
			if (r->id.idiag_dport != inet->inet_dport &&
2006-01-10 01:56:19 +03:00
			    r->id.idiag_dport)
2005-04-17 02:20:36 +04:00
				goto next_normal;
2006-01-10 01:56:56 +03:00
			if (inet_csk_diag_dump(sk, skb, cb) < 0) {
2008-11-22 03:39:19 +03:00
				spin_unlock_bh(lock);
2005-04-17 02:20:36 +04:00
				goto done;
			}
next_normal:
			++num;
		}
2005-08-12 19:51:49 +04:00
		if (r->idiag_states & TCPF_TIME_WAIT) {
2006-01-10 01:56:38 +03:00
			struct inet_timewait_sock *tw;
			inet_twsk_for_each(tw, node,
[NET]: change layout of ehash table
ehash table layout is currently this one :
First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.
This is non optimal because, for a given hash or socket, the two chain heads
are located in separate cache lines.
Moreover the locks of the second half are never used.
If instead of this halving, we use two list heads in inet_ehash_bucket instead
of only one, we probably can avoid one cache miss, and reduce ram usage,
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK,
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halve the table but we keep
together related chains to speedup lookups and socket state change.
In this patch I did not try to align struct inet_ehash_bucket, but a future
patch could try to make this structure have a convenient size (a power of two
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so
maybe we don't need to scratch our heads to align the bucket...
Note : In case struct inet_ehash_bucket is not a power of two, we could
probably change alloc_large_system_hash() (in case it uses __get_free_pages())
to free the unused space. It currently allocates a big zone, but the last
quarter of it could be freed. Again, this should be a temporary 'problem'.
Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-09 01:16:46 +03:00
					   &head->twchain) {
2005-04-17 02:20:36 +04:00
				if (num < s_num)
					goto next_dying;
2006-01-10 01:56:03 +03:00
				if (r->id.idiag_sport != tw->tw_sport &&
2005-08-12 19:51:49 +04:00
				    r->id.idiag_sport)
2005-04-17 02:20:36 +04:00
					goto next_dying;
2006-01-10 01:56:03 +03:00
				if (r->id.idiag_dport != tw->tw_dport &&
2005-08-12 19:51:49 +04:00
				    r->id.idiag_dport)
2005-04-17 02:20:36 +04:00
					goto next_dying;
2006-01-10 01:56:38 +03:00
				if (inet_twsk_diag_dump(tw, skb, cb) < 0) {
2008-11-22 03:39:19 +03:00
					spin_unlock_bh(lock);
2005-04-17 02:20:36 +04:00
					goto done;
				}
next_dying:
				++num;
			}
		}
2008-11-22 03:39:19 +03:00
		spin_unlock_bh(lock);
2005-04-17 02:20:36 +04:00
	}
done:
	cb->args[1] = i;
	cb->args[2] = num;
2007-12-03 07:51:25 +03:00
unlock:
	inet_diag_unlock_handler(handler);
2005-04-17 02:20:36 +04:00
	return skb->len;
}
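inet_diag_dump() applies the same two tests to every established socket: the socket state must be set in the request's idiag_states bitmask, and each port in the request must either be zero (wildcard) or equal the socket's port. The following is a minimal userspace sketch of that filter, assuming nothing beyond the checks visible above. The types struct sock_view and struct dump_filter and the ST_* constants are hypothetical stand-ins; the ST_* values are chosen to mirror the kernel's TCP state numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state numbers, mirroring the kernel's TCP_* values. */
enum {
	ST_ESTABLISHED = 1,
	ST_SYN_RECV    = 3,
	ST_TIME_WAIT   = 6,
	ST_LISTEN      = 10
};

/* Hypothetical view of the socket fields the filter looks at. */
struct sock_view {
	int state;
	uint16_t sport;
	uint16_t dport;
};

/* Hypothetical view of the request fields used for filtering. */
struct dump_filter {
	uint32_t states;	/* bit (1 << state) set for each wanted state */
	uint16_t sport;		/* 0 means "any source port" */
	uint16_t dport;		/* 0 means "any destination port" */
};

/* A zero port in the request is a wildcard, as in the checks above. */
static bool port_matches(uint16_t want, uint16_t have)
{
	return want == 0 || want == have;
}

static bool filter_matches(const struct dump_filter *f,
			   const struct sock_view *s)
{
	/* Guard the shift: only states 0..31 fit in the mask. */
	if (s->state < 0 || s->state > 31)
		return false;
	if (!(f->states & (UINT32_C(1) << s->state)))
		return false;
	return port_matches(f->sport, s->sport) &&
	       port_matches(f->dport, s->dport);
}
```

The kernel code compares ports in network byte order on both sides, so no conversion is needed there; the sketch simply treats ports as opaque 16-bit values.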
2007-03-23 09:30:35 +03:00
static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
2005-04-17 02:20:36 +04:00
{
2007-03-23 09:30:35 +03:00
	int hdrlen = sizeof(struct inet_diag_req);
2005-04-17 02:20:36 +04:00
2007-03-23 09:30:35 +03:00
	if (nlh->nlmsg_type >= INET_DIAG_GETSOCK_MAX ||
	    nlmsg_len(nlh) < hdrlen)
		return -EINVAL;
2005-04-17 02:20:36 +04:00
2011-01-18 23:40:38 +03:00
	if (nlh->nlmsg_flags & NLM_F_DUMP) {
2007-03-23 09:30:35 +03:00
		if (nlmsg_attrlen(nlh, hdrlen)) {
			struct nlattr *attr;
2005-04-17 02:20:36 +04:00
2007-03-23 09:30:35 +03:00
			attr = nlmsg_find_attr(nlh, hdrlen,
					       INET_DIAG_REQ_BYTECODE);
			if (attr == NULL ||
			    nla_len(attr) < sizeof(struct inet_diag_bc_op) ||
			    inet_diag_bc_audit(nla_data(attr), nla_len(attr)))
				return -EINVAL;
		}
2005-04-17 02:20:36 +04:00
2007-03-23 09:30:55 +03:00
		return netlink_dump_start(idiagnl, skb, nlh,
2011-06-10 05:27:09 +04:00
					  inet_diag_dump, NULL, 0);
2005-04-17 02:20:36 +04:00
	}
2007-03-23 09:30:35 +03:00
	return inet_diag_get_exact(skb, nlh);
2005-04-17 02:20:36 +04:00
}
2007-09-11 13:33:28 +04:00
static DEFINE_MUTEX(inet_diag_mutex);
2007-10-11 08:15:29 +04:00
static void inet_diag_rcv(struct sk_buff *skb)
2005-04-17 02:20:36 +04:00
{
2007-10-11 08:15:29 +04:00
	mutex_lock(&inet_diag_mutex);
	netlink_rcv_skb(skb, &inet_diag_rcv_msg);
	mutex_unlock(&inet_diag_mutex);
2005-04-17 02:20:36 +04:00
}
2005-08-12 16:27:49 +04:00
int inet_diag_register(const struct inet_diag_handler *h)
{
	const __u16 type = h->idiag_type;
	int err = -EINVAL;
	if (type >= INET_DIAG_GETSOCK_MAX)
		goto out;
2007-12-03 07:51:25 +03:00
	mutex_lock(&inet_diag_table_mutex);
2005-08-12 16:27:49 +04:00
	err = -EEXIST;
	if (inet_diag_table[type] == NULL) {
		inet_diag_table[type] = h;
		err = 0;
	}
2007-12-03 07:51:25 +03:00
	mutex_unlock(&inet_diag_table_mutex);
2005-08-12 16:27:49 +04:00
out:
	return err;
}
EXPORT_SYMBOL_GPL(inet_diag_register);
void inet_diag_unregister(const struct inet_diag_handler *h)
{
	const __u16 type = h->idiag_type;
	if (type >= INET_DIAG_GETSOCK_MAX)
		return;
2007-12-03 07:51:25 +03:00
	mutex_lock(&inet_diag_table_mutex);
2005-08-12 16:27:49 +04:00
	inet_diag_table[type] = NULL;
2007-12-03 07:51:25 +03:00
	mutex_unlock(&inet_diag_table_mutex);
2005-08-12 16:27:49 +04:00
}
EXPORT_SYMBOL_GPL(inet_diag_unregister);
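inet_diag_register() and inet_diag_unregister() treat the handler table as one slot per message type: an out-of-range type is rejected with -EINVAL, an occupied slot with -EEXIST, and unregistering simply clears the slot. The following is a minimal single-threaded userspace sketch of that slot-table pattern. The names demo_handler, demo_table, demo_register, demo_unregister and DEMO_TYPE_MAX are hypothetical, and the sketch deliberately omits the mutex that the kernel code takes around every table access.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_TYPE_MAX 4	/* hypothetical number of slots */

/* Hypothetical handler descriptor; only the type is used as the key. */
struct demo_handler {
	unsigned short type;
	const char *name;
};

/* One slot per type; NULL means "no handler registered". */
static const struct demo_handler *demo_table[DEMO_TYPE_MAX];

/* Returns 0 on success, -EINVAL for a bad type, -EEXIST if taken. */
static int demo_register(const struct demo_handler *h)
{
	if (h->type >= DEMO_TYPE_MAX)
		return -EINVAL;
	if (demo_table[h->type] != NULL)
		return -EEXIST;
	demo_table[h->type] = h;
	return 0;
}

/* Clears the slot for h->type, as the code above does, without
 * checking that h is the handler currently stored there. */
static void demo_unregister(const struct demo_handler *h)
{
	if (h->type >= DEMO_TYPE_MAX)
		return;
	demo_table[h->type] = NULL;
}
```

In a multithreaded setting every read and write of demo_table would need to happen under a lock, which is the role inet_diag_table_mutex plays in the functions above.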
2005-08-12 19:51:49 +04:00
static int __init inet_diag_init(void)
2005-04-17 02:20:36 +04:00
{
2005-08-12 16:27:49 +04:00
	const int inet_diag_table_size = (INET_DIAG_GETSOCK_MAX *
					  sizeof(struct inet_diag_handler *));
	int err = -ENOMEM;
2006-07-22 01:51:30 +04:00
	inet_diag_table = kzalloc(inet_diag_table_size, GFP_KERNEL);
2005-08-12 16:27:49 +04:00
	if (!inet_diag_table)
		goto out;
2007-09-12 15:05:38 +04:00
	idiagnl = netlink_kernel_create(&init_net, NETLINK_INET_DIAG, 0,
2007-12-03 07:51:25 +03:00
					inet_diag_rcv, NULL, THIS_MODULE);
2005-08-12 19:51:49 +04:00
	if (idiagnl == NULL)
2005-08-12 16:27:49 +04:00
		goto out_free_table;
[INET_DIAG]: Move the tcp_diag interface to the proper place
With this the previous setup is back, i.e. tcp_diag can be built as a module,
as dccp_diag and both share the infrastructure available in inet_diag.
If one selects CONFIG_INET_DIAG as module CONFIG_INET_TCP_DIAG will also be
built as a module, as will CONFIG_INET_DCCP_DIAG, if CONFIG_IP_DCCP was
selected static or as a module, if CONFIG_INET_DIAG is y, being statically
linked CONFIG_INET_TCP_DIAG will follow suit and CONFIG_INET_DCCP_DIAG will be
built in the same manner as CONFIG_IP_DCCP.
Now to aim at UDP, converting it to use inet_hashinfo, so that we can use
iproute2 for UDP sockets as well.
Ah, just to show an example of this new infrastructure working for DCCP :-)
[root@qemu ~]# ./ss -dane
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 0 *:5001 *:* ino:942 sk:cfd503a0
ESTAB 0 0 127.0.0.1:5001 127.0.0.1:32770 ino:943 sk:cfd50a60
ESTAB 0 0 127.0.0.1:32770 127.0.0.1:5001 ino:947 sk:cfd50700
TIME-WAIT 0 0 127.0.0.1:32769 127.0.0.1:5001 timer:(timewait,3.430ms,0) ino:0 sk:cf209620
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-12 19:59:17 +04:00
	err = 0;
2005-08-12 16:27:49 +04:00
out:
	return err;
out_free_table:
	kfree(inet_diag_table);
	goto out;
2005-04-17 02:20:36 +04:00
}
2005-08-12 19:51:49 +04:00
static void __exit inet_diag_exit(void)
2005-04-17 02:20:36 +04:00
{
2008-01-29 01:41:19 +03:00
	netlink_kernel_release(idiagnl);
2005-08-12 16:27:49 +04:00
	kfree(inet_diag_table);
2005-04-17 02:20:36 +04:00
}
2005-08-12 19:51:49 +04:00
module_init(inet_diag_init);
module_exit(inet_diag_exit);
2005-04-17 02:20:36 +04:00
MODULE_LICENSE("GPL");
2007-10-22 03:44:04 +04:00
MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_INET_DIAG);