2005-04-16 15:20:36 -07:00
/*
* PF_INET6 socket protocol family
2007-02-09 23:24:49 +09:00
* Linux INET6 implementation
2005-04-16 15:20:36 -07:00
*
* Authors :
2007-02-09 23:24:49 +09:00
* Pedro Roque <roque@di.fc.ul.pt>
2005-04-16 15:20:36 -07:00
*
* Adapted from linux/net/ipv4/af_inet.c
*
2014-08-24 21:53:10 +01:00
* Fixes :
2005-04-16 15:20:36 -07:00
* piggy, Karl Knutson : Socket protocol table
2014-08-24 21:53:10 +01:00
* Hideaki YOSHIFUJI : sin6_scope_id support
* Arnaldo Melo : check proc_net_create return, cleanups
2005-04-16 15:20:36 -07:00
*
* This program is free software; you can redistribute it and/or
2014-08-24 21:53:10 +01:00
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
2005-04-16 15:20:36 -07:00
*/
2012-05-15 14:11:53 +00:00
# define pr_fmt(fmt) "IPv6: " fmt
2005-04-16 15:20:36 -07:00
# include <linux/module.h>
2006-01-11 12:17:47 -08:00
# include <linux/capability.h>
2005-04-16 15:20:36 -07:00
# include <linux/errno.h>
# include <linux/types.h>
# include <linux/socket.h>
# include <linux/in.h>
# include <linux/kernel.h>
# include <linux/timer.h>
# include <linux/string.h>
# include <linux/sockios.h>
# include <linux/net.h>
# include <linux/fcntl.h>
# include <linux/mm.h>
# include <linux/interrupt.h>
# include <linux/proc_fs.h>
# include <linux/stat.h>
# include <linux/init.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers, which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
# include <linux/slab.h>
2005-04-16 15:20:36 -07:00
# include <linux/inet.h>
# include <linux/netdevice.h>
# include <linux/icmpv6.h>
2005-08-09 19:42:34 -07:00
# include <linux/netfilter_ipv6.h>
2005-04-16 15:20:36 -07:00
# include <net/ip.h>
# include <net/ipv6.h>
# include <net/udp.h>
2006-11-27 11:10:57 -08:00
# include <net/udplite.h>
2005-04-16 15:20:36 -07:00
# include <net/tcp.h>
2013-05-22 20:17:31 +00:00
# include <net/ping.h>
2005-04-16 15:20:36 -07:00
# include <net/protocol.h>
# include <net/inet_common.h>
2008-10-01 07:33:10 -07:00
# include <net/route.h>
2005-04-16 15:20:36 -07:00
# include <net/transp_v6.h>
# include <net/ip6_route.h>
# include <net/addrconf.h>
2013-08-31 13:44:36 +08:00
# include <net/ndisc.h>
2005-04-16 15:20:36 -07:00
# ifdef CONFIG_IPV6_TUNNEL
# include <net/ip6_tunnel.h>
# endif
# include <asm/uaccess.h>
2008-04-03 09:22:53 +09:00
# include <linux/mroute6.h>
2005-04-16 15:20:36 -07:00
MODULE_AUTHOR("Cast of dozens");
MODULE_DESCRIPTION("IPv6 protocol stack for Linux");
MODULE_LICENSE("GPL");
2007-11-23 21:28:44 +08:00
/* The inetsw6 table contains everything that inet6_create needs to
2005-04-16 15:20:36 -07:00
 * build a new socket.
 */
static struct list_head inetsw6[SOCK_MAX];
static DEFINE_SPINLOCK(inetsw6_lock);
2009-06-01 03:07:33 -07:00
struct ipv6_params ipv6_defaults = {
	.disable_ipv6 = 0,
	.autoconf = 1,
};
2012-05-05 10:13:53 +00:00
static int disable_ipv6_mod;
2009-06-01 03:07:33 -07:00
module_param_named(disable, disable_ipv6_mod, int, 0444);
MODULE_PARM_DESC(disable, "Disable IPv6 module such that it is non-functional");
module_param_named(disable_ipv6, ipv6_defaults.disable_ipv6, int, 0444);
MODULE_PARM_DESC(disable_ipv6, "Disable IPv6 on all interfaces");
module_param_named(autoconf, ipv6_defaults.autoconf, int, 0444);
MODULE_PARM_DESC(autoconf, "Enable IPv6 address autoconfiguration on all interfaces");
2009-03-04 03:18:11 -08:00
2005-04-16 15:20:36 -07:00
static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk)
{
	const int offset = sk->sk_prot->obj_size - sizeof(struct ipv6_pinfo);
	return (struct ipv6_pinfo *)(((u8 *)sk) + offset);
}
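The pointer arithmetic in inet6_sk_generic() relies on the IPv6 private area being the last member of each protocol's socket object, so that obj_size minus its size is its offset. A minimal userspace sketch of that idea follows; every demo_* name is hypothetical and the layouts are stand-ins, not the kernel's real structures. It also assumes no tail padding follows the last member.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ipv6_pinfo and struct sock. */
struct demo_pinfo {
	int hop_limit;
	int mcast_hops;
};

struct demo_sock {
	int family;
	size_t obj_size;	/* total size of the enclosing protocol object */
};

/* Hypothetical protocol socket: generic part first, IPv6 part last. */
struct demo_udp6_sock {
	struct demo_sock sk;		/* must be the first member */
	long proto_private[4];
	struct demo_pinfo inet6;	/* must be the last member */
};

/* Same arithmetic as inet6_sk_generic(): object size minus pinfo size. */
static struct demo_pinfo *demo_sk_generic(struct demo_sock *sk)
{
	const size_t offset = sk->obj_size - sizeof(struct demo_pinfo);

	return (struct demo_pinfo *)((uint8_t *)sk + offset);
}
```

Given an object whose obj_size was set to sizeof(struct demo_udp6_sock), demo_sk_generic(&obj.sk) should point at obj.inet6 without the caller knowing the concrete protocol type.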
2009-11-05 22:18:14 -08:00
static int inet6_create(struct net *net, struct socket *sock, int protocol,
			int kern)
2005-04-16 15:20:36 -07:00
{
	struct inet_sock *inet;
	struct ipv6_pinfo *np;
	struct sock *sk;
	struct inet_protosw *answer;
	struct proto *answer_prot;
	unsigned char answer_flags;
2005-12-02 20:56:57 -08:00
	int try_loading_module = 0;
	int err;
2005-04-16 15:20:36 -07:00
	/* Look for the requested type/protocol pair. */
2005-12-02 20:56:57 -08:00
lookup_protocol:
	err = -ESOCKTNOSUPPORT;
2005-04-16 15:20:36 -07:00
	rcu_read_lock();
2008-07-25 01:45:34 -07:00
	list_for_each_entry_rcu(answer, &inetsw6[sock->type], list) {
2005-04-16 15:20:36 -07:00
2008-07-25 01:45:34 -07:00
		err = 0;
2005-04-16 15:20:36 -07:00
		/* Check the non-wild match. */
		if (protocol == answer->protocol) {
			if (protocol != IPPROTO_IP)
				break;
		} else {
			/* Check for the two wild cases. */
			if (IPPROTO_IP == protocol) {
				protocol = answer->protocol;
				break;
			}
			if (IPPROTO_IP == answer->protocol)
				break;
		}
2005-12-02 20:56:57 -08:00
		err = -EPROTONOSUPPORT;
2005-04-16 15:20:36 -07:00
	}
2008-07-25 01:45:34 -07:00
	if (err) {
2005-12-02 20:56:57 -08:00
		if (try_loading_module < 2) {
			rcu_read_unlock();
			/*
			 * Be more specific, e.g. net-pf-10-proto-132-type-1
			 * (net-pf-PF_INET6-proto-IPPROTO_SCTP-type-SOCK_STREAM)
			 */
			if (++try_loading_module == 1)
				request_module("net-pf-%d-proto-%d-type-%d",
						PF_INET6, protocol, sock->type);
			/*
			 * Fall back to generic, e.g. net-pf-10-proto-132
			 * (net-pf-PF_INET6-proto-IPPROTO_SCTP)
			 */
			else
				request_module("net-pf-%d-proto-%d",
						PF_INET6, protocol);
			goto lookup_protocol;
		} else
			goto out_rcu_unlock;
	}
	err = -EPERM;
net: Allow userns root to control ipv6
Allow an unprivileged user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.
Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicitly moved from the initial network namespace.
In general policy and network stack state changes are allowed while
resource control is left unchanged.
Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.
Allow creation of ipv6 raw sockets.
Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.
Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.
Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.
Allow setting the multicast routing socket options on non multicast
routing sockets.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.
Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-16 03:03:06 +00:00
	if (sock->type == SOCK_RAW && !kern &&
	    !ns_capable(net->user_ns, CAP_NET_RAW))
2005-04-16 15:20:36 -07:00
		goto out_rcu_unlock;
	sock->ops = answer->ops;
	answer_prot = answer->prot;
	answer_flags = answer->flags;
	rcu_read_unlock();
2015-03-29 14:00:04 +01:00
	WARN_ON(!answer_prot->slab);
2005-04-16 15:20:36 -07:00
2005-12-02 20:56:57 -08:00
	err = -ENOBUFS;
2007-11-01 00:39:31 -07:00
	sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot);
2015-03-29 14:00:04 +01:00
	if (!sk)
2005-04-16 15:20:36 -07:00
		goto out;
	sock_init_data(sock, sk);
2005-12-02 20:56:57 -08:00
	err = 0;
2005-04-16 15:20:36 -07:00
	if (INET_PROTOSW_REUSE & answer_flags)
2012-04-19 03:39:36 +00:00
		sk->sk_reuse = SK_CAN_REUSE;
2005-04-16 15:20:36 -07:00
	inet = inet_sk(sk);
2007-01-09 14:37:06 -08:00
	inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
2005-04-16 15:20:36 -07:00
	if (SOCK_RAW == sock->type) {
2009-10-15 06:30:45 +00:00
		inet->inet_num = protocol;
2005-04-16 15:20:36 -07:00
		if (IPPROTO_RAW == protocol)
			inet->hdrincl = 1;
	}
2005-08-09 19:45:38 -07:00
	sk->sk_destruct = inet_sock_destruct;
2005-04-16 15:20:36 -07:00
	sk->sk_family = PF_INET6;
	sk->sk_protocol = protocol;
	sk->sk_backlog_rcv = answer->prot->backlog_rcv;
	inet_sk(sk)->pinet6 = np = inet6_sk_generic(sk);
	np->hop_limit = -1;
2010-05-03 23:42:27 -07:00
	np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
2005-04-16 15:20:36 -07:00
	np->mc_loop = 1;
	np->pmtudisc = IPV6_PMTUDISC_WANT;
2014-06-27 08:36:16 -07:00
	sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
	/* Init the ipv4 part of the socket since we can have sockets
	 * using v6 API for ipv4.
	 */
	inet->uc_ttl = -1;
	inet->mc_loop = 1;
	inet->mc_ttl = 1;
	inet->mc_index = 0;
	inet->mc_list = NULL;
2012-02-09 09:35:49 +00:00
	inet->rcv_tos = 0;
2005-04-16 15:20:36 -07:00
2013-12-14 05:13:38 +01:00
	if (net->ipv4.sysctl_ip_no_pmtu_disc)
2005-04-16 15:20:36 -07:00
		inet->pmtudisc = IP_PMTUDISC_DONT;
	else
		inet->pmtudisc = IP_PMTUDISC_WANT;
2007-02-09 23:24:49 +09:00
	/*
2005-08-09 19:45:38 -07:00
	 * Increment only the relevant sk_prot->socks debug field, this changes
	 * the previous behaviour of incrementing both the equivalent to
	 * answer->prot->socks (inet6_sock_nr) and inet_sock_nr.
	 *
	 * This allows better debug granularity as we'll know exactly how many
	 * UDPv6, TCPv6, etc socks were allocated, not the sum of all IPv6
	 * transport protocol socks. -acme
	 */
	sk_refcnt_debug_inc(sk);
2005-04-16 15:20:36 -07:00
2009-10-15 06:30:45 +00:00
	if (inet->inet_num) {
2005-04-16 15:20:36 -07:00
		/* It assumes that any protocol which allows
		 * the user to assign a number at socket
		 * creation time automatically shares.
		 */
2009-10-15 06:30:45 +00:00
		inet->inet_sport = htons(inet->inet_num);
2005-04-16 15:20:36 -07:00
		sk->sk_prot->hash(sk);
	}
	if (sk->sk_prot->init) {
2005-12-02 20:56:57 -08:00
		err = sk->sk_prot->init(sk);
		if (err) {
2005-04-16 15:20:36 -07:00
			sk_common_release(sk);
			goto out;
		}
	}
out:
2005-12-02 20:56:57 -08:00
	return err;
2005-04-16 15:20:36 -07:00
out_rcu_unlock:
	rcu_read_unlock();
	goto out;
}
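The type/protocol lookup at the top of inet6_create() can be hard to follow because err doubles as the loop's result. The following userspace sketch models the same matching rules over a plain array instead of an RCU list. The demo_* names are hypothetical, and it assumes a Linux errno.h that defines ESOCKTNOSUPPORT.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_IPPROTO_IP 0	/* wildcard, same value as IPPROTO_IP */

/* Hypothetical, reduced stand-in for struct inet_protosw. */
struct demo_protosw {
	int protocol;
};

/*
 * Returns 0 and stores the matching index in *match, or a negative errno.
 * *protocol is rewritten when the caller passed the wildcard, as in the
 * kernel loop.
 */
static int demo_lookup(const struct demo_protosw *tbl, size_t n,
		       int *protocol, size_t *match)
{
	int err = -ESOCKTNOSUPPORT;	/* no entry at all for this type */
	size_t i;

	for (i = 0; i < n; i++) {
		err = 0;
		if (*protocol == tbl[i].protocol) {
			/* Exact match, unless both sides are the wildcard. */
			if (*protocol != DEMO_IPPROTO_IP)
				break;
		} else {
			/* Caller asked for "any": take this entry's protocol. */
			if (*protocol == DEMO_IPPROTO_IP) {
				*protocol = tbl[i].protocol;
				break;
			}
			/* Entry accepts any protocol (e.g. a raw socket entry). */
			if (tbl[i].protocol == DEMO_IPPROTO_IP)
				break;
		}
		err = -EPROTONOSUPPORT;	/* type exists, protocol not (yet) found */
	}
	if (!err)
		*match = i;
	return err;
}
```

In this model an empty table yields -ESOCKTNOSUPPORT, while a non-empty table without a usable entry yields -EPROTONOSUPPORT, which is the case where the kernel goes on to try request_module().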
/* bind for INET6 API */
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
2012-05-05 10:13:53 +00:00
	struct sockaddr_in6 *addr = (struct sockaddr_in6 *)uaddr;
2005-04-16 15:20:36 -07:00
	struct sock *sk = sock->sk;
	struct inet_sock *inet = inet_sk(sk);
	struct ipv6_pinfo *np = inet6_sk(sk);
2008-03-26 02:26:21 +09:00
	struct net *net = sock_net(sk);
2006-09-26 22:17:51 -07:00
	__be32 v4addr = 0;
2005-04-16 15:20:36 -07:00
	unsigned short snum;
	int addr_type = 0;
	int err = 0;
	/* If the socket has its own bind function then use it. */
	if (sk->sk_prot->bind)
		return sk->sk_prot->bind(sk, uaddr, addr_len);
	if (addr_len < SIN6_LEN_RFC2133)
		return -EINVAL;
2011-06-06 06:00:07 +00:00
	if (addr->sin6_family != AF_INET6)
net: bind() fix error return on wrong address family
Hi,
Reinhard Max also pointed out that the error should be EAFNOSUPPORT according
to POSIX.
The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use
EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN.
Other protocols error values in their af bind() methods in current mainline git as far
as a brief look shows:
EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc
EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25,
No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip
Ciao, Marcus
Signed-off-by: Marcus Meissner <meissner@suse.de>
Cc: Reinhard Max <max@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-04 01:30:29 +00:00
		return -EAFNOSUPPORT;
2011-06-06 06:00:07 +00:00
2005-04-16 15:20:36 -07:00
	addr_type = ipv6_addr_type(&addr->sin6_addr);
	if ((addr_type & IPV6_ADDR_MULTICAST) && sock->type == SOCK_STREAM)
		return -EINVAL;
	snum = ntohs(addr->sin6_port);
2012-11-16 03:03:12 +00:00
	if (snum && snum < PROT_SOCK && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
2005-04-16 15:20:36 -07:00
		return -EACCES;
	lock_sock(sk);
	/* Check these errors (active socket, double bind). */
2009-10-15 06:30:45 +00:00
	if (sk->sk_state != TCP_CLOSE || inet->inet_num) {
2005-04-16 15:20:36 -07:00
		err = -EINVAL;
		goto out;
	}
	/* Check if the address belongs to the host. */
	if (addr_type == IPV6_ADDR_MAPPED) {
2009-03-24 16:24:50 +00:00
		int chk_addr_ret;
2009-03-24 16:24:48 +00:00
		/* Binding to v4-mapped address on a v6-only socket
		 * makes no sense
		 */
2014-06-27 08:36:16 -07:00
		if (sk->sk_ipv6only) {
2009-03-24 16:24:48 +00:00
			err = -EINVAL;
			goto out;
		}
2009-03-24 16:24:50 +00:00
2010-12-10 14:55:42 +01:00
		/* Reproduce AF_INET checks to make the bindings consistent */
2005-04-16 15:20:36 -07:00
		v4addr = addr->sin6_addr.s6_addr32[3];
2009-03-24 16:24:50 +00:00
		chk_addr_ret = inet_addr_type(net, v4addr);
2014-09-05 15:09:03 +02:00
		if (!net->ipv4.sysctl_ip_nonlocal_bind &&
2009-03-24 16:24:50 +00:00
		    !(inet->freebind || inet->transparent) &&
		    v4addr != htonl(INADDR_ANY) &&
		    chk_addr_ret != RTN_LOCAL &&
		    chk_addr_ret != RTN_MULTICAST &&
2009-08-23 19:06:28 -07:00
		    chk_addr_ret != RTN_BROADCAST) {
			err = -EADDRNOTAVAIL;
2005-04-16 15:20:36 -07:00
			goto out;
2009-08-23 19:06:28 -07:00
		}
2005-04-16 15:20:36 -07:00
	} else {
		if (addr_type != IPV6_ADDR_ANY) {
			struct net_device *dev = NULL;
2009-11-02 12:10:39 +01:00
			rcu_read_lock();
2013-03-08 02:07:19 +00:00
			if (__ipv6_addr_needs_scope_id(addr_type)) {
2005-04-16 15:20:36 -07:00
				if (addr_len >= sizeof(struct sockaddr_in6) &&
				    addr->sin6_scope_id) {
					/* Override any existing binding, if another one
					 * is supplied by user.
					 */
					sk->sk_bound_dev_if = addr->sin6_scope_id;
				}
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
				/* Binding to link-local address requires an interface */
				if (!sk->sk_bound_dev_if) {
					err = -EINVAL;
2009-11-02 12:10:39 +01:00
					goto out_unlock;
2005-04-16 15:20:36 -07:00
				}
2009-11-02 12:10:39 +01:00
				dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
2005-04-16 15:20:36 -07:00
				if (!dev) {
					err = -ENODEV;
2009-11-02 12:10:39 +01:00
					goto out_unlock;
2005-04-16 15:20:36 -07:00
				}
			}
			/* ipv4 addr of the socket is invalid. Only the
			 * unspecified and mapped address have a v4 equivalent.
			 */
			v4addr = LOOPBACK4_IPV6;
			if (!(addr_type & IPV6_ADDR_MULTICAST)) {
2011-11-07 14:57:21 +00:00
				if (!(inet->freebind || inet->transparent) &&
2010-10-21 16:10:03 +02:00
				    !ipv6_chk_addr(net, &addr->sin6_addr,
2008-01-10 22:43:18 -08:00
						   dev, 0)) {
2005-04-16 15:20:36 -07:00
					err = -EADDRNOTAVAIL;
2009-11-02 12:10:39 +01:00
					goto out_unlock;
2005-04-16 15:20:36 -07:00
				}
			}
2009-11-02 12:10:39 +01:00
			rcu_read_unlock();
2005-04-16 15:20:36 -07:00
		}
	}
2009-10-15 06:30:45 +00:00
	inet->inet_rcv_saddr = v4addr;
	inet->inet_saddr = v4addr;
2005-04-16 15:20:36 -07:00
ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:42:29 -07:00
	sk->sk_v6_rcv_saddr = addr->sin6_addr;
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
	if (!(addr_type & IPV6_ADDR_MULTICAST))
2011-11-21 03:39:03 +00:00
		np->saddr = addr->sin6_addr;
2005-04-16 15:20:36 -07:00
	/* Make sure we are allowed to bind here. */
	if (sk->sk_prot->get_port(sk, snum)) {
		inet_reset_saddr(sk);
		err = -EADDRINUSE;
		goto out;
	}
2009-03-24 16:24:49 +00:00
	if (addr_type != IPV6_ADDR_ANY) {
2005-04-16 15:20:36 -07:00
		sk->sk_userlocks |= SOCK_BINDADDR_LOCK;
2009-03-24 16:24:49 +00:00
		if (addr_type != IPV6_ADDR_MAPPED)
2014-06-27 08:36:16 -07:00
			sk->sk_ipv6only = 1;
2009-03-24 16:24:49 +00:00
	}
2005-04-16 15:20:36 -07:00
	if (snum)
		sk->sk_userlocks |= SOCK_BINDPORT_LOCK;
2009-10-15 06:30:45 +00:00
	inet->inet_sport = htons(inet->inet_num);
	inet->inet_dport = 0;
	inet->inet_daddr = 0;
2005-04-16 15:20:36 -07:00
out:
	release_sock(sk);
	return err;
2009-11-02 12:10:39 +01:00
out_unlock:
	rcu_read_unlock();
	goto out;
2005-04-16 15:20:36 -07:00
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL(inet6_bind);
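In the v4-mapped branch, inet6_bind() takes the IPv4 address from the last 32 bits of the IPv6 address (s6_addr32[3]) and then repeats the AF_INET checks on it. A userspace sketch of just that extraction step is shown below. demo_mapped_v4() is a hypothetical helper, and it uses the POSIX IN6_IS_ADDR_V4MAPPED macro rather than the kernel's ipv6_addr_type().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * If a6 is a v4-mapped address (::ffff:a.b.c.d), copy the embedded IPv4
 * address (network byte order) into a4 and return 1; otherwise return 0.
 */
static int demo_mapped_v4(const struct in6_addr *a6, struct in_addr *a4)
{
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return 0;
	/* Bytes 12..15 correspond to s6_addr32[3] in the kernel code. */
	memcpy(&a4->s_addr, &a6->s6_addr[12], sizeof(a4->s_addr));
	return 1;
}
```

For "::ffff:192.0.2.1" this should yield 192.0.2.1, while a native address such as "2001:db8::1" is rejected, which corresponds to the else branch of the kernel function.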
2005-04-16 15:20:36 -07:00
int inet6_release(struct socket *sock)
{
	struct sock *sk = sock->sk;
2015-03-29 14:00:04 +01:00
	if (!sk)
2005-04-16 15:20:36 -07:00
		return -EINVAL;
	/* Free mc lists */
	ipv6_sock_mc_close(sk);
	/* Free ac lists */
	ipv6_sock_ac_close(sk);
	return inet_release(sock);
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL(inet6_release);
2008-06-14 17:04:49 -07:00
void inet6_destroy_sock(struct sock *sk)
2005-04-16 15:20:36 -07:00
{
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct sk_buff *skb;
	struct ipv6_txoptions *opt;
	/* Release rx options */
2012-05-05 10:13:53 +00:00
	skb = xchg(&np->pktoptions, NULL);
2015-03-29 14:00:05 +01:00
	if (skb)
2005-04-16 15:20:36 -07:00
		kfree_skb(skb);
2010-04-23 11:26:09 +00:00
2012-05-05 10:13:53 +00:00
	skb = xchg(&np->rxpmtu, NULL);
2015-03-29 14:00:05 +01:00
	if (skb)
2010-04-23 11:26:09 +00:00
		kfree_skb(skb);
2005-04-16 15:20:36 -07:00
	/* Free flowlabels */
	fl6_free_socklist(sk);
	/* Free tx options */
2012-05-05 10:13:53 +00:00
	opt = xchg(&np->opt, NULL);
2015-03-29 14:00:05 +01:00
	if (opt)
2005-04-16 15:20:36 -07:00
		sock_kfree_s(sk, opt, opt->tot_len);
}
2005-12-13 23:23:20 -08:00
EXPORT_SYMBOL_GPL(inet6_destroy_sock);
2005-04-16 15:20:36 -07:00
/*
 * This does both peername and sockname.
 */
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
		  int *uaddr_len, int peer)
{
2012-05-05 10:13:53 +00:00
	struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr;
2005-04-16 15:20:36 -07:00
	struct sock *sk = sock->sk;
	struct inet_sock *inet = inet_sk(sk);
	struct ipv6_pinfo *np = inet6_sk(sk);
2007-02-09 23:24:49 +09:00
2005-04-16 15:20:36 -07:00
	sin->sin6_family = AF_INET6;
	sin->sin6_flowinfo = 0;
	sin->sin6_scope_id = 0;
	if (peer) {
2009-10-15 06:30:45 +00:00
		if (!inet->inet_dport)
2005-04-16 15:20:36 -07:00
			return -ENOTCONN;
		if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)) &&
		    peer == 1)
			return -ENOTCONN;
2009-10-15 06:30:45 +00:00
		sin->sin6_port = inet->inet_dport;
2013-10-03 15:42:29 -07:00
		sin->sin6_addr = sk->sk_v6_daddr;
2005-04-16 15:20:36 -07:00
		if (np->sndflow)
			sin->sin6_flowinfo = np->flow_label;
	} else {
2013-10-03 15:42:29 -07:00
		if (ipv6_addr_any(&sk->sk_v6_rcv_saddr))
2011-11-21 03:39:03 +00:00
			sin->sin6_addr = np->saddr;
2005-04-16 15:20:36 -07:00
		else
2013-10-03 15:42:29 -07:00
			sin->sin6_addr = sk->sk_v6_rcv_saddr;
2005-04-16 15:20:36 -07:00
2009-10-15 06:30:45 +00:00
		sin->sin6_port = inet->inet_sport;
2005-04-16 15:20:36 -07:00
	}
2013-03-08 02:07:19 +00:00
	sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr,
						 sk->sk_bound_dev_if);
2005-04-16 15:20:36 -07:00
	*uaddr_len = sizeof(*sin);
2010-09-22 20:43:57 +00:00
	return 0;
2005-04-16 15:20:36 -07:00
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL(inet6_getname);
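The peer branch of inet6_getname() tests the socket state with a bitmask: each TCP state number is turned into a single bit, so several states can be checked with one AND. The sketch below reproduces that idiom in userspace. The DEMO_* values are assumed to mirror the numbering in the kernel's tcp_states.h, and demo_peer_unavailable() is a hypothetical name.

```c
/* State numbers assumed to match TCP_ESTABLISHED, TCP_SYN_SENT, TCP_CLOSE. */
enum {
	DEMO_TCP_ESTABLISHED = 1,
	DEMO_TCP_SYN_SENT = 2,
	DEMO_TCP_CLOSE = 7
};

/* One bit per state, analogous to the kernel's TCPF_* flags. */
#define DEMO_TCPF(state) (1 << (state))

/* Returns 1 when a getpeername()-style request should fail with -ENOTCONN. */
static int demo_peer_unavailable(int sk_state)
{
	const int mask = DEMO_TCPF(DEMO_TCP_CLOSE) |
			 DEMO_TCPF(DEMO_TCP_SYN_SENT);

	return ((1 << sk_state) & mask) != 0;
}
```

A closed or still-connecting socket is reported as unavailable, while an established one is not. Adding another state to the check only requires OR-ing one more bit into the mask.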
2005-04-16 15:20:36 -07:00
int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
{
	struct sock *sk = sock->sk;
2008-03-26 02:26:21 +09:00
	struct net *net = sock_net(sk);
2005-04-16 15:20:36 -07:00
2012-05-05 10:13:53 +00:00
	switch (cmd) {
2005-04-16 15:20:36 -07:00
	case SIOCGSTAMP:
		return sock_get_timestamp(sk, (struct timeval __user *)arg);
2007-03-18 17:33:16 -07:00
	case SIOCGSTAMPNS:
		return sock_get_timestampns(sk, (struct timespec __user *)arg);
2005-04-16 15:20:36 -07:00
	case SIOCADDRT:
	case SIOCDELRT:
2007-02-09 23:24:49 +09:00
2010-09-22 20:43:57 +00:00
		return ipv6_route_ioctl(net, cmd, (void __user *)arg);
2005-04-16 15:20:36 -07:00
	case SIOCSIFADDR:
2008-03-05 10:46:57 -08:00
		return addrconf_add_ifaddr(net, (void __user *)arg);
2005-04-16 15:20:36 -07:00
	case SIOCDIFADDR:
2008-03-05 10:46:57 -08:00
		return addrconf_del_ifaddr(net, (void __user *)arg);
2005-04-16 15:20:36 -07:00
	case SIOCSIFDSTADDR:
2008-03-05 10:46:57 -08:00
		return addrconf_set_dstaddr(net, (void __user *)arg);
2005-04-16 15:20:36 -07:00
	default:
2006-01-03 14:18:33 -08:00
		if (!sk->sk_prot->ioctl)
			return -ENOIOCTLCMD;
		return sk->sk_prot->ioctl(sk, cmd, arg);
2005-04-16 15:20:36 -07:00
	}
	/*NOTREACHED*/
2010-09-22 20:43:57 +00:00
	return 0;
2005-04-16 15:20:36 -07:00
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL(inet6_ioctl);
2005-12-22 12:49:22 -08:00
const struct proto_ops inet6_stream_ops = {
2006-03-20 22:48:35 -08:00
	.family = PF_INET6,
	.owner = THIS_MODULE,
	.release = inet6_release,
	.bind = inet6_bind,
	.connect = inet_stream_connect,		/* ok */
	.socketpair = sock_no_socketpair,	/* a do nothing */
	.accept = inet_accept,			/* ok */
	.getname = inet6_getname,
	.poll = tcp_poll,			/* ok */
	.ioctl = inet6_ioctl,			/* must change */
	.listen = inet_listen,			/* ok */
	.shutdown = inet_shutdown,		/* ok */
	.setsockopt = sock_common_setsockopt,	/* ok */
	.getsockopt = sock_common_getsockopt,	/* ok */
2010-07-10 20:41:55 +00:00
	.sendmsg = inet_sendmsg,		/* ok */
	.recvmsg = inet_recvmsg,		/* ok */
2006-03-20 22:48:35 -08:00
	.mmap = sock_no_mmap,
2010-07-10 20:41:55 +00:00
	.sendpage = inet_sendpage,
2007-11-06 23:31:58 -08:00
	.splice_read = tcp_splice_read,
2006-03-20 22:45:21 -08:00
#ifdef CONFIG_COMPAT
2006-03-20 22:48:35 -08:00
	.compat_setsockopt = compat_sock_common_setsockopt,
	.compat_getsockopt = compat_sock_common_getsockopt,
2006-03-20 22:45:21 -08:00
#endif
2005-04-16 15:20:36 -07:00
};
2005-12-22 12:49:22 -08:00
const struct proto_ops inet6_dgram_ops = {
2006-03-20 22:48:35 -08:00
	.family = PF_INET6,
	.owner = THIS_MODULE,
	.release = inet6_release,
	.bind = inet6_bind,
	.connect = inet_dgram_connect,		/* ok */
	.socketpair = sock_no_socketpair,	/* a do nothing */
	.accept = sock_no_accept,		/* a do nothing */
	.getname = inet6_getname,
	.poll = udp_poll,			/* ok */
	.ioctl = inet6_ioctl,			/* must change */
	.listen = sock_no_listen,		/* ok */
	.shutdown = inet_shutdown,		/* ok */
	.setsockopt = sock_common_setsockopt,	/* ok */
	.getsockopt = sock_common_getsockopt,	/* ok */
	.sendmsg = inet_sendmsg,		/* ok */
2010-07-10 20:41:55 +00:00
	.recvmsg = inet_recvmsg,		/* ok */
2006-03-20 22:48:35 -08:00
	.mmap = sock_no_mmap,
	.sendpage = sock_no_sendpage,
2006-03-20 22:45:21 -08:00
#ifdef CONFIG_COMPAT
2006-03-20 22:48:35 -08:00
	.compat_setsockopt = compat_sock_common_setsockopt,
	.compat_getsockopt = compat_sock_common_getsockopt,
2006-03-20 22:45:21 -08:00
#endif
2005-04-16 15:20:36 -07:00
};
2009-10-05 05:58:39 +00:00
static const struct net_proto_family inet6_family_ops = {
2005-04-16 15:20:36 -07:00
	.family = PF_INET6,
	.create = inet6_create,
	.owner = THIS_MODULE,
};
2007-12-11 02:25:01 -08:00
int inet6_register_protosw ( struct inet_protosw * p )
2005-04-16 15:20:36 -07:00
{
struct list_head * lh ;
struct inet_protosw * answer ;
struct list_head * last_perm ;
2007-12-11 02:25:01 -08:00
int protocol = p - > protocol ;
int ret ;
2005-04-16 15:20:36 -07:00
spin_lock_bh ( & inetsw6_lock ) ;
2007-12-11 02:25:01 -08:00
ret = - EINVAL ;
2005-04-16 15:20:36 -07:00
if ( p - > type > = SOCK_MAX )
goto out_illegal ;
/* If we are trying to override a permanent protocol, bail. */
answer = NULL ;
2007-12-11 02:25:01 -08:00
ret = - EPERM ;
2005-04-16 15:20:36 -07:00
last_perm = & inetsw6 [ p - > type ] ;
list_for_each ( lh , & inetsw6 [ p - > type ] ) {
answer = list_entry ( lh , struct inet_protosw , list ) ;
/* Check only the non-wild match. */
if ( INET_PROTOSW_PERMANENT & answer - > flags ) {
if ( protocol = = answer - > protocol )
break ;
last_perm = lh ;
}
answer = NULL ;
}
if ( answer )
goto out_permanent ;
/* Add the new entry after the last permanent entry if any, so that
* the new entry does not override a permanent entry when matched with
* a wild - card protocol . But it is allowed to override any existing
2007-02-09 23:24:49 +09:00
* non - permanent entry . This means that when we remove this entry , the
2005-04-16 15:20:36 -07:00
* system automatically returns to the old behavior .
*/
list_add_rcu ( & p - > list , last_perm ) ;
2007-12-11 02:25:01 -08:00
ret = 0 ;
2005-04-16 15:20:36 -07:00
out :
spin_unlock_bh ( & inetsw6_lock ) ;
2007-12-11 02:25:01 -08:00
return ret ;
2005-04-16 15:20:36 -07:00
out_permanent :
2012-05-15 14:11:53 +00:00
pr_err ( " Attempt to override permanent protocol %d \n " , protocol ) ;
2005-04-16 15:20:36 -07:00
goto out ;
out_illegal :
2012-05-15 14:11:53 +00:00
pr_err ( " Ignoring attempt to register invalid socket type %d \n " ,
2005-04-16 15:20:36 -07:00
p - > type ) ;
goto out ;
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL ( inet6_register_protosw ) ;
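The placement rule in inet6_register_protosw() (splice the new entry in after the last permanent entry, and refuse if a permanent entry already owns the protocol number) can be sketched in userspace roughly as follows. This is a minimal illustration only: `struct proto_entry`, `PROTO_PERMANENT` and `register_entry()` are hypothetical stand-ins, a plain singly linked list replaces the kernel's RCU list, and no locking is shown.

```c
#include <stddef.h>

#define PROTO_PERMANENT 0x1 /* hypothetical stand-in for INET_PROTOSW_PERMANENT */

struct proto_entry {
	int protocol;
	unsigned int flags;
	struct proto_entry *next;
};

/*
 * Insert p after the last permanent entry of the list headed by *head
 * (or at the front if there is none). Returns 0 on success, -1 if a
 * permanent entry already claims the same protocol number.
 */
int register_entry(struct proto_entry **head, struct proto_entry *p)
{
	struct proto_entry **link = head; /* slot where p will be spliced in */
	struct proto_entry *e;

	for (e = *head; e; e = e->next) {
		if (!(e->flags & PROTO_PERMANENT))
			continue; /* non-permanent entries may be overridden */
		if (e->protocol == p->protocol)
			return -1; /* never override a permanent protocol */
		link = &e->next; /* remember the last permanent entry seen */
	}

	p->next = *link;
	*link = p;
	return 0;
}
```

Because the new entry lands in front of every non-permanent entry but behind every permanent one, removing it later restores the previous lookup order, which is the behavior the comment in the function describes.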
2005-04-16 15:20:36 -07:00
void
inet6_unregister_protosw ( struct inet_protosw * p )
{
if ( INET_PROTOSW_PERMANENT & p - > flags ) {
2012-05-15 14:11:53 +00:00
pr_err ( " Attempt to unregister permanent protocol %d \n " ,
2005-04-16 15:20:36 -07:00
p - > protocol ) ;
} else {
spin_lock_bh ( & inetsw6_lock ) ;
list_del_rcu ( & p - > list ) ;
spin_unlock_bh ( & inetsw6_lock ) ;
synchronize_net ( ) ;
}
}
2007-02-22 22:05:40 +09:00
EXPORT_SYMBOL ( inet6_unregister_protosw ) ;
2005-12-13 23:22:54 -08:00
int inet6_sk_rebuild_header ( struct sock * sk )
{
struct ipv6_pinfo * np = inet6_sk ( sk ) ;
2011-03-01 13:19:07 -08:00
struct dst_entry * dst ;
2005-12-13 23:22:54 -08:00
dst = __sk_dst_check ( sk , np - > dst_cookie ) ;
2015-03-29 14:00:04 +01:00
if ( ! dst ) {
2005-12-13 23:22:54 -08:00
struct inet_sock * inet = inet_sk ( sk ) ;
2010-06-01 21:35:01 +00:00
struct in6_addr * final_p , final ;
2011-03-12 16:22:43 -05:00
struct flowi6 fl6 ;
memset ( & fl6 , 0 , sizeof ( fl6 ) ) ;
fl6 . flowi6_proto = sk - > sk_protocol ;
ipv6: make lookups simpler and faster
TCP listener refactoring, part 4:
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common.
Now it is time to do the same for IPv6, because it permits us to have fast
lookups for all kinds of sockets, including the upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6.
This patch is way bigger than its IPv4 counterpart, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not easily doable.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:42:29 -07:00
fl6 . daddr = sk - > sk_v6_daddr ;
2011-11-21 03:39:03 +00:00
fl6 . saddr = np - > saddr ;
2011-03-12 16:22:43 -05:00
fl6 . flowlabel = np - > flow_label ;
fl6 . flowi6_oif = sk - > sk_bound_dev_if ;
fl6 . flowi6_mark = sk - > sk_mark ;
2011-03-12 16:36:19 -05:00
fl6 . fl6_dport = inet - > inet_dport ;
fl6 . fl6_sport = inet - > inet_sport ;
2011-03-12 16:22:43 -05:00
security_sk_classify_flow ( sk , flowi6_to_flowi ( & fl6 ) ) ;
final_p = fl6_update_dst ( & fl6 , np - > opt , & final ) ;
2013-08-28 08:04:14 +02:00
dst = ip6_dst_lookup_flow ( sk , & fl6 , final_p ) ;
2011-03-01 13:19:07 -08:00
if ( IS_ERR ( dst ) ) {
2005-12-13 23:22:54 -08:00
sk - > sk_route_caps = 0 ;
2011-03-01 13:19:07 -08:00
sk - > sk_err_soft = - PTR_ERR ( dst ) ;
return PTR_ERR ( dst ) ;
2005-12-13 23:22:54 -08:00
}
2006-08-29 17:15:09 -07:00
__ip6_dst_store ( sk , dst , NULL , NULL ) ;
2005-12-13 23:22:54 -08:00
}
return 0 ;
}
EXPORT_SYMBOL_GPL ( inet6_sk_rebuild_header ) ;
2014-09-27 09:50:56 -07:00
bool ipv6_opt_accepted ( const struct sock * sk , const struct sk_buff * skb ,
const struct inet6_skb_parm * opt )
2005-12-13 23:24:28 -08:00
{
2012-05-18 08:14:11 +02:00
const struct ipv6_pinfo * np = inet6_sk ( sk ) ;
2005-12-13 23:24:28 -08:00
if ( np - > rxopt . all ) {
if ( ( opt - > hop & & ( np - > rxopt . bits . hopopts | |
np - > rxopt . bits . ohopopts ) ) | |
2013-12-08 15:47:01 +01:00
( ip6_flowinfo ( ( struct ipv6hdr * ) skb_network_header ( skb ) ) & &
2005-12-13 23:24:28 -08:00
np - > rxopt . bits . rxflow ) | |
( opt - > srcrt & & ( np - > rxopt . bits . srcrt | |
np - > rxopt . bits . osrcrt ) ) | |
( ( opt - > dst1 | | opt - > dst0 ) & &
( np - > rxopt . bits . dstopts | | np - > rxopt . bits . odstopts ) ) )
2012-05-18 08:14:11 +02:00
return true ;
2005-12-13 23:24:28 -08:00
}
2012-05-18 08:14:11 +02:00
return false ;
2005-12-13 23:24:28 -08:00
}
EXPORT_SYMBOL_GPL ( ipv6_opt_accepted ) ;
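ipv6_opt_accepted() first tests rxopt.all, so a socket that requested no ancillary data is rejected with a single comparison before any individual bit is examined. A minimal userspace sketch of that union-of-bit-fields idiom follows. `union rx_options`, `struct pkt_info` and `opt_accepted()` are hypothetical, trimmed-down stand-ins rather than the kernel's definitions, and the sketch relies on the compiler accepting `uint16_t` bit-fields (GCC and Clang do).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the socket's receive-option flags. */
union rx_options {
	struct {
		uint16_t hopopts : 1;
		uint16_t srcrt   : 1;
		uint16_t dstopts : 1;
		uint16_t rxflow  : 1;
	} bits;
	uint16_t all; /* overlays every flag: zero means nothing requested */
};

/* Hypothetical summary of what a received packet carries. */
struct pkt_info {
	bool has_hop;
	bool has_srcrt;
	bool has_dst;
	uint32_t flowinfo;
};

/* True if the packet carries at least one item the socket asked for. */
bool opt_accepted(const union rx_options *rx, const struct pkt_info *pkt)
{
	if (!rx->all)
		return false; /* fast path: no option requested at all */

	return (pkt->has_hop && rx->bits.hopopts) ||
	       (pkt->flowinfo && rx->bits.rxflow) ||
	       (pkt->has_srcrt && rx->bits.srcrt) ||
	       (pkt->has_dst && rx->bits.dstopts);
}
```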
2009-03-09 08:18:29 +00:00
static struct packet_type ipv6_packet_type __read_mostly = {
2009-02-01 00:45:17 -08:00
. type = cpu_to_be16 ( ETH_P_IPV6 ) ,
2008-02-27 23:14:03 +09:00
. func = ipv6_rcv ,
2012-11-15 08:49:11 +00:00
} ;
2008-02-27 23:14:03 +09:00
static int __init ipv6_packet_init ( void )
{
dev_add_pack ( & ipv6_packet_type ) ;
return 0 ;
}
static void ipv6_packet_cleanup ( void )
{
dev_remove_pack ( & ipv6_packet_type ) ;
}
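The packet_type entry stores its EtherType in network byte order, which is why the initializer wraps ETH_P_IPV6 (0x86DD) in cpu_to_be16(). The userspace sketch below shows the same comparison with htons() in place of cpu_to_be16(). `frame_is_ipv6()` and `ETHERTYPE_IPV6_HOST` are illustrative names, and the sketch assumes an untagged 14-byte Ethernet header.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define ETHERTYPE_IPV6_HOST 0x86DD /* same value as the kernel's ETH_P_IPV6 */

/*
 * Return nonzero if an untagged Ethernet header (6-byte destination,
 * 6-byte source, 2-byte EtherType) announces an IPv6 payload.
 */
int frame_is_ipv6(const unsigned char *eth_hdr)
{
	uint16_t wire_type;

	/* Copy the big-endian EtherType without assuming alignment. */
	memcpy(&wire_type, eth_hdr + 12, sizeof(wire_type));

	/* Compare in network order, as the packet_type table does. */
	return wire_type == htons(ETHERTYPE_IPV6_HOST);
}
```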
2008-10-07 14:48:53 -07:00
static int __net_init ipv6_init_mibs ( struct net * net )
{
2013-10-07 15:51:58 -07:00
int i ;
2014-05-05 15:55:55 -07:00
net - > mib . udp_stats_in6 = alloc_percpu ( struct udp_mib ) ;
if ( ! net - > mib . udp_stats_in6 )
2008-10-07 14:49:36 -07:00
return - ENOMEM ;
2014-05-05 15:55:55 -07:00
net - > mib . udplite_stats_in6 = alloc_percpu ( struct udp_mib ) ;
if ( ! net - > mib . udplite_stats_in6 )
2008-10-07 14:50:06 -07:00
goto err_udplite_mib ;
2014-05-05 15:55:55 -07:00
net - > mib . ipv6_statistics = alloc_percpu ( struct ipstats_mib ) ;
if ( ! net - > mib . ipv6_statistics )
2008-10-08 10:36:03 -07:00
goto err_ip_mib ;
2013-10-07 15:51:58 -07:00
for_each_possible_cpu ( i ) {
struct ipstats_mib * af_inet6_stats ;
2014-05-05 15:55:55 -07:00
af_inet6_stats = per_cpu_ptr ( net - > mib . ipv6_statistics , i ) ;
2013-10-07 15:51:58 -07:00
u64_stats_init ( & af_inet6_stats - > syncp ) ;
}
2014-05-05 15:55:55 -07:00
net - > mib . icmpv6_statistics = alloc_percpu ( struct icmpv6_mib ) ;
if ( ! net - > mib . icmpv6_statistics )
2008-10-08 10:36:03 -07:00
goto err_icmp_mib ;
2011-11-13 01:24:04 +00:00
net - > mib . icmpv6msg_statistics = kzalloc ( sizeof ( struct icmpv6msg_mib ) ,
GFP_KERNEL ) ;
if ( ! net - > mib . icmpv6msg_statistics )
2008-10-08 10:36:03 -07:00
goto err_icmpmsg_mib ;
2008-10-07 14:48:53 -07:00
return 0 ;
2008-10-07 14:50:06 -07:00
2008-10-08 10:36:03 -07:00
err_icmpmsg_mib :
2014-05-05 15:55:55 -07:00
free_percpu ( net - > mib . icmpv6_statistics ) ;
2008-10-08 10:36:03 -07:00
err_icmp_mib :
2014-05-05 15:55:55 -07:00
free_percpu ( net - > mib . ipv6_statistics ) ;
2008-10-08 10:36:03 -07:00
err_ip_mib :
2014-05-05 15:55:55 -07:00
free_percpu ( net - > mib . udplite_stats_in6 ) ;
2008-10-07 14:50:06 -07:00
err_udplite_mib :
2014-05-05 15:55:55 -07:00
free_percpu ( net - > mib . udp_stats_in6 ) ;
2008-10-07 14:50:06 -07:00
return - ENOMEM ;
2008-10-07 14:48:53 -07:00
}
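ipv6_init_mibs() uses the usual kernel unwinding idiom: each failed allocation jumps to a label that frees everything allocated before it, in reverse order. A small userspace sketch of the same structure is shown here. `struct stats`, `alloc_step()` and the `fail_at` parameter are hypothetical and exist only so that a failure can be simulated; calloc() stands in for alloc_percpu().

```c
#include <stdlib.h>

struct stats {
	long *udp;
	long *ip;
	long *icmp;
};

/* Allocate one counter, or fail on purpose when step == fail_at. */
static void *alloc_step(int step, int fail_at)
{
	return step == fail_at ? NULL : calloc(1, sizeof(long));
}

/* Returns 0 on success, -1 on failure with nothing left allocated. */
int stats_init(struct stats *s, int fail_at)
{
	s->udp = alloc_step(1, fail_at);
	if (!s->udp)
		return -1;

	s->ip = alloc_step(2, fail_at);
	if (!s->ip)
		goto err_ip;

	s->icmp = alloc_step(3, fail_at);
	if (!s->icmp)
		goto err_icmp;

	return 0;

	/* Labels fall through, releasing resources in reverse order. */
err_icmp:
	free(s->ip);
	s->ip = NULL;
err_ip:
	free(s->udp);
	s->udp = NULL;
	return -1;
}

void stats_cleanup(struct stats *s)
{
	free(s->udp);
	free(s->ip);
	free(s->icmp);
	s->udp = s->ip = s->icmp = NULL;
}
```

Naming each label after the step that failed, as the original does with err_udplite_mib and friends, keeps the jump targets easy to audit when a new allocation is added in the middle.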
2010-01-17 03:35:32 +00:00
static void ipv6_cleanup_mibs ( struct net * net )
2008-10-07 14:48:53 -07:00
{
2014-05-05 15:55:55 -07:00
free_percpu ( net - > mib . udp_stats_in6 ) ;
free_percpu ( net - > mib . udplite_stats_in6 ) ;
free_percpu ( net - > mib . ipv6_statistics ) ;
free_percpu ( net - > mib . icmpv6_statistics ) ;
2011-11-13 01:24:04 +00:00
kfree ( net - > mib . icmpv6msg_statistics ) ;
2008-10-07 14:48:53 -07:00
}
2008-10-13 18:54:07 -07:00
static int __net_init inet6_net_init ( struct net * net )
2008-01-10 02:48:33 -08:00
{
2008-03-21 04:14:17 -07:00
int err = 0 ;
2008-01-10 02:54:53 -08:00
net - > ipv6 . sysctl . bindv6only = 0 ;
2008-01-10 03:02:40 -08:00
net - > ipv6 . sysctl . icmpv6_time = 1 * HZ ;
2014-01-17 17:15:05 +01:00
net - > ipv6 . sysctl . flowlabel_consistency = 1 ;
2014-07-01 21:33:10 -07:00
net - > ipv6 . sysctl . auto_flowlabels = 0 ;
2015-03-23 23:36:05 +01:00
net - > ipv6 . sysctl . idgen_retries = 3 ;
net - > ipv6 . sysctl . idgen_delay = 1 * HZ ;
ipv6: Flow label state ranges
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disabled by sysctl.
Background:
IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to define new
encaps in UDP just for getting ECMP.
Unfortunately, the initial specifications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard for how to do this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.
This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.
Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 15:33:21 -07:00
net - > ipv6 . sysctl . flowlabel_state_ranges = 1 ;
2014-10-06 19:58:37 +02:00
atomic_set ( & net - > ipv6 . fib6_sernum , 1 ) ;
2008-01-10 02:56:03 -08:00
2008-10-07 14:48:53 -07:00
err = ipv6_init_mibs ( net ) ;
if ( err )
return err ;
2008-03-21 04:14:17 -07:00
# ifdef CONFIG_PROC_FS
err = udp6_proc_init ( net ) ;
if ( err )
goto out ;
2008-03-21 04:14:45 -07:00
err = tcp6_proc_init ( net ) ;
if ( err )
goto proc_tcp6_fail ;
2008-03-26 16:52:32 -07:00
err = ac6_proc_init ( net ) ;
if ( err )
goto proc_ac6_fail ;
2008-03-21 04:14:17 -07:00
# endif
return err ;
2008-03-21 04:14:45 -07:00
# ifdef CONFIG_PROC_FS
2008-03-26 16:52:32 -07:00
proc_ac6_fail :
tcp6_proc_exit ( net ) ;
2008-03-21 04:14:45 -07:00
proc_tcp6_fail :
udp6_proc_exit ( net ) ;
2008-10-07 14:48:53 -07:00
out :
ipv6_cleanup_mibs ( net ) ;
return err ;
2008-03-21 04:14:45 -07:00
# endif
2008-01-10 02:48:33 -08:00
}
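inet6_net_init() enables flowlabel_state_ranges by default. The commit message attached to that line describes the split: labels 0x00000-0x7ffff are reserved for the flow label manager, and 0x80000-0xfffff are used for automatically generated labels. The sketch below illustrates that partition on host-order values. The macro and function names are hypothetical and are not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK     0x000FFFFFu /* the flow label field is 20 bits wide */
#define FLOWLABEL_AUTO_BIT 0x00080000u /* top bit of the label: upper half */

/* True if the label falls in 0x80000-0xfffff, the automatic range. */
bool flowlabel_in_auto_range(uint32_t label)
{
	return ((label & FLOWLABEL_MASK) & FLOWLABEL_AUTO_BIT) != 0;
}

/*
 * Fold an arbitrary flow hash into the automatic range by keeping the
 * low 20 bits and forcing the top label bit on.
 */
uint32_t make_auto_flowlabel(uint32_t hash)
{
	return (hash & FLOWLABEL_MASK) | FLOWLABEL_AUTO_BIT;
}
```

Forcing the top bit means an automatically generated label can never equal a label in the lower, manager-controlled half, which is the collision the commit message is concerned with.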
2010-01-17 03:35:32 +00:00
static void __net_exit inet6_net_exit ( struct net * net )
2008-01-10 02:48:33 -08:00
{
2008-03-21 04:14:17 -07:00
# ifdef CONFIG_PROC_FS
udp6_proc_exit ( net ) ;
2008-03-21 04:14:45 -07:00
tcp6_proc_exit ( net ) ;
2008-03-26 16:52:32 -07:00
ac6_proc_exit ( net ) ;
2008-03-21 04:14:17 -07:00
# endif
2008-10-07 14:48:53 -07:00
ipv6_cleanup_mibs ( net ) ;
2008-01-10 02:48:33 -08:00
}
static struct pernet_operations inet6_net_ops = {
. init = inet6_net_init ,
. exit = inet6_net_exit ,
} ;
2013-08-31 13:44:30 +08:00
static const struct ipv6_stub ipv6_stub_impl = {
. ipv6_sock_mc_join = ipv6_sock_mc_join ,
. ipv6_sock_mc_drop = ipv6_sock_mc_drop ,
. ipv6_dst_lookup = ip6_dst_lookup ,
. udpv6_encap_enable = udpv6_encap_enable ,
2013-08-31 13:44:36 +08:00
. ndisc_send_na = ndisc_send_na ,
2013-08-31 13:44:34 +08:00
. nd_tbl = & nd_tbl ,
2013-08-31 13:44:30 +08:00
} ;
2005-04-16 15:20:36 -07:00
static int __init inet6_init ( void )
{
2007-02-09 23:24:49 +09:00
struct list_head * r ;
2009-03-04 03:18:11 -08:00
int err = 0 ;
2005-04-16 15:20:36 -07:00
2015-03-01 14:58:29 +02:00
sock_skb_cb_check_size ( sizeof ( struct inet6_skb_parm ) ) ;
2006-09-01 00:29:06 -07:00
2009-03-04 03:18:11 -08:00
/* Register the socket-side information for inet6_create. */
2012-05-05 10:13:53 +00:00
for ( r = & inetsw6 [ 0 ] ; r < & inetsw6 [ SOCK_MAX ] ; + + r )
2009-03-04 03:18:11 -08:00
INIT_LIST_HEAD ( r ) ;
2009-06-01 03:07:33 -07:00
if ( disable_ipv6_mod ) {
2012-05-15 14:11:53 +00:00
pr_info ( " Loaded, but administratively disabled, reboot required to enable \n " ) ;
2009-03-04 03:18:11 -08:00
goto out ;
}
2005-04-16 15:20:36 -07:00
err = proto_register ( & tcpv6_prot , 1 ) ;
if ( err )
goto out ;
err = proto_register ( & udpv6_prot , 1 ) ;
if ( err )
goto out_unregister_tcp_proto ;
2006-11-27 11:10:57 -08:00
err = proto_register ( & udplitev6_prot , 1 ) ;
2005-04-16 15:20:36 -07:00
if ( err )
goto out_unregister_udp_proto ;
2006-11-27 11:10:57 -08:00
err = proto_register ( & rawv6_prot , 1 ) ;
if ( err )
goto out_unregister_udplite_proto ;
2013-05-22 20:17:31 +00:00
err = proto_register ( & pingv6_prot , 1 ) ;
if ( err )
goto out_unregister_raw_proto ;
2005-04-16 15:20:36 -07:00
/* We MUST register RAW sockets before we create the ICMP6,
* IGMP6 , or NDISC control sockets .
*/
2007-12-11 02:25:35 -08:00
err = rawv6_init ( ) ;
if ( err )
goto out_unregister_ping_proto ;
2005-04-16 15:20:36 -07:00
/* Register the family here so that the init calls below will
* be able to create sockets . ( ? ? is this dangerous ? ? )
*/
2005-11-11 15:05:47 -08:00
err = sock_register ( & inet6_family_ops ) ;
if ( err )
2007-12-11 02:25:35 -08:00
goto out_sock_register_fail ;
2005-04-16 15:20:36 -07:00
/*
* ipngwg API draft makes clear that the correct semantics
* for TCP and UDP is to consider one TCP and UDP instance
2011-03-30 22:57:33 -03:00
* in a host available by both INET and INET6 APIs and
2005-04-16 15:20:36 -07:00
* able to communicate via both network protocols .
*/
2008-01-10 02:48:33 -08:00
err = register_pernet_subsys ( & inet6_net_ops ) ;
if ( err )
goto register_pernet_fail ;
2008-02-29 11:13:15 -08:00
err = icmpv6_init ( ) ;
2005-04-16 15:20:36 -07:00
if ( err )
goto icmp_fail ;
2008-07-03 12:13:30 +08:00
err = ip6_mr_init ( ) ;
if ( err )
goto ipmr_fail ;
2008-02-29 11:13:15 -08:00
err = ndisc_init ( ) ;
2005-04-16 15:20:36 -07:00
if ( err )
goto ndisc_fail ;
2008-02-29 11:13:15 -08:00
err = igmp6_init ( ) ;
2005-04-16 15:20:36 -07:00
if ( err )
goto igmp_fail ;
2013-08-31 13:44:30 +08:00
ipv6_stub = & ipv6_stub_impl ;
2005-08-09 19:42:34 -07:00
err = ipv6_netfilter_init ( ) ;
if ( err )
goto netfilter_fail ;
2005-04-16 15:20:36 -07:00
/* Create /proc/foo6 entries. */
# ifdef CONFIG_PROC_FS
err = - ENOMEM ;
if ( raw6_proc_init ( ) )
goto proc_raw6_fail ;
2006-11-27 11:10:57 -08:00
if ( udplite6_proc_init ( ) )
goto proc_udplite6_fail ;
2005-04-16 15:20:36 -07:00
if ( ipv6_misc_proc_init ( ) )
goto proc_misc6_fail ;
if ( if6_proc_init ( ) )
goto proc_if6_fail ;
# endif
2007-12-07 00:44:29 -08:00
err = ip6_route_init ( ) ;
if ( err )
goto ip6_route_fail ;
2013-09-09 21:45:04 +02:00
err = ndisc_late_init ( ) ;
if ( err )
goto ndisc_late_fail ;
2007-12-11 02:23:18 -08:00
err = ip6_flowlabel_init ( ) ;
if ( err )
goto ip6_flowlabel_fail ;
2005-04-16 15:20:36 -07:00
err = addrconf_init ( ) ;
if ( err )
goto addrconf_fail ;
/* Init v6 extension headers. */
2007-12-11 02:23:54 -08:00
err = ipv6_exthdrs_init ( ) ;
if ( err )
goto ipv6_exthdrs_fail ;
2007-12-11 02:24:29 -08:00
err = ipv6_frag_init ( ) ;
if ( err )
goto ipv6_frag_fail ;
2005-04-16 15:20:36 -07:00
/* Init v6 transport protocols. */
2007-12-11 02:25:35 -08:00
err = udpv6_init ( ) ;
if ( err )
goto udpv6_fail ;
2005-07-05 14:41:20 -07:00
2007-12-11 02:25:35 -08:00
err = udplitev6_init ( ) ;
if ( err )
goto udplitev6_fail ;
err = tcpv6_init ( ) ;
if ( err )
goto tcpv6_fail ;
err = ipv6_packet_init ( ) ;
if ( err )
goto ipv6_packet_fail ;
2008-03-05 10:45:36 -08:00
2013-05-22 20:17:31 +00:00
err = pingv6_init ( ) ;
if ( err )
goto pingv6_fail ;
2008-03-05 10:45:36 -08:00
# ifdef CONFIG_SYSCTL
err = ipv6_sysctl_register ( ) ;
if ( err )
goto sysctl_fail ;
# endif
2005-04-16 15:20:36 -07:00
out :
return err ;
2008-03-05 10:45:36 -08:00
# ifdef CONFIG_SYSCTL
sysctl_fail :
2013-11-16 15:17:24 -05:00
pingv6_exit ( ) ;
2008-03-05 10:45:36 -08:00
# endif
2013-05-22 20:17:31 +00:00
pingv6_fail :
2013-11-16 15:17:24 -05:00
ipv6_packet_cleanup ( ) ;
2007-12-11 02:25:35 -08:00
ipv6_packet_fail :
tcpv6_exit ( ) ;
tcpv6_fail :
udplitev6_exit ( ) ;
udplitev6_fail :
udpv6_exit ( ) ;
udpv6_fail :
ipv6_frag_exit ( ) ;
2007-12-11 02:24:29 -08:00
ipv6_frag_fail :
ipv6_exthdrs_exit ( ) ;
2007-12-11 02:23:54 -08:00
ipv6_exthdrs_fail :
addrconf_cleanup ( ) ;
2005-04-16 15:20:36 -07:00
addrconf_fail :
ip6_flowlabel_cleanup ( ) ;
2007-12-11 02:23:18 -08:00
ip6_flowlabel_fail :
2013-09-09 21:45:04 +02:00
ndisc_late_cleanup ( ) ;
ndisc_late_fail :
2005-04-16 15:20:36 -07:00
ip6_route_cleanup ( ) ;
2007-12-07 00:44:29 -08:00
ip6_route_fail :
2005-04-16 15:20:36 -07:00
# ifdef CONFIG_PROC_FS
if6_proc_exit ( ) ;
proc_if6_fail :
ipv6_misc_proc_exit ( ) ;
proc_misc6_fail :
2006-11-27 11:10:57 -08:00
udplite6_proc_exit ( ) ;
proc_udplite6_fail :
2005-04-16 15:20:36 -07:00
raw6_proc_exit ( ) ;
proc_raw6_fail :
# endif
2005-08-09 19:42:34 -07:00
ipv6_netfilter_fini ( ) ;
netfilter_fail :
2005-04-16 15:20:36 -07:00
igmp6_cleanup ( ) ;
igmp_fail :
ndisc_cleanup ( ) ;
ndisc_fail :
2008-07-03 12:13:30 +08:00
ip6_mr_cleanup ( ) ;
ipmr_fail :
2005-04-16 15:20:36 -07:00
icmpv6_cleanup ( ) ;
icmp_fail :
2008-01-10 02:48:33 -08:00
unregister_pernet_subsys ( & inet6_net_ops ) ;
register_pernet_fail :
2005-11-11 15:05:47 -08:00
sock_unregister ( PF_INET6 ) ;
2007-12-07 00:44:29 -08:00
rtnl_unregister_all ( PF_INET6 ) ;
2007-12-11 02:25:35 -08:00
out_sock_register_fail :
rawv6_exit ( ) ;
2013-05-22 20:17:31 +00:00
out_unregister_ping_proto :
proto_unregister ( & pingv6_prot ) ;
2005-04-16 15:20:36 -07:00
out_unregister_raw_proto :
proto_unregister ( & rawv6_prot ) ;
2006-11-27 11:10:57 -08:00
out_unregister_udplite_proto :
proto_unregister ( & udplitev6_prot ) ;
2005-04-16 15:20:36 -07:00
out_unregister_udp_proto :
proto_unregister ( & udpv6_prot ) ;
out_unregister_tcp_proto :
proto_unregister ( & tcpv6_prot ) ;
goto out ;
}
module_init ( inet6_init ) ;
MODULE_ALIAS_NETPROTO ( PF_INET6 ) ;