2019-05-27 08:55:01 +02:00
// SPDX-License-Identifier: GPL-2.0-or-later
2005-04-16 15:20:36 -07:00
/*
 *	common UDP/RAW code
2007-02-09 23:24:49 +09:00
 *	Linux INET6 implementation
2005-04-16 15:20:36 -07:00
 *
 *	Authors:
2007-02-09 23:24:49 +09:00
 *	Pedro Roque		<roque@di.fc.ul.pt>
2005-04-16 15:20:36 -07:00
 */
2006-01-11 12:17:47 -08:00
#include <linux/capability.h>
2005-04-16 15:20:36 -07:00
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/in6.h>
#include <linux/ipv6.h>
#include <linux/route.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
#include <linux/slab.h>
2012-04-29 21:48:53 +00:00
#include <linux/export.h>
2020-07-24 09:03:10 -04:00
#include <linux/icmp.h>
2005-04-16 15:20:36 -07:00
#include <net/ipv6.h>
#include <net/ndisc.h>
#include <net/addrconf.h>
#include <net/transp_v6.h>
#include <net/ip6_route.h>
2005-08-09 20:08:28 -07:00
#include <net/tcp_states.h>
2013-01-13 05:02:01 +00:00
#include <net/dsfield.h>
2019-09-12 21:16:39 -04:00
#include <net/sock_reuseport.h>
2005-04-16 15:20:36 -07:00
#include <linux/errqueue.h>
2016-12-24 11:46:01 -08:00
#include <linux/uaccess.h>
2005-04-16 15:20:36 -07:00
2012-05-18 18:57:34 +00:00
static bool ipv6_mapped_addr_any(const struct in6_addr *a)
2011-08-05 03:56:30 -07:00
{
2012-05-18 18:57:34 +00:00
	return ipv6_addr_v4mapped(a) && (a->s6_addr32[3] == 0);
2011-08-05 03:56:30 -07:00
}
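For readers outside the kernel tree, the check above can be approximated in userspace. This is a minimal sketch, assuming a POSIX system; `parse6` and `is_v4mapped_any` are hypothetical names rather than kernel APIs, and the standard `IN6_IS_ADDR_V4MAPPED` macro stands in for `ipv6_addr_v4mapped()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: parse a textual IPv6 address, true on success. */
static bool parse6(const char *text, struct in6_addr *out)
{
	return inet_pton(AF_INET6, text, out) == 1;
}

/*
 * Hypothetical userspace analogue of ipv6_mapped_addr_any(): true only for
 * ::ffff:0.0.0.0, i.e. a v4-mapped address whose embedded IPv4 part is zero.
 */
static bool is_v4mapped_any(const struct in6_addr *a)
{
	static const unsigned char zero4[4];

	assert(a != NULL);
	return IN6_IS_ADDR_V4MAPPED(a) &&
	       memcmp(&a->s6_addr[12], zero4, sizeof(zero4)) == 0;
}
```

Under these assumptions, `::ffff:0.0.0.0` matches, while `::` (the plain IPv6 wildcard) and `::ffff:192.0.2.1` do not, which is why the connect path tests `ipv6_addr_any()` and `ipv6_mapped_addr_any()` separately.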
2023-07-11 15:06:21 +02:00
static void ip6_datagram_flow_key_init(struct flowi6 *fl6,
				       const struct sock *sk)
2016-04-11 15:29:34 -07:00
{
2023-07-11 15:06:21 +02:00
	const struct inet_sock *inet = inet_sk(sk);
	const struct ipv6_pinfo *np = inet6_sk(sk);
2022-12-08 15:54:46 +01:00
	int oif = sk->sk_bound_dev_if;
2016-04-11 15:29:34 -07:00
	memset(fl6, 0, sizeof(*fl6));
	fl6->flowi6_proto = sk->sk_protocol;
	fl6->daddr = sk->sk_v6_daddr;
	fl6->saddr = np->saddr;
	fl6->flowi6_mark = sk->sk_mark;
	fl6->fl6_dport = inet->inet_dport;
	fl6->fl6_sport = inet->inet_sport;
2023-02-08 18:13:59 +01:00
	fl6->flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label);
2016-11-04 02:23:43 +09:00
	fl6->flowi6_uid = sk->sk_uid;
2016-04-11 15:29:34 -07:00
2022-12-08 15:54:46 +01:00
	if (!oif)
		oif = np->sticky_pktinfo.ipi6_ifindex;
2016-04-11 15:29:34 -07:00
2022-12-08 15:54:46 +01:00
	if (!oif) {
		if (ipv6_addr_is_multicast(&fl6->daddr))
2023-12-08 10:12:43 +00:00
			oif = READ_ONCE(np->mcast_oif);
2022-12-08 15:54:46 +01:00
		else
2023-12-08 10:12:44 +00:00
			oif = READ_ONCE(np->ucast_oif);
2022-12-08 15:54:46 +01:00
	}
2016-04-11 15:29:34 -07:00
2022-12-08 15:54:46 +01:00
	fl6->flowi6_oif = oif;
2020-09-27 22:38:26 -04:00
	security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
2016-04-11 15:29:34 -07:00
}
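The output-interface selection in the function above is a simple precedence chain: the bound device, then the sticky pktinfo interface, then the multicast or unicast default depending on the destination. The following standalone sketch models only that ordering; `struct oif_inputs` and `pick_oif` are hypothetical names introduced for illustration, and the locking and `READ_ONCE()` annotations of the real code are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the socket fields the kernel reads. */
struct oif_inputs {
	int bound_dev_if;	/* models sk->sk_bound_dev_if */
	int sticky_ifindex;	/* models np->sticky_pktinfo.ipi6_ifindex */
	int mcast_oif;		/* models np->mcast_oif */
	int ucast_oif;		/* models np->ucast_oif */
};

/* Return the first non-zero candidate, in the same order as the code above. */
static int pick_oif(const struct oif_inputs *in, bool daddr_is_multicast)
{
	int oif;

	assert(in != NULL);
	oif = in->bound_dev_if;
	if (!oif)
		oif = in->sticky_ifindex;
	if (!oif)
		oif = daddr_is_multicast ? in->mcast_oif : in->ucast_oif;
	return oif;
}
```

A zero result means no interface preference was found, and the route lookup is then free to choose one.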
ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive an ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
This patch updates the sk->sk_dst_cache for a connected datagram sk
during pmtu-update code path.
Note that sk->sk_v6_daddr is used to do the route lookup
instead of skb->data (i.e. iph). This is because a UDP socket can become
connected after sending out some datagrams in the unconnected state, or
it can be connected multiple times to different destinations. Hence,
iph may not be related to where sk is currently connected.
It is done under the '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect() call (i.e. changing
sk->sk_v6_daddr) while the dst lookup is happening in the pmtu-update
code path.
For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk->sk_dst_cache.
Test:
Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0 proto kernel metric 256 pref medium
2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium
A simple Python script to create a connected UDP socket:
import socket
import errno
HOST = '2fac::1'
PORT = 8080
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
    try:
        data = s.recv(1024)
    except socket.error as se:
        if se.errno == errno.EMSGSIZE:
            # 41 is IPPROTO_IPV6 and 24 is IPV6_MTU on Linux
            pmtu = s.getsockopt(41, 24)
            print("PMTU:%d" % pmtu)
            break
s.close()
Python program output after getting an ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300
Cache routes after receiving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0 metric 0
cache expires 463sec mtu 1300 pref medium
Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message. Here is the scapy script I have
used:
>>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
>>> sendp(p, iface='qemubr0')
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
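The Python test in the commit message above reads the path MTU with the bare numbers `getsockopt(41, 24)`. As a hedged C sketch of the same query, assuming a Linux system whose `<netinet/in.h>` exposes `IPV6_MTU`: `query_ipv6_path_mtu` is a hypothetical helper name, and it only wraps the standard `getsockopt()` call.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the path MTU the kernel currently associates
 * with a connected IPv6 socket, or -1 if getsockopt() fails (errno is set).
 */
static int query_ipv6_path_mtu(int fd)
{
	int mtu = 0;
	socklen_t len = sizeof(mtu);

	if (getsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &mtu, &len) < 0)
		return -1;
	/* Assumption: the option is reported as a plain int. */
	assert(len == sizeof(mtu));
	return mtu;
}
```

In the scenario described by the commit message, a caller would invoke this after `recv()` fails with `EMSGSIZE` on the connected UDP socket; the fix is what makes the returned value reflect the new RTF_CACHE route instead of the stale cached dst.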
2016-04-11 15:29:36 -07:00
int ip6_datagram_dst_update(struct sock *sk, bool fix_sk_saddr)
2016-04-11 15:29:35 -07:00
{
	struct ip6_flowlabel *flowlabel = NULL;
	struct in6_addr *final_p, final;
	struct ipv6_txoptions *opt;
	struct dst_entry *dst;
	struct inet_sock *inet = inet_sk(sk);
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct flowi6 fl6;
	int err = 0;
2023-09-12 16:02:12 +00:00
	if (inet6_test_bit(SNDFLOW, sk) &&
	    (np->flow_label & IPV6_FLOWLABEL_MASK)) {
2016-04-11 15:29:35 -07:00
		flowlabel = fl6_sock_lookup(sk, np->flow_label);
2019-07-10 06:40:10 -07:00
		if (IS_ERR(flowlabel))
2016-04-11 15:29:35 -07:00
			return -EINVAL;
	}
	ip6_datagram_flow_key_init(&fl6, sk);
	rcu_read_lock();
	opt = flowlabel ? flowlabel->opt : rcu_dereference(np->opt);
	final_p = fl6_update_dst(&fl6, opt, &final);
	rcu_read_unlock();
2019-12-04 15:35:52 +01:00
	dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p);
2016-04-11 15:29:35 -07:00
	if (IS_ERR(dst)) {
		err = PTR_ERR(dst);
		goto out;
	}
2016-04-11 15:29:36 -07:00
	if (fix_sk_saddr) {
		if (ipv6_addr_any(&np->saddr))
			np->saddr = fl6.saddr;
2016-04-11 15:29:35 -07:00
2016-04-11 15:29:36 -07:00
		if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
			sk->sk_v6_rcv_saddr = fl6.saddr;
			inet->inet_rcv_saddr = LOOPBACK4_IPV6;
			if (sk->sk_prot->rehash)
				sk->sk_prot->rehash(sk);
		}
2016-04-11 15:29:35 -07:00
	}
2018-04-03 15:00:07 +03:00
	ip6_sk_dst_store_flow(sk, dst, &fl6);
2016-04-11 15:29:35 -07:00
out:
	fl6_sock_release(flowlabel);
	return err;
}
2016-04-11 15:29:37 -07:00
void ip6_datagram_release_cb(struct sock *sk)
{
	struct dst_entry *dst;
	if (ipv6_addr_v4mapped(&sk->sk_v6_daddr))
		return;
	rcu_read_lock();
	dst = __sk_dst_get(sk);
	if (!dst || !dst->obsolete ||
	    dst->ops->check(dst, inet6_sk(sk)->dst_cookie)) {
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
	ip6_datagram_dst_update(sk, false);
}
EXPORT_SYMBOL_GPL(ip6_datagram_release_cb);
2016-11-29 13:09:44 +01:00
int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
			   int addr_len)
2005-04-16 15:20:36 -07:00
{
	struct sockaddr_in6 *usin = (struct sockaddr_in6 *)uaddr;
2014-08-24 21:53:10 +01:00
	struct inet_sock *inet = inet_sk(sk);
	struct ipv6_pinfo *np = inet6_sk(sk);
2018-03-12 14:54:23 +01:00
	struct in6_addr *daddr, old_daddr;
	__be32 fl6_flowlabel = 0;
	__be32 old_fl6_flowlabel;
2018-03-19 11:24:58 +01:00
	__be16 old_dport;
2005-04-16 15:20:36 -07:00
	int addr_type;
	int err;
	if (usin->sin6_family == AF_INET) {
2022-04-20 10:58:50 +09:00
		if (ipv6_only_sock(sk))
2005-04-16 15:20:36 -07:00
			return -EAFNOSUPPORT;
2015-07-14 08:10:22 +02:00
		err = __ip4_datagram_connect(sk, uaddr, addr_len);
2005-04-16 15:20:36 -07:00
		goto ipv4_connected;
	}
	if (addr_len < SIN6_LEN_RFC2133)
2007-02-09 23:24:49 +09:00
		return -EINVAL;
2005-04-16 15:20:36 -07:00
2007-02-09 23:24:49 +09:00
	if (usin->sin6_family != AF_INET6)
		return -EAFNOSUPPORT;
2005-04-16 15:20:36 -07:00
2023-09-12 16:02:12 +00:00
	if (inet6_test_bit(SNDFLOW, sk))
2016-04-11 15:29:34 -07:00
		fl6_flowlabel = usin->sin6_flowinfo & IPV6_FLOWINFO_MASK;
2005-04-16 15:20:36 -07:00
2017-02-12 17:26:07 -05:00
	if (ipv6_addr_any(&usin->sin6_addr)) {
2005-04-16 15:20:36 -07:00
		/*
		 *	connect to self
		 */
2017-02-12 17:26:07 -05:00
		if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
			ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK),
					       &usin->sin6_addr);
		else
			usin->sin6_addr = in6addr_loopback;
2005-04-16 15:20:36 -07:00
	}
2017-02-12 17:26:07 -05:00
	addr_type = ipv6_addr_type(&usin->sin6_addr);
2005-04-16 15:20:36 -07:00
	daddr = &usin->sin6_addr;
2017-02-12 17:26:07 -05:00
	if (addr_type & IPV6_ADDR_MAPPED) {
2005-04-16 15:20:36 -07:00
		struct sockaddr_in sin;
2022-04-20 10:58:50 +09:00
		if (ipv6_only_sock(sk)) {
2005-04-16 15:20:36 -07:00
			err = -ENETUNREACH;
			goto out;
		}
		sin.sin_family = AF_INET;
		sin.sin_addr.s_addr = daddr->s6_addr32[3];
		sin.sin_port = usin->sin6_port;
2015-07-14 08:10:22 +02:00
		err = __ip4_datagram_connect(sk,
					     (struct sockaddr *)&sin,
					     sizeof(sin));
2005-04-16 15:20:36 -07:00
ipv4_connected:
		if (err)
			goto out;
2007-02-09 23:24:49 +09:00
ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common.
Now it is time to do the same for IPv6, because it permits us to have fast
lookups for all kinds of sockets, including the upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6.
This patch is way bigger than its IPv4 counterpart, because for IPv4
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6
it's not easily doable.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
Timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:42:29 -07:00
		ipv6_addr_set_v4mapped(inet->inet_daddr, &sk->sk_v6_daddr);
2005-04-16 15:20:36 -07:00
2011-08-05 03:56:30 -07:00
		if (ipv6_addr_any(&np->saddr) ||
		    ipv6_mapped_addr_any(&np->saddr))
2009-10-15 06:30:45 +00:00
			ipv6_addr_set_v4mapped(inet->inet_saddr, &np->saddr);
2009-10-07 13:58:25 -07:00
2013-10-03 15:42:29 -07:00
		if (ipv6_addr_any(&sk->sk_v6_rcv_saddr) ||
		    ipv6_mapped_addr_any(&sk->sk_v6_rcv_saddr)) {
2009-10-15 06:30:45 +00:00
			ipv6_addr_set_v4mapped(inet->inet_rcv_saddr,
2013-10-03 15:42:29 -07:00
					       &sk->sk_v6_rcv_saddr);
udp: add rehash on connect()
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).
The problem is that the following sequence:
fd = socket(...)
connect(fd, &remote, ...)
not only selects the remote endpoint (address and port), but also sets the
local address, while the UDP stack stored the socket in the secondary hash
table when its local address was still INADDR_ANY (or the IPv6 equivalent).
Sequence is :
- autobind() : choose a random local port, insert socket in hash tables
[while local address is INADDR_ANY]
- connect() : set remote address and port, change local address to IP
given by a route lookup.
When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.
One solution to this problem is to rehash datagram socket if needed.
We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.
This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.
Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
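The rehash problem described in the commit message above can be illustrated with a toy model. This is only a sketch under loud assumptions: `struct toy_sock`, `toy_slot`, `toy_rehash` and `toy_lookup_finds` are hypothetical names, and the additive hash is deliberately trivial, whereas the kernel uses a real hash function over (local address, local port).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 16u

/* Toy model of a socket keyed by (local address, local port). */
struct toy_sock {
	uint32_t local_addr;	/* 0 models INADDR_ANY */
	uint16_t local_port;
	unsigned int slot;	/* slot the socket is currently filed under */
};

/* Deliberately trivial hash, not the kernel's. */
static unsigned int toy_slot(uint32_t addr, uint16_t port)
{
	return (addr + port) % NSLOTS;
}

/* File the socket under the slot matching its current local address. */
static void toy_rehash(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->slot = toy_slot(sk->local_addr, sk->local_port);
}

/* A lookup only finds the socket if it is filed under the slot it probes. */
static bool toy_lookup_finds(const struct toy_sock *sk)
{
	return sk->slot == toy_slot(sk->local_addr, sk->local_port);
}
```

In this model, a socket filed while its address was 0 is no longer found once `connect()` assigns a concrete local address, until `toy_rehash()` is called again. That mirrors why the code calls `sk->sk_prot->rehash(sk)` right after changing `sk_v6_rcv_saddr`.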
2010-09-08 05:08:44 +00:00
			if (sk->sk_prot->rehash)
				sk->sk_prot->rehash(sk);
		}
2005-04-16 15:20:36 -07:00
		goto out;
	}
2013-03-08 02:07:19 +00:00
	if (__ipv6_addr_needs_scope_id(addr_type)) {
2005-04-16 15:20:36 -07:00
		if (addr_len >= sizeof(struct sockaddr_in6) &&
		    usin->sin6_scope_id) {
2018-01-04 14:03:54 -08:00
			if (!sk_dev_equal_l3scope(sk, usin->sin6_scope_id)) {
2005-04-16 15:20:36 -07:00
				err = -EINVAL;
				goto out;
			}
2022-05-13 11:55:41 -07:00
			WRITE_ONCE(sk->sk_bound_dev_if, usin->sin6_scope_id);
2005-04-16 15:20:36 -07:00
		}
2008-01-08 23:52:21 -08:00
		if (!sk->sk_bound_dev_if && (addr_type & IPV6_ADDR_MULTICAST))
2023-12-08 10:12:43 +00:00
			WRITE_ONCE(sk->sk_bound_dev_if, READ_ONCE(np->mcast_oif));
2008-01-08 23:52:21 -08:00
2005-04-16 15:20:36 -07:00
		/* Connect to link-local address requires an interface */
		if (!sk->sk_bound_dev_if) {
			err = -EINVAL;
			goto out;
		}
}
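The scope handling in the block above boils down to three decisions: honour an explicit scope id if it agrees with any existing binding, fall back to the multicast interface for multicast destinations, and fail if no interface is known. The sketch below models that flow in plain C. `resolve_scope` is a hypothetical function, it assumes the caller passed a full-size `sockaddr_in6`, and it simplifies `sk_dev_equal_l3scope()` to a direct comparison (the real helper also accepts an L3 master device match).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the scope handling above. Returns 0 and updates
 * *bound_dev_if on success, or -EINVAL when no usable interface is known.
 */
static int resolve_scope(int *bound_dev_if, int scope_id,
			 bool is_multicast, int mcast_oif)
{
	assert(bound_dev_if != NULL);

	if (scope_id) {
		/* An already bound socket must agree with the requested scope. */
		if (*bound_dev_if && *bound_dev_if != scope_id)
			return -EINVAL;
		*bound_dev_if = scope_id;
	}

	/* Multicast destinations may fall back to the multicast interface. */
	if (!*bound_dev_if && is_multicast)
		*bound_dev_if = mcast_oif;

	/* A scoped destination is unreachable without an interface. */
	if (!*bound_dev_if)
		return -EINVAL;
	return 0;
}
```

From userspace, this is why `connect()` to an `fe80::/10` address typically needs `sin6_scope_id` set or the socket bound to a device first.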
2018-03-12 14:54:23 +01:00
	/* save the current peer information before updating it */
	old_daddr = sk->sk_v6_daddr;
	old_fl6_flowlabel = np->flow_label;
	old_dport = inet->inet_dport;
2013-10-03 15:42:29 -07:00
	sk->sk_v6_daddr = *daddr;
2016-04-11 15:29:34 -07:00
	np->flow_label = fl6_flowlabel;
2009-10-15 06:30:45 +00:00
	inet->inet_dport = usin->sin6_port;
2005-04-16 15:20:36 -07:00
	/*
	 *	Check for a route to destination and obtain the
	 *	destination cache for it.
	 */
2016-04-11 15:29:36 -07:00
	err = ip6_datagram_dst_update(sk, true);
2017-06-23 15:25:37 -07:00
	if (err) {
2018-03-12 14:54:23 +01:00
		/* Restore the socket peer info, to keep it consistent with
		 * the old socket state
2017-06-23 15:25:37 -07:00
		 */
2018-03-12 14:54:23 +01:00
		sk->sk_v6_daddr = old_daddr;
		np->flow_label = old_fl6_flowlabel;
		inet->inet_dport = old_dport;
2005-04-16 15:20:36 -07:00
		goto out;
2017-06-23 15:25:37 -07:00
}
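The save, update, and restore-on-failure pattern used above for the peer information can be shown in isolation. This is a minimal sketch, not the kernel's code: `struct peer_state`, `connect_peer`, `lookup_ok` and `lookup_fail` are hypothetical names, and the route lookup is reduced to a callback so both outcomes can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced peer state: address, flow label and port. */
struct peer_state {
	uint8_t daddr[16];
	uint32_t flow_label;
	uint16_t dport;
};

/*
 * Install new_peer, then run the (hypothetical) route_lookup callback.
 * On failure the previous peer state is put back before returning.
 */
static int connect_peer(struct peer_state *cur,
			const struct peer_state *new_peer,
			int (*route_lookup)(const struct peer_state *))
{
	struct peer_state old;
	int err;

	assert(cur != NULL && new_peer != NULL && route_lookup != NULL);
	old = *cur;		/* save before updating */
	*cur = *new_peer;
	err = route_lookup(cur);
	if (err)
		*cur = old;	/* restore so the state stays consistent */
	return err;
}

/* Two stand-in lookups used to exercise both paths. */
static int lookup_ok(const struct peer_state *p)
{
	(void)p;
	return 0;
}

static int lookup_fail(const struct peer_state *p)
{
	(void)p;
	return -ENETUNREACH;
}
```

The new state has to be installed before the lookup because, as in the kernel code, the lookup reads the destination from the socket itself; the saved copy is what lets a failed `connect()` leave the previous association untouched.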
2005-04-16 15:20:36 -07:00
udp: Update reuse->has_conns under reuseport_lock.
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could wrongly select an unconnected socket for packets sent to the
connected socket.
However, the current way to set has_conns is invalid and can
trigger that problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater. Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.
For this reason, there is a race below where we fail to set has_conns,
resulting in the wrong socket selection. To avoid the race, let's split
the reader and updater with proper locking.
 cpu1                               cpu2
+----+                             +----+
__ip[46]_datagram_connect()        reuseport_grow()
.                                  .
|- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
|  .                               |
|  |- rcu_read_lock()
|  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
|  |
|  |                               |  /* reuse->has_conns == 0 here */
|  |                               |- more_reuse->has_conns = reuse->has_conns
|  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
|  |                               |
|  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
|  |                               |                     more_reuse)
|  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
|
|- sk->sk_state = TCP_ESTABLISHED
Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]
For the record, sk_reuseport_cb is usually changed under lock_sock().
The only exception is the reuseport_grow() & TCP reqsk migration case.
1) shutdown() TCP listener, which is moved into the latter part of
reuse->socks[] to migrate reqsk.
2) New listen() overflows reuse->socks[] and calls reuseport_grow().
3) reuse->max_socks overflows u16 with the new listener.
4) reuseport_grow() pops the old shutdown()ed listener from the array
and updates its sk->sk_reuseport_cb to NULL without lock_sock().
The sk->sk_reuseport_cb of a shutdown()ed TCP socket can be changed
without lock_sock(), but reuseport_has_conns_set() is called only for
UDP under lock_sock(), so likely(reuse) can never be false in
reuseport_has_conns_set().
[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-14 11:26:25 -07:00
	reuseport_has_conns_set(sk);
2005-04-16 15:20:36 -07:00
	sk->sk_state = TCP_ESTABLISHED;
2015-07-28 16:02:05 -07:00
	sk_set_txhash(sk);
2005-04-16 15:20:36 -07:00
out:
	return err;
}
2016-11-29 13:09:44 +01:00
EXPORT_SYMBOL_GPL(__ip6_datagram_connect);
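
The locking rule from the reuseport commit message can be illustrated in userspace. This is a minimal sketch, not kernel code: `struct group`, `group_set_has_conns()` and `group_grow()` are hypothetical stand-ins for the reuseport group, `reuseport_has_conns_set()` and `reuseport_grow()`, and a pthread mutex stands in for `reuseport_lock`. Because the flag update and the copy into the grown group both run under the same lock, the grown copy cannot miss the update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a reuseport group. */
struct group {
	unsigned int max_socks;
	int has_conns;
};

/* Stand-in for reuseport_lock: serializes every writer of a group. */
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer 1: mark the group as containing a connected socket. */
static void group_set_has_conns(struct group **slot)
{
	assert(slot && *slot);
	pthread_mutex_lock(&group_lock);
	(*slot)->has_conns = 1;
	pthread_mutex_unlock(&group_lock);
}

/* Writer 2: replace the group with a larger copy. Returns 0 on success.
 * The copy of has_conns happens under the same lock as the update above,
 * so the two writers cannot interleave.
 */
static int group_grow(struct group **slot)
{
	struct group *old, *bigger;

	assert(slot && *slot);
	bigger = malloc(sizeof(*bigger));
	if (!bigger)
		return -1;

	pthread_mutex_lock(&group_lock);
	old = *slot;
	bigger->max_socks = old->max_socks * 2;
	bigger->has_conns = old->has_conns;
	*slot = bigger;
	pthread_mutex_unlock(&group_lock);

	/* The sketch has no concurrent readers, so a plain free() is enough
	 * here; the kernel defers the free with kfree_rcu() instead.
	 */
	free(old);
	return 0;
}
```
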
2015-07-14 08:10:22 +02:00
int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
	int res;
	lock_sock(sk);
	res = __ip6_datagram_connect(sk, uaddr, addr_len);
	release_sock(sk);
	return res;
}
2012-04-29 21:48:53 +00:00
EXPORT_SYMBOL_GPL(ip6_datagram_connect);
2005-04-16 15:20:36 -07:00
2014-01-20 05:16:39 +01:00
int ip6_datagram_connect_v6_only(struct sock *sk, struct sockaddr *uaddr,
				 int addr_len)
{
	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, uaddr);
	if (sin6->sin6_family != AF_INET6)
		return -EAFNOSUPPORT;
	return ip6_datagram_connect(sk, uaddr, addr_len);
}
EXPORT_SYMBOL_GPL(ip6_datagram_connect_v6_only);
2020-07-24 09:03:10 -04:00
static void ipv6_icmp_error_rfc4884(const struct sk_buff *skb,
				    struct sock_ee_data_rfc4884 *out)
{
	switch (icmp6_hdr(skb)->icmp6_type) {
	case ICMPV6_TIME_EXCEED:
	case ICMPV6_DEST_UNREACH:
		ip_icmp_error_rfc4884(skb, out, sizeof(struct icmp6hdr),
				      icmp6_hdr(skb)->icmp6_datagram_len * 8);
	}
}
2007-02-09 23:24:49 +09:00
void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
2006-11-14 20:56:00 -08:00
		     __be16 port, u32 info, u8 *payload)
2005-04-16 15:20:36 -07:00
{
2007-03-13 14:03:22 -03:00
	struct icmp6hdr *icmph = icmp6_hdr(skb);
2005-04-16 15:20:36 -07:00
	struct sock_exterr_skb *serr;
2023-09-12 16:02:08 +00:00
	if (!inet6_test_bit(RECVERR6, sk))
2005-04-16 15:20:36 -07:00
		return;
	skb = skb_clone(skb, GFP_ATOMIC);
	if (!skb)
		return;
2010-05-03 15:44:27 +00:00
	skb->protocol = htons(ETH_P_IPV6);
2005-04-16 15:20:36 -07:00
	serr = SKB_EXT_ERR(skb);
	serr->ee.ee_errno = err;
	serr->ee.ee_origin = SO_EE_ORIGIN_ICMP6;
2007-02-09 23:24:49 +09:00
	serr->ee.ee_type = icmph->icmp6_type;
2005-04-16 15:20:36 -07:00
	serr->ee.ee_code = icmph->icmp6_code;
	serr->ee.ee_pad = 0;
	serr->ee.ee_info = info;
	serr->ee.ee_data = 0;
2007-04-10 20:50:43 -07:00
	serr->addr_offset = (u8 *)&(((struct ipv6hdr *)(icmph + 1))->daddr) -
			    skb_network_header(skb);
2005-04-16 15:20:36 -07:00
	serr->port = port;
	__skb_pull(skb, payload - skb->data);
2020-07-24 09:03:10 -04:00
2023-09-12 16:02:04 +00:00
	if (inet6_test_bit(RECVERR6_RFC4884, sk))
2020-07-24 09:03:10 -04:00
		ipv6_icmp_error_rfc4884(skb, &serr->ee.ee_rfc4884);
2007-03-13 17:10:43 -03:00
	skb_reset_transport_header(skb);
2005-04-16 15:20:36 -07:00
	if (sock_queue_err_skb(sk, skb))
		kfree_skb(skb);
}
2022-10-12 08:49:29 +01:00
EXPORT_SYMBOL_GPL(ipv6_icmp_error);
2005-04-16 15:20:36 -07:00
2011-03-12 16:22:43 -05:00
void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info)
2005-04-16 15:20:36 -07:00
{
	struct sock_exterr_skb *serr;
	struct ipv6hdr *iph;
	struct sk_buff *skb;
2023-09-12 16:02:08 +00:00
	if (!inet6_test_bit(RECVERR6, sk))
2005-04-16 15:20:36 -07:00
		return;
	skb = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
	if (!skb)
		return;
2010-05-03 15:44:27 +00:00
	skb->protocol = htons(ETH_P_IPV6);
2007-03-10 19:57:15 -03:00
	skb_put(skb, sizeof(struct ipv6hdr));
	skb_reset_network_header(skb);
2007-04-25 17:54:47 -07:00
	iph = ipv6_hdr(skb);
2011-11-21 03:39:03 +00:00
	iph->daddr = fl6->daddr;
2019-01-08 04:06:14 -08:00
	ip6_flow_hdr(iph, 0, 0);
2005-04-16 15:20:36 -07:00
	serr = SKB_EXT_ERR(skb);
	serr->ee.ee_errno = err;
	serr->ee.ee_origin = SO_EE_ORIGIN_LOCAL;
2007-02-09 23:24:49 +09:00
	serr->ee.ee_type = 0;
2005-04-16 15:20:36 -07:00
	serr->ee.ee_code = 0;
	serr->ee.ee_pad = 0;
	serr->ee.ee_info = info;
	serr->ee.ee_data = 0;
2007-04-10 20:50:43 -07:00
	serr->addr_offset = (u8 *)&iph->daddr - skb_network_header(skb);
2011-03-12 16:36:19 -05:00
	serr->port = fl6->fl6_dport;
2005-04-16 15:20:36 -07:00
2007-04-19 20:29:13 -07:00
	__skb_pull(skb, skb_tail_pointer(skb) - skb->data);
2007-03-13 17:10:43 -03:00
	skb_reset_transport_header(skb);
2005-04-16 15:20:36 -07:00
	if (sock_queue_err_skb(sk, skb))
		kfree_skb(skb);
}
2011-03-12 16:22:43 -05:00
void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
2010-04-23 11:26:09 +00:00
{
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct ipv6hdr *iph;
	struct sk_buff *skb;
	struct ip6_mtuinfo *mtu_info;
	if (!np->rxopt.bits.rxpmtu)
		return;
	skb = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
	if (!skb)
		return;
	skb_put(skb, sizeof(struct ipv6hdr));
	skb_reset_network_header(skb);
	iph = ipv6_hdr(skb);
2011-11-21 03:39:03 +00:00
	iph->daddr = fl6->daddr;
2010-04-23 11:26:09 +00:00
	mtu_info = IP6CBMTU(skb);
	mtu_info->ip6m_mtu = mtu;
	mtu_info->ip6m_addr.sin6_family = AF_INET6;
	mtu_info->ip6m_addr.sin6_port = 0;
	mtu_info->ip6m_addr.sin6_flowinfo = 0;
2011-03-12 16:22:43 -05:00
	mtu_info->ip6m_addr.sin6_scope_id = fl6->flowi6_oif;
2011-11-21 03:39:03 +00:00
	mtu_info->ip6m_addr.sin6_addr = ipv6_hdr(skb)->daddr;
2010-04-23 11:26:09 +00:00
	__skb_pull(skb, skb_tail_pointer(skb) - skb->data);
	skb_reset_transport_header(skb);
	skb = xchg(&np->rxpmtu, skb);
	kfree_skb(skb);
}
2015-06-23 08:34:39 +03:00
/* For some errors we have valid addr_offset even with zero payload and
 * zero port. Also, addr_offset should be supported if port is set.
 */
static inline bool ipv6_datagram_support_addr(struct sock_exterr_skb *serr)
{
	return serr->ee.ee_origin == SO_EE_ORIGIN_ICMP6 ||
	       serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
	       serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL || serr->port;
}
2015-03-07 20:33:22 -05:00
/* IPv6 supports cmsg on all origins aside from SO_EE_ORIGIN_LOCAL.
 *
 * At one point, excluding local errors was a quick test to identify icmp/icmp6
 * errors. This is no longer true, but the test remained, so the v6 stack,
 * unlike v4, also honors cmsg requests on all wifi and timestamp errors.
 */
static bool ip6_datagram_support_cmsg(struct sk_buff *skb,
				      struct sock_exterr_skb *serr)
net-timestamp: allow reading recv cmsg on errqueue with origin tstamp
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.
on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.
In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.
The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:
ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes
v1 -> v2
large rewrite
- integrate with existing pktinfo cmsg generation code
- on ipv4: only send with new flag, to maintain legacy behavior
- on ipv6: send at most a single pktinfo cmsg
- on ipv6: initialize fields if not yet initialized
The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches
1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
(http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-30 22:22:34 -05:00
{
2015-03-07 20:33:22 -05:00
	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
	    serr->ee.ee_origin == SO_EE_ORIGIN_ICMP6)
		return true;
	if (serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL)
		return false;
2017-04-12 19:24:35 -04:00
	if (!IP6CB(skb)->iif)
2015-03-07 20:33:22 -05:00
		return false;
2014-11-30 22:22:34 -05:00
2015-03-07 20:33:22 -05:00
	return true;
2014-11-30 22:22:34 -05:00
}
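
The decision made by `ip6_datagram_support_cmsg()` can be restated as a small pure function. This is a userspace sketch for illustration only: the `origin` enum and `support_cmsg()` are hypothetical names, and the `iif` parameter stands in for `IP6CB(skb)->iif`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the error origins the kernel function tests. */
enum origin {
	ORIGIN_LOCAL,
	ORIGIN_ICMP,
	ORIGIN_ICMP6,
	ORIGIN_OTHER,	/* any remaining origin, e.g. timestamp errors */
};

/* Returns true when ancillary data (cmsg) should be generated. */
static bool support_cmsg(enum origin origin, int iif)
{
	assert(iif >= 0);	/* sketch assumption: 0 means "not known" */

	/* ICMP and ICMPv6 errors always qualify. */
	if (origin == ORIGIN_ICMP || origin == ORIGIN_ICMP6)
		return true;
	/* Locally generated errors never do. */
	if (origin == ORIGIN_LOCAL)
		return false;
	/* Everything else qualifies only if an interface index is known. */
	return iif != 0;
}
```
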
2007-02-09 23:24:49 +09:00
/*
2005-04-16 15:20:36 -07:00
* Handle MSG_ERRQUEUE
*/
2013-11-23 00:46:12 +01:00
int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
2005-04-16 15:20:36 -07:00
{
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct sock_exterr_skb *serr;
2014-08-31 21:30:27 -04:00
	struct sk_buff *skb;
2014-01-17 22:53:15 +01:00
	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin, msg->msg_name);
2005-04-16 15:20:36 -07:00
	struct {
		struct sock_extended_err ee;
		struct sockaddr_in6 offender;
	} errhdr;
	int err;
	int copied;
	err = -EAGAIN;
2014-08-31 21:30:27 -04:00
	skb = sock_dequeue_err_skb(sk);
2015-03-29 14:00:04 +01:00
	if (!skb)
2005-04-16 15:20:36 -07:00
		goto out;
	copied = skb->len;
	if (copied > len) {
		msg->msg_flags |= MSG_TRUNC;
		copied = len;
	}
2014-11-05 16:46:40 -05:00
	err = skb_copy_datagram_msg(skb, 0, msg, copied);
2016-04-21 22:27:32 -07:00
	if (unlikely(err)) {
		kfree_skb(skb);
		return err;
	}
2005-04-16 15:20:36 -07:00
	sock_recv_timestamp(msg, sk, skb);
	serr = SKB_EXT_ERR(skb);
2015-06-23 08:34:39 +03:00
	if (sin && ipv6_datagram_support_addr(serr)) {
2007-04-10 20:50:43 -07:00
		const unsigned char *nh = skb_network_header(skb);
2005-04-16 15:20:36 -07:00
		sin->sin6_family = AF_INET6;
		sin->sin6_flowinfo = 0;
2007-02-09 23:24:49 +09:00
		sin->sin6_port = serr->port;
2010-05-03 15:44:27 +00:00
		if (skb->protocol == htons(ETH_P_IPV6)) {
2013-01-08 06:44:23 +00:00
			const struct ipv6hdr *ip6h = container_of((struct in6_addr *)(nh + serr->addr_offset),
								  struct ipv6hdr, daddr);
			sin->sin6_addr = ip6h->daddr;
2023-09-12 16:02:12 +00:00
			if (inet6_test_bit(SNDFLOW, sk))
2013-01-13 05:01:51 +00:00
				sin->sin6_flowinfo = ip6_flowinfo(ip6h);
2013-03-08 02:07:19 +00:00
			sin->sin6_scope_id =
				ipv6_iface_scope_id(&sin->sin6_addr,
						    IP6CB(skb)->iif);
2005-04-16 15:20:36 -07:00
		} else {
2009-10-07 13:58:25 -07:00
			ipv6_addr_set_v4mapped(*(__be32 *)(nh + serr->addr_offset),
					       &sin->sin6_addr);
2013-03-08 02:07:19 +00:00
			sin->sin6_scope_id = 0;
2005-04-16 15:20:36 -07:00
		}
2013-11-23 00:46:12 +01:00
		*addr_len = sizeof(*sin);
2005-04-16 15:20:36 -07:00
	}
	memcpy(&errhdr.ee, &serr->ee, sizeof(struct sock_extended_err));
	sin = &errhdr.offender;
2015-01-15 13:18:40 -05:00
	memset(sin, 0, sizeof(*sin));
2015-03-07 20:33:22 -05:00
	if (ip6_datagram_support_cmsg(skb, serr)) {
2005-04-16 15:20:36 -07:00
		sin->sin6_family = AF_INET6;
2015-03-07 20:33:22 -05:00
		if (np->rxopt.all)
2014-01-20 03:43:08 +01:00
			ip6_datagram_recv_common_ctl(sk, msg, skb);
2010-05-03 15:44:27 +00:00
		if (skb->protocol == htons(ETH_P_IPV6)) {
2011-11-21 03:39:03 +00:00
			sin->sin6_addr = ipv6_hdr(skb)->saddr;
2005-04-16 15:20:36 -07:00
			if (np->rxopt.all)
2014-01-20 03:43:08 +01:00
				ip6_datagram_recv_specific_ctl(sk, msg, skb);
2013-03-08 02:07:19 +00:00
			sin->sin6_scope_id =
				ipv6_iface_scope_id(&sin->sin6_addr,
						    IP6CB(skb)->iif);
2005-04-16 15:20:36 -07:00
		} else {
2009-10-07 13:58:25 -07:00
			ipv6_addr_set_v4mapped(ip_hdr(skb)->saddr,
					       &sin->sin6_addr);
2023-08-16 08:15:33 +00:00
			if (inet_cmsg_flags(inet_sk(sk)))
2005-04-16 15:20:36 -07:00
				ip_cmsg_recv(msg, skb);
		}
	}
	put_cmsg(msg, SOL_IPV6, IPV6_RECVERR, sizeof(errhdr), &errhdr);
	/* Now we could try to dump offended packet options */
	msg->msg_flags |= MSG_ERRQUEUE;
	err = copied;
2016-04-21 22:27:32 -07:00
	consume_skb(skb);
2005-04-16 15:20:36 -07:00
out:
	return err;
}
2012-04-29 21:48:53 +00:00
EXPORT_SYMBOL_GPL(ipv6_recv_error);
2005-04-16 15:20:36 -07:00
2010-04-23 11:26:09 +00:00
/*
* Handle IPV6_RECVPATHMTU
*/
2013-11-23 00:46:12 +01:00
int ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len,
		     int *addr_len)
2010-04-23 11:26:09 +00:00
{
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct sk_buff *skb;
	struct ip6_mtuinfo mtu_info;
2014-01-17 22:53:15 +01:00
	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin, msg->msg_name);
2010-04-23 11:26:09 +00:00
	int err;
	int copied;
	err = -EAGAIN;
	skb = xchg(&np->rxpmtu, NULL);
2015-03-29 14:00:04 +01:00
	if (!skb)
2010-04-23 11:26:09 +00:00
		goto out;
	copied = skb->len;
	if (copied > len) {
		msg->msg_flags |= MSG_TRUNC;
		copied = len;
	}
2014-11-05 16:46:40 -05:00
	err = skb_copy_datagram_msg(skb, 0, msg, copied);
2010-04-23 11:26:09 +00:00
	if (err)
		goto out_free_skb;
	sock_recv_timestamp(msg, sk, skb);
	memcpy(&mtu_info, IP6CBMTU(skb), sizeof(mtu_info));
	if (sin) {
		sin->sin6_family = AF_INET6;
		sin->sin6_flowinfo = 0;
		sin->sin6_port = 0;
		sin->sin6_scope_id = mtu_info.ip6m_addr.sin6_scope_id;
2011-11-21 03:39:03 +00:00
		sin->sin6_addr = mtu_info.ip6m_addr.sin6_addr;
2013-11-23 00:46:12 +01:00
		*addr_len = sizeof(*sin);
2010-04-23 11:26:09 +00:00
	}
	put_cmsg(msg, SOL_IPV6, IPV6_PATHMTU, sizeof(mtu_info), &mtu_info);
	err = copied;
out_free_skb:
	kfree_skb(skb);
out:
	return err;
}
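
`ipv6_local_rxpmtu()` and `ipv6_recv_rxpmtu()` share a single-slot mailbox: the producer swaps in the newest notification and frees whatever it displaced, and the consumer swaps in NULL and takes what was there. A minimal userspace sketch of that pattern with C11 atomics follows. `struct note`, `mailbox_post()` and `mailbox_take()` are hypothetical names, and `atomic_exchange()` stands in for the kernel's `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical payload standing in for the queued skb. */
struct note {
	unsigned int mtu;
};

/* Single-slot mailbox: holds at most one pending notification. */
static _Atomic(struct note *) mailbox;

/* Producer: publish the newest value, dropping any unread older one.
 * Returns 0 on success, -1 if allocation fails.
 */
static int mailbox_post(unsigned int mtu)
{
	struct note *n;

	assert(mtu != 0);	/* sketch assumption: 0 is not a valid MTU */
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->mtu = mtu;
	/* Swap in the new note; free(NULL) is a no-op for an empty slot. */
	free(atomic_exchange(&mailbox, n));
	return 0;
}

/* Consumer: take ownership of the pending note, leaving the slot empty.
 * Returns NULL when nothing is pending; the caller frees the result.
 */
static struct note *mailbox_take(void)
{
	return atomic_exchange(&mailbox, (struct note *)NULL);
}
```

The design keeps only the latest value, which suits a path MTU notification: an older unread MTU is superseded by the newer one, so there is no need for a queue.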
2005-04-16 15:20:36 -07:00
2014-01-20 03:43:08 +01:00
void ip6_datagram_recv_common_ctl(struct sock *sk, struct msghdr *msg,
				  struct sk_buff *skb)
2005-04-16 15:20:36 -07:00
{
	struct ipv6_pinfo *np = inet6_sk(sk);
2014-01-20 03:43:08 +01:00
	bool is_ipv6 = skb->protocol == htons(ETH_P_IPV6);
2005-04-16 15:20:36 -07:00
	if (np->rxopt.bits.rxinfo) {
		struct in6_pktinfo src_info;
2014-01-20 03:43:08 +01:00
		if (is_ipv6) {
			src_info.ipi6_ifindex = IP6CB(skb)->iif;
			src_info.ipi6_addr = ipv6_hdr(skb)->daddr;
		} else {
			src_info.ipi6_ifindex =
				PKTINFO_SKB_CB(skb)->ipi_ifindex;
			ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr,
					       &src_info.ipi6_addr);
		}
2014-11-30 22:22:34 -05:00
		if (src_info.ipi6_ifindex >= 0)
			put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO,
				 sizeof(src_info), &src_info);
2005-04-16 15:20:36 -07:00
	}
2014-01-20 03:43:08 +01:00
}
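
For non-IPv6 packets, `ip6_datagram_recv_common_ctl()` reports the IPv4 destination as an IPv4-mapped IPv6 address (`::ffff:a.b.c.d`). The sketch below builds such an address in userspace. `set_v4mapped()` is a hypothetical helper written to illustrate what `ipv6_addr_set_v4mapped()` is used for here; it is not a copy of the kernel implementation.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical helper: fill *out with ::ffff:a.b.c.d, where v4 holds the
 * IPv4 address in network byte order.
 */
static void set_v4mapped(struct in_addr v4, struct in6_addr *out)
{
	assert(out);
	memset(out, 0, sizeof(*out));	/* bytes 0..9 stay zero */
	out->s6_addr[10] = 0xff;	/* bytes 10..11 are the ffff marker */
	out->s6_addr[11] = 0xff;
	/* bytes 12..15 carry the IPv4 address unchanged */
	memcpy(&out->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}
```
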
void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,
				    struct sk_buff *skb)
{
	struct ipv6_pinfo *np = inet6_sk(sk);
	struct inet6_skb_parm *opt = IP6CB(skb);
	unsigned char *nh = skb_network_header(skb);
2005-04-16 15:20:36 -07:00
	if (np->rxopt.bits.rxhlim) {
2007-04-25 17:54:47 -07:00
		int hlim = ipv6_hdr(skb)->hop_limit;
2005-04-16 15:20:36 -07:00
		put_cmsg(msg, SOL_IPV6, IPV6_HOPLIMIT, sizeof(hlim), &hlim);
	}
2005-09-08 10:19:03 +09:00
	if (np->rxopt.bits.rxtclass) {
2013-01-13 05:02:01 +00:00
		int tclass = ipv6_get_dsfield(ipv6_hdr(skb));
2005-09-08 10:19:03 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_TCLASS, sizeof(tclass), &tclass);
	}
2013-01-13 05:01:51 +00:00
	if (np->rxopt.bits.rxflow) {
		__be32 flowinfo = ip6_flowinfo((struct ipv6hdr *)nh);
		if (flowinfo)
			put_cmsg(msg, SOL_IPV6, IPV6_FLOWINFO, sizeof(flowinfo), &flowinfo);
2005-04-16 15:20:36 -07:00
	}
[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).
Support several new socket options / ancillary data:
IPV6_RECVPKTINFO, IPV6_PKTINFO,
IPV6_RECVHOPOPTS, IPV6_HOPOPTS,
IPV6_RECVDSTOPTS, IPV6_DSTOPTS, IPV6_RTHDRDSTOPTS,
IPV6_RECVRTHDR, IPV6_RTHDR,
IPV6_RECVHOPOPTS, IPV6_HOPOPTS
Old semantics are preserved as IPV6_2292xxxx so that
we can maintain backward compatibility.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-09-08 09:59:17 +09:00
	/* HbH is allowed only once */
2015-07-08 23:32:12 +02:00
	if (np->rxopt.bits.hopopts && (opt->flags & IP6SKB_HOPBYHOP)) {
		u8 *ptr = nh + sizeof(struct ipv6hdr);
2005-04-16 15:20:36 -07:00
		put_cmsg(msg, SOL_IPV6, IPV6_HOPOPTS, (ptr[1] + 1) << 3, ptr);
	}
2005-09-08 09:59:17 +09:00
	if (opt->lastopt &&
	    (np->rxopt.bits.dstopts || np->rxopt.bits.srcrt)) {
		/*
		 * Silly enough, but we need to reparse in order to
		 * report extension headers (except for HbH)
		 * in order.
		 *
2007-02-09 23:24:49 +09:00
		 * Also note that IPV6_RECVRTHDRDSTOPTS is NOT
2005-09-08 09:59:17 +09:00
		 * (and WILL NOT be) defined because
		 * IPV6_RECVDSTOPTS is more generic. --yoshfuji
		 */
		unsigned int off = sizeof(struct ipv6hdr);
2007-04-25 17:54:47 -07:00
		u8 nexthdr = ipv6_hdr(skb)->nexthdr;
2005-09-08 09:59:17 +09:00
		while (off <= opt->lastopt) {
2012-04-15 05:58:06 +00:00
			unsigned int len;
2007-04-10 20:50:43 -07:00
			u8 *ptr = nh + off;
2005-09-08 09:59:17 +09:00
2012-04-01 07:49:03 +00:00
			switch (nexthdr) {
2005-09-08 09:59:17 +09:00
			case IPPROTO_DSTOPTS:
				nexthdr = ptr[0];
				len = (ptr[1] + 1) << 3;
				if (np->rxopt.bits.dstopts)
					put_cmsg(msg, SOL_IPV6, IPV6_DSTOPTS, len, ptr);
				break;
			case IPPROTO_ROUTING:
				nexthdr = ptr[0];
				len = (ptr[1] + 1) << 3;
				if (np->rxopt.bits.srcrt)
					put_cmsg(msg, SOL_IPV6, IPV6_RTHDR, len, ptr);
				break;
			case IPPROTO_AH:
				nexthdr = ptr[0];
2005-11-20 12:21:59 +09:00
				len = (ptr[1] + 2) << 2;
2005-09-08 09:59:17 +09:00
				break;
			default:
				nexthdr = ptr[0];
				len = (ptr[1] + 1) << 3;
				break;
			}
			off += len;
		}
	}
	/* socket options in old style */
	if (np->rxopt.bits.rxoinfo) {
		struct in6_pktinfo src_info;
		src_info.ipi6_ifindex = opt->iif;
2011-11-21 03:39:03 +00:00
		src_info.ipi6_addr = ipv6_hdr(skb)->daddr;
2005-09-08 09:59:17 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_2292PKTINFO, sizeof(src_info), &src_info);
	}
	if (np->rxopt.bits.rxohlim) {
2007-04-25 17:54:47 -07:00
		int hlim = ipv6_hdr(skb)->hop_limit;
2005-09-08 09:59:17 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_2292HOPLIMIT, sizeof(hlim), &hlim);
	}
2015-07-08 23:32:12 +02:00
	if (np->rxopt.bits.ohopopts && (opt->flags & IP6SKB_HOPBYHOP)) {
		u8 *ptr = nh + sizeof(struct ipv6hdr);
2005-09-08 09:59:17 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_2292HOPOPTS, (ptr[1] + 1) << 3, ptr);
	}
	if (np->rxopt.bits.odstopts && opt->dst0) {
2007-04-10 20:50:43 -07:00
		u8 *ptr = nh + opt->dst0;
2005-09-08 09:59:17 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_2292DSTOPTS, (ptr[1] + 1) << 3, ptr);
2005-04-16 15:20:36 -07:00
	}
2005-09-08 09:59:17 +09:00
	if (np->rxopt.bits.osrcrt && opt->srcrt) {
2007-04-10 20:50:43 -07:00
		struct ipv6_rt_hdr *rthdr = (struct ipv6_rt_hdr *)(nh + opt->srcrt);
2005-09-08 09:59:17 +09:00
		put_cmsg(msg, SOL_IPV6, IPV6_2292RTHDR, (rthdr->hdrlen+1) << 3, rthdr);
	}
	if (np->rxopt.bits.odstopts && opt->dst1) {
		u8 *ptr = nh + opt->dst1;
		put_cmsg(msg, SOL_IPV6, IPV6_2292DSTOPTS, (ptr[1]+1)<<3, ptr);
	}
	if (np->rxopt.bits.rxorigdstaddr) {
		struct sockaddr_in6 sin6;
		__be16 _ports[2], *ports;

		ports = skb_header_pointer(skb, skb_transport_offset(skb),
					   sizeof(_ports), &_ports);
		if (ports) {
			/* All current transport protocols have the port numbers in the
			 * first four bytes of the transport header and this function is
			 * written with this assumption in mind.
			 */
			sin6.sin6_family = AF_INET6;
			sin6.sin6_addr = ipv6_hdr(skb)->daddr;
			sin6.sin6_port = ports[1];
			sin6.sin6_flowinfo = 0;
			sin6.sin6_scope_id =
				ipv6_iface_scope_id(&ipv6_hdr(skb)->daddr,
						    opt->iif);

			put_cmsg(msg, SOL_IPV6, IPV6_ORIGDSTADDR, sizeof(sin6), &sin6);
		}
	}
	if (np->rxopt.bits.recvfragsize && opt->frag_max_size) {
		int val = opt->frag_max_size;

		put_cmsg(msg, SOL_IPV6, IPV6_RECVFRAGSIZE, sizeof(val), &val);
	}
}

void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
			  struct sk_buff *skb)
{
	ip6_datagram_recv_common_ctl(sk, msg, skb);
	ip6_datagram_recv_specific_ctl(sk, msg, skb);
}
EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
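
The receive path above sizes each extension-header control message as `(ptr[1] + 1) << 3`: the second octet of the header counts 8-octet units beyond the first 8 octets. A minimal userspace sketch of that arithmetic follows. The helper name `ext_hdr_bytes` is hypothetical and is not part of the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace helper (illustration only): total size in bytes
 * of an IPv6 extension header, given a pointer to its first octet.
 * hdr[1] is the length field, counted in 8-octet units and excluding
 * the first 8 octets of the header.
 */
static size_t ext_hdr_bytes(const uint8_t *hdr)
{
	return ((size_t)hdr[1] + 1) << 3;
}
```

A header whose length field is 0 is therefore 8 bytes long, and the largest encodable header is 2048 bytes.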

int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
			  struct msghdr *msg, struct flowi6 *fl6,
			  struct ipcm6_cookie *ipc6)
{
	struct in6_pktinfo *src_info;
	struct cmsghdr *cmsg;
	struct ipv6_rt_hdr *rthdr;
	struct ipv6_opt_hdr *hdr;
	struct ipv6_txoptions *opt = ipc6->opt;
	int len;
	int err = 0;

	for_each_cmsghdr(cmsg, msg) {
		int addr_type;

		if (!CMSG_OK(msg, cmsg)) {
			err = -EINVAL;
			goto exit_f;
		}

		if (cmsg->cmsg_level == SOL_SOCKET) {
			err = __sock_cmsg_send(sk, cmsg, &ipc6->sockc);
			if (err)
				return err;
			continue;
		}

		if (cmsg->cmsg_level != SOL_IPV6)
			continue;

		switch (cmsg->cmsg_type) {
		case IPV6_PKTINFO:
		case IPV6_2292PKTINFO:
		{
			struct net_device *dev = NULL;
net: ensure unbound datagram socket to be chosen when not in a VRF
Ensure an unbound datagram skt is chosen when not in a VRF. The check
for a device match in compute_score() for UDP must be performed when
there is no device match. For this, a failure is returned when there is
no device match. This ensures that bound sockets are never selected,
even if there is no unbound socket.
Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These
packets are currently blocked, as flowi6_oif was set to that of the
master vrf device, and the ipi6_ifindex is that of the slave device.
Allow these packets to be sent by checking the device with ipi6_ifindex
has the same L3 scope as that of the bound device of the skt, which is
the master vrf device. Note that this check always succeeds if the skt
is unbound.
Even though the right datagram skt is now selected by compute_score(),
a different skt is being returned that is bound to the wrong vrf. The
difference between these and stream sockets is the handling of the skt
option for SO_REUSEPORT. While the handling when adding a skt for reuse
correctly checks that the bound device of the skt is a match, the skts
in the hashslot are already incorrect. So for the same hash, a skt for
the wrong vrf may be selected for the required port. The root cause is
that the skt is immediately placed into a slot when it is created,
but when the skt is then bound using SO_BINDTODEVICE, it remains in the
same slot. The solution is to move the skt to the correct slot by
forcing a rehash.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 15:36:04 +00:00
			int src_idx;

			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct in6_pktinfo))) {
				err = -EINVAL;
				goto exit_f;
			}

			src_info = (struct in6_pktinfo *)CMSG_DATA(cmsg);
			src_idx = src_info->ipi6_ifindex;

			if (src_idx) {
				if (fl6->flowi6_oif &&
				    src_idx != fl6->flowi6_oif &&
				    (READ_ONCE(sk->sk_bound_dev_if) != fl6->flowi6_oif ||
				     !sk_dev_equal_l3scope(sk, src_idx)))
					return -EINVAL;
				fl6->flowi6_oif = src_idx;
			}

			addr_type = __ipv6_addr_type(&src_info->ipi6_addr);

			rcu_read_lock();
			if (fl6->flowi6_oif) {
				dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
				if (!dev) {
					rcu_read_unlock();
					return -ENODEV;
				}
			} else if (addr_type & IPV6_ADDR_LINKLOCAL) {
				rcu_read_unlock();
				return -EINVAL;
			}

			if (addr_type != IPV6_ADDR_ANY) {
				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;

				if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
				    !ipv6_chk_addr_and_flags(net, &src_info->ipi6_addr,
							     dev, !strict, 0,
							     IFA_F_TENTATIVE) &&
				    !ipv6_chk_acast_addr_src(net, dev,
							     &src_info->ipi6_addr))
					err = -EINVAL;
				else
					fl6->saddr = src_info->ipi6_addr;
			}

			rcu_read_unlock();

			if (err)
				goto exit_f;

			break;
		}

		case IPV6_FLOWINFO:
			if (cmsg->cmsg_len < CMSG_LEN(4)) {
				err = -EINVAL;
				goto exit_f;
			}

			if (fl6->flowlabel & IPV6_FLOWINFO_MASK) {
				if ((fl6->flowlabel ^ *(__be32 *)CMSG_DATA(cmsg)) & ~IPV6_FLOWINFO_MASK) {
					err = -EINVAL;
					goto exit_f;
				}
			}
			fl6->flowlabel = IPV6_FLOWINFO_MASK & *(__be32 *)CMSG_DATA(cmsg);
			break;

		case IPV6_2292HOPOPTS:
		case IPV6_HOPOPTS:
			if (opt->hopopt || cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_opt_hdr))) {
				err = -EINVAL;
				goto exit_f;
			}

			hdr = (struct ipv6_opt_hdr *)CMSG_DATA(cmsg);
			len = ((hdr->hdrlen + 1) << 3);
			if (cmsg->cmsg_len < CMSG_LEN(len)) {
				err = -EINVAL;
				goto exit_f;
			}
net: Allow userns root to control ipv6
Allow an unprivileged user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicitly moved from the initial network namespace.
In general policy and network stack state changes are allowed while
resource control is left unchanged.
Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.
Allow creation of ipv6 raw sockets.
Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.
Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.
Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.
Allow setting the multicast routing socket options on non multicast
routing sockets.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.
Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-16 03:03:06 +00:00
			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
				err = -EPERM;
				goto exit_f;
			}
			opt->opt_nflen += len;
			opt->hopopt = hdr;
			break;

		case IPV6_2292DSTOPTS:
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_opt_hdr))) {
				err = -EINVAL;
				goto exit_f;
			}

			hdr = (struct ipv6_opt_hdr *)CMSG_DATA(cmsg);
			len = ((hdr->hdrlen + 1) << 3);
			if (cmsg->cmsg_len < CMSG_LEN(len)) {
				err = -EINVAL;
				goto exit_f;
			}
			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
				err = -EPERM;
				goto exit_f;
			}
			if (opt->dst1opt) {
				err = -EINVAL;
				goto exit_f;
			}
			opt->opt_flen += len;
			opt->dst1opt = hdr;
			break;

		case IPV6_DSTOPTS:
		case IPV6_RTHDRDSTOPTS:
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_opt_hdr))) {
				err = -EINVAL;
				goto exit_f;
			}

			hdr = (struct ipv6_opt_hdr *)CMSG_DATA(cmsg);
			len = ((hdr->hdrlen + 1) << 3);
			if (cmsg->cmsg_len < CMSG_LEN(len)) {
				err = -EINVAL;
				goto exit_f;
			}
			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
				err = -EPERM;
				goto exit_f;
			}
			if (cmsg->cmsg_type == IPV6_DSTOPTS) {
				opt->opt_flen += len;
				opt->dst1opt = hdr;
			} else {
				opt->opt_nflen += len;
				opt->dst0opt = hdr;
			}
			break;

		case IPV6_2292RTHDR:
		case IPV6_RTHDR:
			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_rt_hdr))) {
				err = -EINVAL;
				goto exit_f;
			}

			rthdr = (struct ipv6_rt_hdr *)CMSG_DATA(cmsg);

			switch (rthdr->type) {
#if IS_ENABLED(CONFIG_IPV6_MIP6)
			case IPV6_SRCRT_TYPE_2:
				if (rthdr->hdrlen != 2 ||
				    rthdr->segments_left != 1) {
					err = -EINVAL;
					goto exit_f;
				}
				break;
#endif
			default:
				err = -EINVAL;
				goto exit_f;
			}

			len = ((rthdr->hdrlen + 1) << 3);

			if (cmsg->cmsg_len < CMSG_LEN(len)) {
				err = -EINVAL;
				goto exit_f;
			}

			/* segments left must also match */
			if ((rthdr->hdrlen >> 1) != rthdr->segments_left) {
				err = -EINVAL;
				goto exit_f;
			}

			opt->opt_nflen += len;
			opt->srcrt = rthdr;

			if (cmsg->cmsg_type == IPV6_2292RTHDR && opt->dst1opt) {
				int dsthdrlen = ((opt->dst1opt->hdrlen + 1) << 3);

				opt->opt_nflen += dsthdrlen;
				opt->dst0opt = opt->dst1opt;
				opt->dst1opt = NULL;
				opt->opt_flen -= dsthdrlen;
			}

			break;

		case IPV6_2292HOPLIMIT:
		case IPV6_HOPLIMIT:
			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
				err = -EINVAL;
				goto exit_f;
			}

			ipc6->hlimit = *(int *)CMSG_DATA(cmsg);
			if (ipc6->hlimit < -1 || ipc6->hlimit > 0xff) {
				err = -EINVAL;
				goto exit_f;
			}

			break;

		case IPV6_TCLASS:
		{
			int tc;

			err = -EINVAL;
			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
				goto exit_f;

			tc = *(int *)CMSG_DATA(cmsg);
			if (tc < -1 || tc > 0xff)
				goto exit_f;

			err = 0;
			ipc6->tclass = tc;

			break;
		}

		case IPV6_DONTFRAG:
		{
			int df;

			err = -EINVAL;
			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
				goto exit_f;

			df = *(int *)CMSG_DATA(cmsg);
			if (df < 0 || df > 1)
				goto exit_f;

			err = 0;
			ipc6->dontfrag = df;

			break;
		}
		default:
			net_dbg_ratelimited("invalid cmsg type: %d\n",
					    cmsg->cmsg_type);
			err = -EINVAL;
			goto exit_f;
		}
	}

exit_f:
	return err;
}
EXPORT_SYMBOL_GPL(ip6_datagram_send_ctl);
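
For the IPV6_HOPLIMIT case, ip6_datagram_send_ctl() above requires a control message whose `cmsg_len` is exactly `CMSG_LEN(sizeof(int))` and whose value lies in the range -1..255. The following is a minimal userspace sketch of how a sender might build such a message before calling sendmsg(). The helper name `build_hoplimit_cmsg` is hypothetical, and the sketch assumes a Linux/glibc environment where `IPV6_HOPLIMIT` is provided by `<netinet/in.h>`.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper (illustration only, not kernel code): attach a single
 * IPV6_HOPLIMIT control message to msg, using buf as the control buffer.
 * Mirrors the range check done on the kernel side (-1 means "use default").
 * Returns 0 on success, -1 if hlim is out of range or buf is too small.
 */
static int build_hoplimit_cmsg(struct msghdr *msg, void *buf, size_t buflen,
			       int hlim)
{
	struct cmsghdr *cmsg;

	if (hlim < -1 || hlim > 0xff)
		return -1;
	if (buflen < CMSG_SPACE(sizeof(int)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IPV6;	/* same value as SOL_IPV6 */
	cmsg->cmsg_type = IPV6_HOPLIMIT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &hlim, sizeof(hlim));
	return 0;
}
```

The control buffer should be suitably aligned for `struct cmsghdr`; declaring it inside a union with a `struct cmsghdr` member is one common way to get that.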

void __ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp,
			       __u16 srcp, __u16 destp, int rqueue, int bucket)
{
	const struct in6_addr *dest, *src;

ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 15:42:29 -07:00
	dest = &sp->sk_v6_daddr;
	src = &sp->sk_v6_rcv_saddr;
	seq_printf(seq,
		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
		   "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u\n",
		   bucket,
		   src->s6_addr32[0], src->s6_addr32[1],
		   src->s6_addr32[2], src->s6_addr32[3], srcp,
		   dest->s6_addr32[0], dest->s6_addr32[1],
		   dest->s6_addr32[2], dest->s6_addr32[3], destp,
		   sp->sk_state,
		   sk_wmem_alloc_get(sp),
		   rqueue,
		   0, 0L, 0,
		   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)),
		   0,
		   sock_i_ino(sp),
		   refcount_read(&sp->sk_refcnt), sp,
		   atomic_read(&sp->sk_drops));
}