0e4d354762
The IP_UNICAST_IF socket option is used to set the outgoing interface for outbound packets.

The IP_UNICAST_IF socket option was added because the Wine project needed it: no other existing option (the SO_BINDTODEVICE socket option, the IP_PKTINFO socket option, or the bind function) provided the required characteristics. [1]

The IP_UNICAST_IF socket option works well for unconnected sockets: the interface it specifies is taken into consideration in the route lookup that is done when a packet is sent. However, for connected sockets, the outbound interface is chosen when connecting the socket, and in the route lookup that is done when a packet is sent, the interface specified by the IP_UNICAST_IF socket option is ignored.

This inconsistent behavior was reported and discussed in an issue opened on systemd's GitHub project [2]. A bug report was also submitted in the kernel's bugzilla [3].

To understand the problem in more detail, we can look at what happens for UDP packets over IPv4 (the same analysis was done separately in the referenced systemd issue). When a UDP packet is sent, the udp_sendmsg function gets called and the following happens:

1. The oif member of the struct ipcm_cookie ipc (which stores the output interface of the packet) is initialized by the ipcm_init_sk function to inet->sk.sk_bound_dev_if (the device set by the SO_BINDTODEVICE socket option).
2. If the IP_PKTINFO socket option was set, the oif member gets overridden by the call to the ip_cmsg_send function.
3. If no output interface was selected yet, the interface specified by the IP_UNICAST_IF socket option is used.
4. If the socket is connected and no destination address is specified in the send function, the struct ipcm_cookie ipc is not taken into consideration and the cached route that was calculated in the connect function is used.
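The four steps above can be sketched as a small user-space model. This is only an illustration of the selection order described in the text, not kernel code: `struct model_sock`, its fields, and `select_oif` are hypothetical names, and the model ignores details such as multicast destinations.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the socket state the steps mention. */
struct model_sock {
	int bound_dev_if;     /* interface set by SO_BINDTODEVICE, 0 if unset */
	int uc_index;         /* interface set by IP_UNICAST_IF, 0 if unset */
	bool connected;       /* true once connect() has cached a route */
	int cached_route_oif; /* output interface of the route cached at connect() */
};

/*
 * Model of the output interface used for one send, following the order
 * described in the commit message (behavior before this patch).
 * pktinfo_ifindex is the interface passed via IP_PKTINFO, 0 if none.
 * has_dest tells whether the send call supplied a destination address.
 */
static int select_oif(const struct model_sock *sk, int pktinfo_ifindex,
		      bool has_dest)
{
	int oif = sk->bound_dev_if;        /* step 1: SO_BINDTODEVICE */

	if (pktinfo_ifindex)               /* step 2: IP_PKTINFO overrides */
		oif = pktinfo_ifindex;

	if (!oif)                          /* step 3: IP_UNICAST_IF as fallback */
		oif = sk->uc_index;

	if (sk->connected && !has_dest)    /* step 4: cached route wins */
		return sk->cached_route_oif;

	return oif;
}
```

In this model, a connected socket that sends without a destination address always returns `cached_route_oif`, whatever `uc_index` holds. That is the inconsistency the commit message describes.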
Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken into consideration.

This patch corrects the behavior of the IP_UNICAST_IF socket option for connect()ed sockets by taking into consideration the IP_UNICAST_IF sockopt when connecting the socket.

In order to avoid reconnecting the socket, this option is still ignored when applied on an already connected socket until connect() is called again.

Change the __ip4_datagram_connect function, which is called during socket connection, to take into consideration the interface set by the IP_UNICAST_IF socket option, in a similar way to what is done in the udp_sendmsg function.

[1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/
[2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018
[3] https://bugzilla.kernel.org/show_bug.cgi?id=210255

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829111554.GA1771@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
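As a usage illustration, the following is a minimal sketch of how an application might set IP_UNICAST_IF before calling connect() on a UDP socket, so that a kernel with this patch can take the interface into account at connect time. The helper name `udp_connect_via` and its parameters are hypothetical. The fallback definition of `IP_UNICAST_IF` is only there for libc headers that may not provide it, and the sketch assumes Linux, where the option value is the interface index in network byte order.

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_UNICAST_IF
#define IP_UNICAST_IF 50 /* value from linux/in.h, for older libc headers */
#endif

/*
 * Hypothetical helper: create a UDP socket, select the outgoing interface
 * with IP_UNICAST_IF, then connect. Returns the socket fd, or -1 on error.
 */
static int udp_connect_via(const char *ifname, const char *dst_ip,
			   unsigned short port)
{
	struct sockaddr_in dst;
	unsigned int ifindex;
	uint32_t opt;
	int fd;

	ifindex = if_nametoindex(ifname);
	if (ifindex == 0)
		return -1; /* no such interface */

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1; /* not a valid IPv4 address */

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Set the option before connect(): per the commit message, setting it
	 * on an already connected socket has no effect until the next connect().
	 */
	opt = htonl(ifindex);
	if (setsockopt(fd, IPPROTO_IP, IP_UNICAST_IF, &opt, sizeof(opt)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

A caller could then use `send()` on the returned descriptor without passing a destination address. The interface name and address are up to the caller, and the descriptor should be closed when no longer needed.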
| Name |
|---|
| af_unix |
| bpf |
| forwarding |
| mptcp |
| .gitignore |
| altnames.sh |
| amt.sh |
| arp_ndisc_evict_nocarrier.sh |
| arp_ndisc_untracked_subnets.sh |
| bareudp.sh |
| bind_bhash.c |
| bind_bhash.sh |
| cmsg_ipv6.sh |
| cmsg_sender.c |
| cmsg_so_mark.sh |
| cmsg_time.sh |
| config |
| devlink_port_split.py |
| drop_monitor_tests.sh |
| fcnal-test.sh |
| fib_nexthop_multiprefix.sh |
| fib_nexthop_nongw.sh |
| fib_nexthops.sh |
| fib_rule_tests.sh |
| fib_tests.sh |
| fib-onlink-tests.sh |
| fin_ack_lat.c |
| fin_ack_lat.sh |
| gre_gso.sh |
| gro.c |
| gro.sh |
| hwtstamp_config.c |
| icmp_redirect.sh |
| icmp.sh |
| in_netns.sh |
| io_uring_zerocopy_tx.c |
| io_uring_zerocopy_tx.sh |
| ioam6_parser.c |
| ioam6.sh |
| ip6_gre_headroom.sh |
| ip_defrag.c |
| ip_defrag.sh |
| ipsec.c |
| ipv6_flowlabel_mgr.c |
| ipv6_flowlabel.c |
| ipv6_flowlabel.sh |
| l2_tos_ttl_inherit.sh |
| l2tp.sh |
| Makefile |
| msg_zerocopy.c |
| msg_zerocopy.sh |
| ndisc_unsolicited_na_test.sh |
| netdevice.sh |
| nettest.c |
| pmtu.sh |
| psock_fanout.c |
| psock_lib.h |
| psock_snd.c |
| psock_snd.sh |
| psock_tpacket.c |
| reuseaddr_conflict.c |
| reuseaddr_ports_exhausted.c |
| reuseaddr_ports_exhausted.sh |
| reuseport_addr_any.c |
| reuseport_addr_any.sh |
| reuseport_bpf_cpu.c |
| reuseport_bpf_numa.c |
| reuseport_bpf.c |
| reuseport_dualstack.c |
| route_localnet.sh |
| rtnetlink.sh |
| run_afpackettests |
| run_netsocktests |
| rxtimestamp.c |
| rxtimestamp.sh |
| settings |
| setup_loopback.sh |
| setup_veth.sh |
| sk_bind_sendto_listen.c |
| sk_connect_zero_addr.c |
| so_netns_cookie.c |
| so_txtime.c |
| so_txtime.sh |
| socket.c |
| srv6_end_dt4_l3vpn_test.sh |
| srv6_end_dt6_l3vpn_test.sh |
| srv6_end_dt46_l3vpn_test.sh |
| srv6_hencap_red_l3vpn_test.sh |
| srv6_hl2encap_red_l2vpn_test.sh |
| stress_reuseport_listen.c |
| stress_reuseport_listen.sh |
| tap.c |
| tcp_fastopen_backup_key.c |
| tcp_fastopen_backup_key.sh |
| tcp_inq.c |
| tcp_mmap.c |
| test_blackhole_dev.sh |
| test_bpf.sh |
| test_vxlan_fdb_changelink.sh |
| test_vxlan_under_vrf.sh |
| test_vxlan_vnifiltering.sh |
| timestamping.c |
| tls.c |
| toeplitz_client.sh |
| toeplitz.c |
| toeplitz.sh |
| traceroute.sh |
| tun.c |
| txring_overwrite.c |
| txtimestamp.c |
| txtimestamp.sh |
| udpgro_bench.sh |
| udpgro_frglist.sh |
| udpgro_fwd.sh |
| udpgro.sh |
| udpgso_bench_rx.c |
| udpgso_bench_tx.c |
| udpgso_bench.sh |
| udpgso.c |
| udpgso.sh |
| unicast_extensions.sh |
| veth.sh |
| vrf_route_leaking.sh |
| vrf_strict_mode_test.sh |
| vrf-xfrm-tests.sh |
| xfrm_policy.sh |