linux/net/ipv6
Jakub Sitnicki 10154dbded udp: Allow GSO transmit from devices with no checksum offload
Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28 18:12:59 -07:00
..
ila ila: block BH in ila_output() 2024-06-03 18:50:09 -07:00
netfilter net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
addrconf_core.c ipv6: Ensure natural alignment of const ipv6 loopback and router addresses 2024-01-30 12:43:18 +01:00
addrconf.c net/ipv6/addrconf: constify ctl_table arguments of utility functions 2024-05-28 19:49:47 -07:00
addrlabel.c ipv6: remove RTNL protection from ip6addrlbl_dump() 2024-04-08 11:01:05 +01:00
af_inet6.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
ah6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
anycast.c ipv6: anycast: use call_rcu_hurry() in aca_put() 2024-05-01 11:46:21 +01:00
calipso.c netlabel: remove impossible return value in netlbl_bitmap_walk 2024-02-28 19:37:34 -08:00
datagram.c ipv6: annotate data-races around np->ucast_oif 2023-12-11 10:59:17 +00:00
esp6_offload.c xfrm: Support GRO for IPv6 ESP in UDP encapsulation 2023-10-06 07:31:14 +02:00
esp6.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
exthdrs_core.c ipv6: Fix out-of-bounds access in ipv6_find_tlv() 2023-05-24 08:43:39 +01:00
exthdrs_offload.c net: gso: add HBH extension header offload support 2024-01-05 08:11:49 -08:00
exthdrs.c net: ipv6: exthdrs: get rid of ipv6_skb_net() 2024-03-11 15:15:08 -07:00
fib6_notifier.c
fib6_rules.c ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action() 2024-05-08 18:50:53 -07:00
fou6.c
icmp.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
inet6_connection_sock.c net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
inet6_hashtables.c tcp: get rid of twsk_unique() 2024-05-09 20:25:55 -07:00
ioam6_iptunnel.c ipv6: ioam: block BH from ioam6_output() 2024-06-03 18:50:08 -07:00
ioam6.c ipv6/addrconf: annotate data-races around devconf fields (II) 2024-03-01 08:42:33 +00:00
ip6_checksum.c
ip6_fib.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
ip6_flowlabel.c ipv6: move np->repflow to atomic flags 2023-09-15 10:33:48 +01:00
ip6_gre.c net: ip6_gre: Remove generic .ndo_get_stats64 2024-04-15 11:32:13 +01:00
ip6_icmp.c
ip6_input.c ipv6/addrconf: annotate data-races around devconf fields (II) 2024-03-01 08:42:33 +00:00
ip6_offload.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ip6_offload.h
ip6_output.c net: Add additional bit to support clockid_t timestamp type 2024-05-23 14:14:36 -07:00
ip6_tunnel.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip6_udp_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
ip6_vti.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip6mr.c ipv6: introduce dst_rt6_info() helper 2024-04-29 13:32:01 +01:00
ipcomp6.c
ipv6_sockglue.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
Kconfig ipv6: fix indentation of a config attribute 2023-08-16 10:03:08 +01:00
Makefile net/tcp: Introduce TCP_AO setsockopt()s 2023-10-27 10:35:44 +01:00
mcast_snoop.c
mcast.c ipv6/addrconf: annotate data-races around devconf fields (II) 2024-03-01 08:42:33 +00:00
mip6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
ndisc.c net/ipv6/ndisc: constify ctl_table arguments of utility function 2024-05-28 19:49:47 -07:00
netfilter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-13 13:13:46 -07:00
output_core.c ipv6: annotate data-races around cnf.hop_limit 2024-03-01 08:42:31 +00:00
ping.c ipv6: introduce dst_rt6_info() helper 2024-04-29 13:32:01 +01:00
proc.c net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams. 2023-10-20 12:01:00 +01:00
protocol.c
raw.c net: raw: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
reassembly.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
rpl_iptunnel.c net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() 2024-06-03 18:50:08 -07:00
rpl.c ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv(). 2023-06-19 11:32:58 -07:00
seg6_hmac.c ipv6: sr: fix memleak in seg6_hmac_init_algo 2024-05-21 13:16:25 +02:00
seg6_iptunnel.c ipv6: sr: block BH in seg6_output_core() and seg6_input_core() 2024-06-03 18:50:08 -07:00
seg6_local.c seg6: Use nested-BH locking for seg6_bpf_srh_states. 2024-06-24 16:41:23 -07:00
seg6.c ipv6: sr: restruct ifdefines 2024-05-30 18:29:38 -07:00
sit.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
syncookies.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
sysctl_net_ipv6.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
tcp_ao.c net/tcp: Wire up l3index to TCP-AO 2023-10-27 10:35:46 +01:00
tcp_ipv6.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
tcpv6_offload.c net: gro: use cb instead of skb->network_header 2024-05-13 14:44:06 -07:00
tunnel6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
udp_impl.h
udp_offload.c net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb 2024-05-02 11:02:48 +02:00
udp.c udp: Allow GSO transmit from devices with no checksum offload 2024-06-28 18:12:59 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm6_output.c ipv6: drop feature RTAX_FEATURE_ALLFRAG 2023-10-25 18:04:29 -07:00
xfrm6_policy.c xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr() 2024-06-17 18:04:46 -07:00
xfrm6_protocol.c
xfrm6_state.c
xfrm6_tunnel.c ipsec-next-2024-03-06 2024-03-08 10:56:05 +00:00