linux/net/ipv4
Jakub Sitnicki 10154dbded udp: Allow GSO transmit from devices with no checksum offload
Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28 18:12:59 -07:00
..
netfilter netfilter: tproxy: bail out if IP has been disabled on the device 2024-05-29 00:37:51 +02:00
af_inet.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ah4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
arp.c arp: Convert ioctl(SIOCGARP) to RCU. 2024-05-01 18:37:07 -07:00
bpf_tcp_ca.c bpf: pass bpf_struct_ops_link to callbacks in bpf_struct_ops. 2024-05-30 15:34:13 -07:00
cipso_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
datagram.c ipv4: Set the routing scope properly in ip_route_output_ports(). 2024-02-12 17:33:05 -08:00
devinet.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
esp4_offload.c xfrm: Support GRO for IPv4 ESP in UDP encapsulation 2023-10-06 07:30:40 +02:00
esp4.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
fib_frontend.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
fib_lookup.h
fib_notifier.c
fib_rules.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
fib_semantics.c net: remove NULL-pointer net parameter in ip_metrics_convert 2024-06-05 10:06:00 +01:00
fib_trie.c inet: switch inet_dump_fib() to RCU protection 2024-02-26 11:46:13 +00:00
fou_bpf.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
fou_core.c fou: remove warn in gue_gro_receive on unsupported protocol 2024-06-17 17:49:50 -07:00
fou_nl.c
fou_nl.h
gre_demux.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
gre_offload.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
icmp.c net/ipv4: add tracepoint for icmp_send 2024-05-08 10:39:26 +01:00
igmp.c ipv4: Set scope explicitly in ip_route_output(). 2024-04-08 13:20:51 +01:00
inet_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
inet_diag.c inet_diag: skip over empty buckets 2024-01-23 15:13:55 +01:00
inet_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
inet_hashtables.c tcp: get rid of twsk_unique() 2024-05-09 20:25:55 -07:00
inet_timewait_sock.c tcp: move inet_twsk_schedule helper out of header 2024-06-10 11:54:18 +01:00
inetpeer.c net: ipv4: Simplify the allocation of slab caches in inet_initpeers 2024-01-31 16:39:42 -08:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
ip_gre.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_input.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
ip_options.c
ip_output.c net: Add additional bit to support clockid_t timestamp type 2024-05-23 14:14:36 -07:00
ip_sockglue.c inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT 2024-03-06 12:37:06 +00:00
ip_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ip_tunnel.c ip_tunnel: Move stats allocation to core 2024-06-11 19:24:37 -07:00
ip_vti.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipcomp.c
ipconfig.c
ipip.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipmr_base.c
ipmr.c ip_tunnel: use a separate struct to store tunnel params in the kernel 2024-04-01 10:49:28 +01:00
Kconfig net/tcp: Add TCP-AO config and structures 2023-10-27 10:35:44 +01:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c net: remove NULL-pointer net parameter in ip_metrics_convert 2024-06-05 10:06:00 +01:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c nexthop: fix uninitialized variable in nla_put_nh_group_stats() 2024-03-22 18:03:29 -07:00
ping.c ping: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
proc.c net: add <net/proto_memory.h> 2024-04-30 18:46:52 -07:00
protocol.c
raw_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
raw.c net: raw: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
route.c net: ipv4,ipv6: Pass multipath hash computation through a helper 2024-06-12 16:42:11 -07:00
syncookies.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
sysctl_net_ipv4.c net: ipv4: Add a sysctl to set multipath hash seed 2024-06-12 16:42:11 -07:00
tcp_ao.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
tcp_bbr.c tcp: Add new args for cong_control in tcp_congestion_ops 2024-05-02 16:26:56 -07:00
tcp_bic.c
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c
tcp_cong.c net: remove NULL-pointer net parameter in ip_metrics_convert 2024-06-05 10:06:00 +01:00
tcp_cubic.c bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs 2024-03-28 18:31:40 -07:00
tcp_dctcp.c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-05-21 13:34:50 +02:00
tcp_dctcp.h
tcp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
tcp_fastopen.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
tcp_ipv4.c net/ipv4: Use nested-BH locking for ipv4_tcp_sk. 2024-06-24 16:41:22 -07:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: use parallel_ops for tcp_metrics_nl_family 2024-04-17 18:31:53 -07:00
tcp_minisocks.c net: tcp: un-pin the tw_timer 2024-06-10 11:54:18 +01:00
tcp_nv.c
tcp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
tcp_output.c net/tcp: Add tcp-md5 and tcp-ao tracepoints 2024-06-12 06:39:04 +01:00
tcp_plb.c
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c
tcp_sigpool.c net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. 2024-06-24 16:41:22 -07:00
tcp_timer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-13 13:13:46 -07:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c net/tcp: Remove tcp_hash_fail() 2024-06-12 06:39:04 +01:00
tunnel4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
udp_bpf.c
udp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c udp: Allow GSO transmit from devices with no checksum offload 2024-06-28 18:12:59 -07:00
udp_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udp.c udp: Allow GSO transmit from devices with no checksum offload 2024-06-28 18:12:59 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm4_output.c
xfrm4_policy.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00