linux/net/ipv4
Eric Dumazet 133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, bottleneck is usually the 'user' cpu)

Problem is the user thread can spend a lot of time while holding
the socket lock, forcing BH handler to queue most of incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACK are mostly cumulative.

The idea is to detect we are in the backlog processing
and defer all eligible ACK into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
..
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
netfilter inet: move inet->nodefrag to inet->inet_flags 2023-08-16 11:09:17 +01:00
af_inet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
ah4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
bpf_tcp_ca.c bpf: Drop useless btf_vmlinux in bpf_tcp_ca 2023-07-18 17:31:10 -07:00
cipso_ipv4.c inet: move inet->is_icsk to inet->inet_flags 2023-08-16 11:09:17 +01:00
datagram.c ipv4: fix data-races around inet->inet_id 2023-08-20 11:40:49 +01:00
devinet.c sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-22 18:40:38 -07:00
esp4.c net: ipv4: Use kfree_sensitive instead of kfree 2023-07-19 11:03:03 +01:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-16 17:26:31 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c
fib_semantics.c ipv4: annotate data-races around fi->fib_dead 2023-08-31 11:58:18 +02:00
fib_trie.c ipv4: annotate data-races around fi->fib_dead 2023-08-31 11:58:18 +02:00
fou_bpf.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_core.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c
gre_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
icmp.c icmp: guard against too small mtu 2023-03-31 21:37:06 -07:00
igmp.c igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU 2023-09-05 17:49:40 +01:00
inet_connection_sock.c tcp: annotate data-races around icsk->icsk_syn_retries 2023-07-20 12:34:18 -07:00
inet_diag.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
inet_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
inet_hashtables.c net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn 2023-07-31 13:53:10 -07:00
inet_timewait_sock.c inet: move inet->transparent to inet->inet_flags 2023-08-16 11:09:17 +01:00
inetpeer.c
ip_forward.c net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated 2023-08-30 09:44:09 +01:00
ip_fragment.c networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
ip_gre.c ipv4: ip_gre: fix return value check in erspan_xmit() 2023-07-19 12:27:09 +01:00
ip_input.c ipv4: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
ip_options.c
ip_output.c net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ip_sockglue.c net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ip_tunnel_core.c tunnels: fix kasan splat when generating ipv4 pmtu error 2023-08-04 18:24:52 -07:00
ip_tunnel.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_vti.c ip_vti: fix potential slab-use-after-free in decode_session6 2023-07-11 11:06:08 +02:00
ipcomp.c
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices 2023-04-12 16:40:39 -07:00
ipmr_base.c
ipmr.c net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated 2023-08-30 09:44:09 +01:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c
netlink.c
nexthop.c nexthop: Do not increment dump sentinel at the end of the dump 2023-08-15 18:54:53 -07:00
ping.c inet: move inet->recverr to inet->inet_flags 2023-08-16 11:09:17 +01:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
raw.c inet: move inet->hdrincl to inet->inet_flags 2023-08-16 11:09:17 +01:00
route.c net: dst: remove unnecessary input parameter in dst_alloc and dst_init 2023-09-12 11:42:25 +02:00
syncookies.c tcp: Set route scope properly in cookie_v4_check(). 2023-06-06 21:13:03 -07:00
sysctl_net_ipv4.c tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
tcp_bbr.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_bic.c
tcp_bpf.c sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
tcp_cdg.c
tcp_cong.c net: Update an existing TCP congestion control algorithm. 2023-03-22 22:53:00 -07:00
tcp_cubic.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
tcp_ipv4.c tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
tcp_lp.c
tcp_metrics.c tcp_metrics: hash table allocation cleanup 2023-08-04 15:33:39 -07:00
tcp_minisocks.c inet: move inet->transparent to inet->inet_flags 2023-08-16 11:09:17 +01:00
tcp_nv.c
tcp_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
tcp_output.c tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c
tcp_recovery.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_scalable.c
tcp_timer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
tunnel4.c
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c udp: Access &udp_table via net. 2022-11-16 09:43:35 +00:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c net: gro: fix misuse of CB in udp socket lookup 2023-07-29 17:10:27 +01:00
udp_tunnel_core.c inet: move inet->mc_loop to inet->inet_frags 2023-08-16 11:09:17 +01:00
udp_tunnel_nic.c udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write() 2022-11-29 08:44:24 -08:00
udp_tunnel_stub.c
udp.c net: annotate data-races around sk->sk_forward_alloc 2023-09-01 07:27:33 +01:00
udplite.c sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xfrm4_input.c xfrm: fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets 2023-06-09 08:16:34 +02:00
xfrm4_output.c
xfrm4_policy.c sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c