linux/Documentation/networking
Eric Dumazet 133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, the bottleneck is usually the 'user' cpu)

The problem is that the user thread can spend a lot of time holding
the socket lock, forcing the BH handler to queue most incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases the chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACKs are mostly cumulative.

The idea is to detect that we are processing the backlog
and coalesce all eligible ACKs into a single one,
sent from tcp_release_cb().
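
The deferral idea can be illustrated with a toy user-space model. This is
only a sketch under simplifying assumptions, not the kernel code: ToySocket,
its fields and run() are hypothetical names, every in-order segment is
assumed to want an immediate ACK (as with quickack), and release() stands in
for the point where the real patch sends the single ACK from tcp_release_cb().

```python
from dataclasses import dataclass, field


@dataclass
class ToySocket:
    """Hypothetical model of a receiver socket; not kernel code."""
    ack_defer: bool = True            # models the new sysctl
    rcv_nxt: int = 0                  # next expected byte
    owned_by_user: bool = False       # user thread holds the socket lock
    ack_deferred: bool = False        # an ACK is pending until release
    backlog: list = field(default_factory=list)
    acks_sent: list = field(default_factory=list)

    def lock(self):
        self.owned_by_user = True

    def receive(self, seg_len):
        """BH side: queue to the backlog if the user owns the socket."""
        if self.owned_by_user:
            self.backlog.append(seg_len)
        else:
            self._process(seg_len)

    def _process(self, seg_len):
        self.rcv_nxt += seg_len
        if self.owned_by_user and self.ack_defer:
            # Backlog processing: remember that an ACK is owed.
            self.ack_deferred = True
        else:
            self.acks_sent.append(self.rcv_nxt)

    def release(self):
        """User side: drain the backlog, then send one cumulative ACK."""
        for seg_len in self.backlog:
            self._process(seg_len)
        self.backlog.clear()
        if self.ack_deferred:
            self.acks_sent.append(self.rcv_nxt)
            self.ack_deferred = False
        self.owned_by_user = False


def run(ack_defer, n_segments, seg_len):
    """Queue n_segments while the lock is held, then release it."""
    sk = ToySocket(ack_defer=ack_defer)
    sk.lock()
    for _ in range(n_segments):
        sk.receive(seg_len)
    sk.release()
    return sk.acks_sent
```

In this model, run(True, 6, 1000) produces a single ACK covering 6000 bytes,
while run(False, 6, 1000) produces six ACKs, one per queued segment. Because
ACKs are cumulative, the last one carries the same information as all six.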

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACKs per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
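
As a rough sanity check on the figures above, the average number of bytes
covered by each ACK can be derived from the reported throughput and ACK
rate. This sketch assumes 1 Gbit = 10^9 bits and treats the throughput
numbers as bytes delivered to the receiver; the commit message does not
state either, and bytes_per_ack() is a hypothetical helper.

```python
def bytes_per_ack(throughput_gbit, acks_per_sec):
    """Average bytes acknowledged per ACK (assumes 1 Gbit = 10**9 bits)."""
    bytes_per_sec = throughput_gbit * 1e9 / 8
    return bytes_per_sec / acks_per_sec


before = bytes_per_ack(100, 240_000)   # about 52 KB per ACK
after = bytes_per_ack(120, 40_000)     # 375 KB per ACK
print(f"before: {before:.0f} bytes/ACK, after: {after:.0f} bytes/ACK")
```

Under these assumptions, the "before" figure is close to one ACK per GRO
packet of ~60KB, and the "after" figure corresponds to one ACK for roughly
six such packets.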

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer
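
A minimal sketch for checking the setting from user space, assuming a
kernel that includes this patch. read_backlog_ack_defer() is a hypothetical
helper; it returns None when the file is missing (older kernel) or
unreadable, rather than guessing a default.

```python
from pathlib import Path

# Path taken from the commit message above.
SYSCTL = Path("/proc/sys/net/ipv4/tcp_backlog_ack_defer")


def read_backlog_ack_defer(path=SYSCTL):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, no permission, or unexpected content.
        return None


if __name__ == "__main__":
    value = read_backlog_ack_defer()
    if value is None:
        print("tcp_backlog_ack_defer not available on this system")
    else:
        print(f"tcp_backlog_ack_defer = {value}")
```

Writing to the file to change the setting normally requires root.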

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
caif tty: cumulate and document tty_struct::flow* members 2021-05-13 16:57:16 +02:00
device_drivers VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
devlink Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
dsa Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
mac80211_hwsim
6lowpan.rst
6pack.rst
af_xdp.rst xsk: add multi-buffer documentation 2023-07-19 09:56:50 -07:00
alias.rst
arcnet-hardware.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
arcnet.rst Documentation: networking: arcnet: drop doubled word 2020-07-04 17:46:21 -07:00
atm.rst
ax25.rst Documentation: networking: ax25: drop doubled word 2020-07-04 17:46:21 -07:00
bareudp.rst Documentation: bareudp: Corrected description of bareudp module. 2020-07-28 17:53:03 -07:00
batman-adv.rst batman-adv: Fix mailing list address 2023-01-21 19:01:59 +01:00
bonding.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
bridge.rst docs: networking: Fix bridge documentation URL 2023-01-25 22:44:27 -08:00
can_ucan_protocol.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
can.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
cdc_mbim.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
checksum-offloads.rst
dccp.rst net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill) 2020-07-22 17:00:37 -07:00
dctcp.rst
dns_resolver.rst
driver.rst net: docs: update the sample code in driver.rst 2023-04-13 13:30:21 +02:00
eql.rst
ethtool-netlink.rst net: ethtool: coalesce: try to make user settings stick twice 2023-04-24 18:09:49 -07:00
failover.rst
fib_trie.rst
filter.rst treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
gen_stats.rst
generic_netlink.rst Documentation: networking: Update generic_netlink_howto URL 2022-11-23 17:25:02 -08:00
generic-hdlc.rst
gtp.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
ieee802154.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
ila.rst
index.rst docs: networking: fix x25-iface.rst heading & index order 2023-05-10 10:31:46 +01:00
ioam6-sysctl.rst ipv6: ioam: Documentation for new IOAM sysctls 2021-07-21 08:14:33 -07:00
ip_dynaddr.rst
ip-sysctl.rst tcp: defer regular ACK while processing socket backlog 2023-09-12 19:10:01 +02:00
ipddp.rst
ipsec.rst
ipv6.rst
ipvlan.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
ipvs-sysctl.rst ipvs: run_estimation should control the kthread tasks 2022-12-10 22:44:43 +01:00
j1939.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
kapi.rst wimax: move out to staging 2020-10-29 19:27:45 +01:00
kcm.rst
l2tp.rst Documentation: networking: correct possessive "its" 2022-08-31 12:36:08 -07:00
lapb-module.rst
mac80211-auth-assoc-deauth.txt
mac80211-injection.rst doc: networking: wireless: fix wiki website url 2020-06-08 10:05:53 +02:00
mctp.rst mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control 2022-02-09 12:00:11 +00:00
mpls-sysctl.rst
mptcp-sysctl.rst mptcp: add a new sysctl scheduler 2023-08-22 17:31:18 -07:00
msg_zerocopy.rst docs: net: fix inaccuracies in msg_zerocopy.rst 2023-02-24 18:31:31 -08:00
multiqueue.rst
napi.rst docs: net: clarify the NAPI rules around XDP Tx 2023-07-21 18:51:37 -07:00
net_dim.rst
net_failover.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
netconsole.rst netconsole: Append kernel version to message 2023-07-18 11:04:59 +02:00
netdev-features.rst net: hsr: add offloading support 2021-02-11 13:24:44 -08:00
netdevices.rst net: bonding: move ioctl handling to private ndo operation 2021-07-27 20:11:45 +01:00
netfilter-sysctl.rst
netif-msg.rst
nexthop-group-resilient.rst Documentation: net: Document resilient next-hop groups 2021-03-29 13:51:38 -07:00
nf_conntrack-sysctl.rst netfilter: set default timeout to 3 secs for sctp shutdown send and recv state 2023-08-16 00:05:15 +02:00
nf_flowtable.rst docs: nf_flowtable: fix compilation and warnings 2021-03-25 17:42:02 -07:00
nfc.rst
openvswitch.rst
operstates.rst docs: operstates: document IF_OPER_TESTING 2021-08-02 15:16:04 +01:00
packet_mmap.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
page_pool.rst docs: net: page_pool: de-duplicate the intro comment 2023-08-08 16:09:10 -07:00
phonet.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
phy.rst net: phy: Introduce PSGMII PHY interface mode 2023-08-14 08:12:53 +01:00
pktgen.rst pktgen: document the latest pktgen usage options 2021-08-25 13:44:30 +01:00
plip.rst
ppp_generic.rst docs: update ppp_generic.rst to document new ioctls 2020-12-10 13:57:36 -08:00
proc_net_tcp.rst
radiotap-headers.rst
rds.rst Doc: networking: Fix the title's Sphinx overline in rds.rst 2021-11-29 15:18:21 -07:00
regulatory.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
representors.rst docs: net: add an explanation of VF (and other) Representors 2022-09-21 07:31:38 -07:00
rxrpc.rst rxrpc: Fix potential race in error handling in afs_make_call() 2023-04-22 15:16:39 +01:00
scaling.rst sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
sctp.rst
secid.rst
seg6-sysctl.rst doc: move seg6_flowlabel to seg6-sysctl.rst 2021-04-14 13:13:15 -07:00
segmentation-offloads.rst
sfp-phylink.rst doc: sfp-phylink: Fix a broken reference 2022-08-02 21:45:07 -07:00
skbuff.rst skbuff: render the checksum comment to documentation 2022-05-10 17:48:37 -07:00
smc-sysctl.rst net/smc: Unbind r/w buffer size from clcsock and make them tunable 2022-09-22 12:58:21 +02:00
snmp_counter.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
statistics.rst docs: ethtool-netlink: document interface for MAC Merge layer 2023-01-23 12:44:18 +00:00
strparser.rst
switchdev.rst docs: net: add an explanation of VF (and other) Representors 2022-09-21 07:31:38 -07:00
sysfs-tagging.rst Documentation: networking: correct spelling 2023-01-31 13:00:47 +01:00
tc-actions-env-rules.rst
tc-queue-filters.rst Documentation: networking: TC queue based filtering 2022-10-25 10:32:40 +02:00
tcp-thin.rst
team.rst
timestamping.rst net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP 2022-12-08 19:49:21 -08:00
tipc.rst Documentation: add more details in tipc.rst 2021-07-01 13:18:18 -07:00
tls-handshake.rst net/handshake: Enable the SNI extension to work properly 2023-05-24 22:05:24 -07:00
tls-offload-layers.svg
tls-offload-reorder-bad.svg
tls-offload-reorder-good.svg
tls-offload.rst net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled 2021-01-19 15:58:05 -08:00
tls.rst tls: rx: add counter for NoPad violations 2022-07-11 19:48:33 -07:00
tproxy.rst
tuntap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
udplite.rst
vrf.rst doc: Document unexpected tcp_l3mdev_accept=1 behavior 2021-08-23 11:53:24 +01:00
vxlan.rst docs: vxlan: add info about device features 2020-09-28 12:50:12 -07:00
x25-iface.rst docs: networking: fix x25-iface.rst heading & index order 2023-05-10 10:31:46 +01:00
x25.rst net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp-rx-metadata.rst xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support 2023-03-22 09:11:09 -07:00
xfrm_device.rst net/mlx5: Update dead links in Kconfig documentation 2023-08-21 10:55:16 -07:00
xfrm_proc.rst
xfrm_sync.rst
xfrm_sysctl.rst