linux/net
Salvatore Dipietro 7267e8dcad tcp: Add memory barrier to tcp_push()
On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.

Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e73 ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.

Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}

Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.

No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.91ms
 75.000%    1.13ms
 90.000%    1.46ms
 99.000%    1.74ms
 99.900%    1.89ms
 99.990%   41.95ms  <<< 40+ ms extra latency
 99.999%   48.32ms
100.000%   48.96ms

With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.90ms
 75.000%    1.13ms
 90.000%    1.45ms
 99.000%    1.72ms
 99.900%    1.83ms
 99.990%    2.11ms  <<< no 40+ ms extra latency
 99.999%    2.53ms
100.000%    2.62ms

Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.

Fixes: 7aa5470c2c ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-23 10:22:31 +01:00
..
6lowpan
9p net: 9p: avoid freeing uninit memory in p9pdu_vreadf 2023-12-13 05:44:30 +09:00
802 net: fill in MODULE_DESCRIPTION()s under net/802* 2023-10-28 11:29:28 +01:00
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-01-19 21:25:06 -08:00
appletalk net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
atm net: fill in MODULE_DESCRIPTION()s for ATM 2024-01-05 08:04:23 -08:00
ax25
batman-adv batman-adv: Switch to linux/array_size.h 2023-11-14 08:16:34 +01:00
bluetooth TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
bpf bpf: Fix dtor CFI 2023-12-15 16:25:55 -08:00
bridge netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-17 12:02:49 +01:00
caif net: fill in MODULE_DESCRIPTION()s for CAIF 2024-01-05 08:06:35 -08:00
can
ceph This update includes the following changes: 2023-11-02 16:15:30 -10:00
core net: fix removing a namespace with conflicting altnames 2024-01-22 01:09:30 +00:00
dcb
dccp net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
devlink devlink: extend multicast filtering by port index 2023-12-19 15:31:40 +01:00
dns_resolver Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
dsa net: dsa: fix netdev_priv() dereference before check on non-DSA netdevice events 2024-01-11 16:33:52 -08:00
ethernet
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-18 13:21:06 +01:00
handshake Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-10-26 13:46:28 -07:00
hsr net: fill in MODULE_DESCRIPTION()s for HSR 2024-01-11 16:16:08 -08:00
ieee802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
ife net: sched: ife: fix potential use-after-free 2023-12-15 10:50:18 +00:00
ipv4 tcp: Add memory barrier to tcp_push() 2024-01-23 10:22:31 +01:00
ipv6 netfilter pull request 24-01-18 2024-01-18 12:45:05 -08:00
iucv iucv: make iucv_bus const 2023-12-29 07:46:38 +00:00
kcm net: kcm: fix direct access to bv_len 2024-01-03 18:37:22 -08:00
key
l2tp ipv6: annotate data-races around np->ucast_oif 2023-12-11 10:59:17 +00:00
l3mdev
lapb
llc llc: Drop support for ETH_P_TR_802_2. 2024-01-19 21:30:09 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-01-04 18:06:46 -08:00
mac802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
mctp
mpls
mptcp mptcp: relax check on MPC passive fallback 2024-01-17 10:55:54 +00:00
ncsi net/ncsi: Add NC-SI 1.2 Get MC MAC Address command 2023-11-18 15:00:51 +00:00
netfilter ipvs: avoid stat macros calls from preemptible context 2024-01-17 12:02:51 +01:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2023-12-07 14:23:12 -05:00
netlink genetlink: Use internal flags for multicast groups 2023-12-29 08:43:59 +00:00
netrom
nfc net: fill in MODULE_DESCRIPTION()s for NFC 2024-01-11 16:16:08 -08:00
nsh
openvswitch net/sched: act_ct: Always fill offloading tuple iifidx 2023-11-08 17:47:08 -08:00
packet net: fill in MODULE_DESCRIPTION() for AF_PACKET 2024-01-05 08:06:35 -08:00
phonet
psample genetlink: Use internal flags for multicast groups 2023-12-29 08:43:59 +00:00
qrtr net: qrtr: ns: Return 0 if server port is not present 2024-01-01 18:41:29 +00:00
rds net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv 2024-01-22 11:24:00 +00:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-21 22:17:23 +01:00
rose net/rose: fix races in rose_kill_by_device() 2023-12-15 11:59:53 +00:00
rxrpc rxrpc: Fix use of Don't Fragment flag 2024-01-11 16:41:41 -08:00
sched net: sched: track device in tcf_block_get/put_ext() only for clsact binder types 2024-01-13 15:49:15 +00:00
sctp sctp: fix busy polling 2024-01-04 10:29:18 +00:00
smc net/smc: fix illegal rmb_desc access in SMC-D connection dump 2024-01-19 12:04:17 +00:00
strparser
sunrpc net: fill in MODULE_DESCRIPTION()s for Sun RPC 2024-01-11 16:16:08 -08:00
switchdev
tipc tipc: Remove some excess struct member documentation 2023-12-22 23:14:43 +00:00
tls net: tls, fix WARNIING in __sk_msg_free 2024-01-14 12:17:14 +00:00
unix for-6.8/io_uring-2024-01-08 2024-01-11 14:19:23 -08:00
vmw_vsock vsock/virtio: use skb_frag_*() helpers 2024-01-03 18:37:16 -08:00
wireless Just a couple of more things over the holidays: 2024-01-04 17:00:08 -08:00
x25 net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
xdp netdev 2023-12-18 16:46:08 -08:00
xfrm bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc 2023-12-14 17:12:49 -08:00
compat.c file: stop exposing receive_fd_user() 2023-12-12 14:24:14 +01:00
devres.c
Kconfig bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
Kconfig.debug
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
socket.c vfs-6.8.iov_iter 2024-01-08 11:43:04 -08:00
sysctl_net.c