linux/net
Neal Cardwell 5dfe9d2739 tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
Testing determined that the recent commit 9e046bb111 ("tcp: clear
tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does
not always ensure retrans_stamp is 0 after a TFO payload retransmit.

If transmit completion for the SYN+data skb happens after the client
TCP stack receives the SYNACK (which sometimes happens), then
retrans_stamp can erroneously remain non-zero for the lifetime of the
connection, causing a premature ETIMEDOUT later.

Testing and tracing showed that the buggy scenario is the following
somewhat tricky sequence:

+ Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO
  cookie + data in a single packet in the syn_data skb. It hands the
  syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially,
  it then reuses the same original (non-clone) syn_data skb,
  transforming it by advancing the seq by one byte and removing the
  SYN bit, and enqueues the resulting payload-only skb in the
  sk->tcp_rtx_queue.

+ Client sets retrans_stamp to the start time of the three-way
  handshake.

+ Cookie mismatches or server has TFO disabled, and server only ACKs
  SYN.

+ tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears
  retrans_stamp.

+ Since the client SYN was acked but not the payload, the TFO failure
  code path in tcp_rcv_fastopen_synack() tries to retransmit the
  payload skb.  However, in some cases the transmit completion for the
  clone of the syn_data (which had SYN + TFO cookie + data) hasn't
  happened.  In those cases, skb_still_in_host_queue() returns true
  for the retransmitted TFO payload, because the clone of the syn_data
  skb has not had its tx completion.

+ Because skb_still_in_host_queue() finds skb_fclone_busy() is true,
  it sets the TSQ_THROTTLED bit and the retransmit does not happen in
  the tcp_rcv_fastopen_synack() call chain.

+ The tcp_rcv_fastopen_synack() code next implicitly assumes the
  retransmit process is finished, and sets retrans_stamp to 0 to clear
  it, but this is later overwritten (see below).

+ Later, upon tx completion, tcp_tsq_write() calls
  tcp_xmit_retransmit_queue(), which puts the retransmit in flight and
  sets retrans_stamp to a non-zero value.

+ The client receives an ACK for the retransmitted TFO payload data.

+ Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to
  make tcp_ack_is_dubious() true and make us call
  tcp_fastretrans_alert() and reach a code path that clears
  retrans_stamp, retrans_stamp stays nonzero.

+ Later, if there is a TLP, RTO, RTO sequence, then the connection
  will suffer an early ETIMEDOUT due to the erroneously ancient
  retrans_stamp.
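
The sequence above can be replayed in a small user-space toy model. This
is only a sketch under stated assumptions, not kernel code: struct
toy_sock and every toy_* function are hypothetical names invented for
illustration, timestamps are plain integers, and the timeout check is
reduced to "now - retrans_stamp >= timeout".

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_sock {
	uint32_t retrans_stamp;	/* 0 means no retransmit episode is being timed */
	bool tsq_throttled;	/* a retransmit is deferred until tx completion */
	bool in_ca_loss;	/* ACK processing is tracking a loss recovery */
};

/* Old failed-TFO path: try to retransmit, then clear the stamp. */
static void toy_synack_old(struct toy_sock *sk, bool clone_tx_pending,
			   uint32_t now)
{
	if (clone_tx_pending)
		sk->tsq_throttled = true;	/* retransmit deferred */
	else
		sk->retrans_stamp = now;	/* retransmit sent right away */
	sk->retrans_stamp = 0;			/* assumes the retransmit is done */
}

/* Tx completion of the SYN+data clone: the deferred retransmit goes out. */
static void toy_tx_completion(struct toy_sock *sk, uint32_t now)
{
	if (!sk->tsq_throttled)
		return;
	sk->tsq_throttled = false;
	if (!sk->retrans_stamp)
		sk->retrans_stamp = now;
}

/* ACK covering all data: only a tracked recovery clears the stamp. */
static void toy_ack_all(struct toy_sock *sk)
{
	if (sk->in_ca_loss) {
		sk->in_ca_loss = false;
		sk->retrans_stamp = 0;
	}
}

/* RTO firing at 'now': true if the connection would give up. */
static bool toy_rto_times_out(struct toy_sock *sk, uint32_t now,
			      uint32_t timeout)
{
	if (!sk->retrans_stamp)
		sk->retrans_stamp = now;	/* start of a new episode */
	return now - sk->retrans_stamp >= timeout;
}
```

Calling toy_synack_old() with clone_tx_pending set, then
toy_tx_completion() and toy_ack_all(), leaves retrans_stamp non-zero in
this model, so a much later toy_rto_times_out() measures from the stale
stamp and reports a timeout on the first RTO. A socket whose stamp is 0
starts a fresh episode instead.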

The fix: this commit refactors the code to have
tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of
tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and
call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and
tcp_rcv_fastopen_synack() share code in this way because in both cases
we get a packet indicating non-congestion loss (MTU reduction or TFO
failure) and thus in both cases we want to retransmit as many packets
as cwnd allows, without reducing cwnd. And given that retransmits will
set retrans_stamp to a non-zero value (and may do so in a later
calling context due to TSQ), we also want to enter CA_Loss so that we
track when all retransmitted packets are ACked and clear retrans_stamp
when that happens (to ensure later recurring RTOs are using the
correct retrans_stamp and don't declare ETIMEDOUT prematurely).
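
The idea behind the fix can be sketched the same way. This is a
simplified user-space illustration, not the patch itself: struct toy_tp
and the toy_* helpers are hypothetical, sequence-number wraparound is
ignored, and the only behavior modeled is "enter a Loss state without
touching cwnd, and clear retrans_stamp once everything outstanding at
that point has been ACKed".

```c
#include <stdint.h>

/* Toy model only: these names are invented for illustration. */
enum toy_ca_state { TOY_CA_OPEN, TOY_CA_LOSS };

struct toy_tp {
	enum toy_ca_state ca_state;
	uint32_t cwnd;		/* congestion window, in packets */
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t high_seq;	/* snd_nxt at the time recovery started */
	uint32_t retrans_stamp;	/* 0 means no retransmit episode is timed */
};

/* Non-congestion loss: start tracking recovery, leave cwnd alone. */
static void toy_enter_loss_keep_cwnd(struct toy_tp *tp)
{
	if (tp->ca_state != TOY_CA_LOSS) {
		tp->high_seq = tp->snd_nxt;
		tp->ca_state = TOY_CA_LOSS;
	}
}

/* A retransmit, possibly issued later from a deferred context. */
static void toy_retransmit(struct toy_tp *tp, uint32_t now)
{
	if (!tp->retrans_stamp)
		tp->retrans_stamp = now;
}

/* Cumulative ACK: leaving Loss is what clears retrans_stamp. */
static void toy_ack(struct toy_tp *tp, uint32_t ack_seq)
{
	if (ack_seq > tp->snd_una)
		tp->snd_una = ack_seq;
	if (tp->ca_state == TOY_CA_LOSS && tp->snd_una >= tp->high_seq) {
		tp->ca_state = TOY_CA_OPEN;
		tp->retrans_stamp = 0;
	}
}
```

Because the state change is recorded before any retransmit is sent, it
does not matter in this model whether toy_retransmit() runs immediately
or later: the ACK that covers high_seq still finds the Loss state and
clears the stamp.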

Fixes: 9e046bb111 ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()")
Fixes: a7abf3cd76 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Link: https://patch.msgid.link/20240624144323.2371403-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25 17:22:49 -07:00
6lowpan
9p Two fixes headed to stable trees: 2024-05-29 09:25:15 -07:00
802
8021q net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
atm net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-01 15:49:42 -07:00
batman-adv Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks" 2024-06-12 20:18:00 +02:00
bluetooth Bluetooth: fix connection setup in l2cap_connect 2024-06-10 09:48:30 -04:00
bpf bpf: Set run context for rawtp test_run callback 2024-06-05 09:41:33 +02:00
bridge net: bridge: mst: fix suspicious rcu usage in br_mst_set_state 2024-06-12 18:24:24 -07:00
caif caif: Use UTILITY_NAME_LENGTH instead of hard-coding 16 2024-04-02 18:20:00 -07:00
can net: can: j1939: recover socket queue on CAN bus error during BAM transmission 2024-06-21 10:50:17 +02:00
ceph
core bpf-for-netdev 2024-06-24 18:15:22 -07:00
dcb
dccp Fix race for duplicate reqsk on identical SYN 2024-06-25 11:37:45 +02:00
devlink devlink: extend devlink_param *set pointer 2024-04-22 13:05:19 -07:00
dns_resolver
dsa tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
ethernet netkit: Fix pkt_type override upon netkit pass verdict 2024-05-25 10:48:57 -07:00
ethtool net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool() 2024-06-06 13:34:33 +02:00
handshake net/handshake: remove redundant assignment to variable ret 2024-04-16 17:14:55 -07:00
hsr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
ieee802154 tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
ife
ipv4 tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO 2024-06-25 17:22:49 -07:00
ipv6 netfilter pull request 24-06-19 2024-06-20 11:21:53 +02:00
iucv more s390 updates for 6.10 merge window 2024-05-21 12:09:36 -07:00
kcm net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function 2024-03-11 09:53:22 +00:00
key
l2tp l2tp: fix ICMP error handling for UDP-encap sockets 2024-05-17 12:15:22 -07:00
l3mdev
lapb
llc net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
mac80211 wifi: mac80211: fix monitor channel with chanctx emulation 2024-06-14 09:14:08 +02:00
mac802154
mctp
mpls net: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:41 +01:00
mptcp mptcp: pm: update add_addr counters after connect 2024-06-10 19:49:10 -07:00
ncsi net/ncsi: Fix the multi thread manner of NCSI driver 2024-06-01 16:21:44 -07:00
netfilter netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core 2024-06-19 18:41:59 +02:00
netlabel netlabel: fix RCU annotation for IPv4 options on socket creation 2024-05-13 14:58:12 -07:00
netlink netlink: support all extack types in dumps 2024-04-23 10:09:49 -07:00
netrom netrom: Fix a memory leak in nr_heartbeat_expiry() 2024-06-17 13:06:23 +01:00
nfc Quite smaller than usual. Notably it includes the fix for the unix 2024-05-23 12:49:37 -07:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-04-26 12:20:01 +02:00
openvswitch openvswitch: get related ct labels from its master if it is not confirmed 2024-06-21 10:17:30 +01:00
packet af_packet: do not call packet_read_pending() from tpacket_destruct_skb() 2024-05-16 19:38:05 -07:00
phonet net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
psample ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
qrtr net: qrtr: ns: Fix module refcnt 2024-05-16 09:47:45 +01:00
rds net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rfkill net: rfkill: gpio: Convert to platform remove callback returning void 2024-03-25 15:40:22 +01:00
rose net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
sched sched: act_ct: add netns into the key of tcf_ct_flow_table 2024-06-18 15:24:24 +02:00
sctp net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
smc net/smc: avoid overwriting when adjusting sock bufsizes 2024-06-05 09:42:57 +01:00
strparser
sunrpc NFS client bugfixes for Linux 6.10 2024-06-13 11:07:32 -07:00
switchdev net: bridge: switchdev: Improve error message for port_obj_add/del functions 2024-05-08 12:19:12 +01:00
tipc tipc: force a dst refcount before doing decryption 2024-06-18 15:08:57 +02:00
tls tls: fix missing memory barrier in tls_init 2024-05-23 12:03:26 +02:00
unix af_unix: Read with MSG_PEEK loops if the first unread byte is OOB 2024-06-13 08:03:55 -07:00
vmw_vsock virtio: features, fixes, cleanups 2024-05-23 12:04:36 -07:00
wireless wifi: cfg80211: wext: add extra SIOCSIWSCAN data check 2024-06-12 10:07:56 +02:00
x25 net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
xdp Revert "xsk: Support redirect to any socket bound to the same umem" 2024-06-05 09:42:30 +02:00
xfrm net: fix __dst_negative_advice() race 2024-05-29 17:34:49 -07:00
compat.c
devres.c
Kconfig net: add IEEE 802.1q specific helpers 2024-05-08 10:35:09 +01:00
Kconfig.debug
Makefile
socket.c net: have do_accept() take a struct proto_accept_arg argument 2024-05-13 18:19:19 -06:00
sysctl_net.c sysctl: treewide: constify argument ctl_table_root::permissions(table) 2024-04-24 09:43:54 +02:00