linux/net/ipv4
Koen De Schepper aecfde2310 tcp: Ensure DCTCP reacts to losses
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss module
parameter resets alpha to its maximal value if an RTO happens. This
behavior is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes an unnecessarily
large cwnd reduction in the future if the loss was isolated, as it
resets the historical term of DCTCP's alpha EWMA to its maximal value
(i.e., denoting total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.
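
To illustrate reason ii), the following is a minimal userspace sketch
of a DCTCP-style alpha EWMA in fixed point. It is not the kernel code:
the function names are hypothetical, and the 10-bit scale and the gain
of 1/16 are assumptions chosen for the example.

```c
#include <assert.h>

#define DCTCP_MAX_ALPHA 1024U  /* alpha == 1.0 in fixed point (assumed scale) */
#define DCTCP_SHIFT_G   4U     /* EWMA gain g = 1/16 (assumed) */

/* One EWMA step: alpha = (1 - g) * alpha + g * F, with F = marked / total. */
static inline unsigned int alpha_update(unsigned int alpha,
					unsigned int marked,
					unsigned int total)
{
	unsigned int decay = alpha >> DCTCP_SHIFT_G;

	assert(total > 0 && marked <= total && alpha <= DCTCP_MAX_ALPHA);

	/* Historical term: fall to zero once the shift rounds down to 0. */
	alpha -= decay ? decay : alpha;

	/* New sample, scaled so that marked == total adds g * MAX_ALPHA. */
	alpha += (marked << (10U - DCTCP_SHIFT_G)) / total;

	return alpha > DCTCP_MAX_ALPHA ? DCTCP_MAX_ALPHA : alpha;
}

/* Window after an ECN-driven reduction: cwnd * (1 - alpha / 2), floor of 2. */
static inline unsigned int cwnd_after_ecn(unsigned int cwnd, unsigned int alpha)
{
	unsigned int reduced = cwnd - ((cwnd * alpha) >> 11U);

	return reduced > 2U ? reduced : 2U;
}

/* Number of mark-free updates until alpha falls from start to <= target. */
static inline unsigned int updates_to_decay(unsigned int start,
					    unsigned int target)
{
	unsigned int n = 0;

	while (start > target) {
		start = alpha_update(start, 0, 1);
		n++;
	}
	return n;
}
```

Under these assumptions, an alpha clamped to DCTCP_MAX_ALPHA needs more
than 40 mark-free updates to fall back to 1/16 of its maximum, and any
ECN mark seen in the meantime still cuts cwnd by close to half.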

This patch replaces the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.
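
The loss reaction described above can be sketched in userspace as
follows. This is an illustration, not the patch itself: struct flow,
set_state() and on_rto() are hypothetical names, and acting only on the
transition into recovery is one assumed way of limiting the reduction
to a single adjustment per loss episode.

```c
#include <assert.h>

enum ca_state { CA_OPEN, CA_RECOVERY, CA_LOSS };

/* Hypothetical per-flow state; windows are counted in segments. */
struct flow {
	unsigned int cwnd;
	unsigned int ssthresh;
	enum ca_state state;
};

/* Conventional TCP reaction: ssthresh = max(cwnd / 2, 2). */
static inline void react_to_loss(struct flow *f)
{
	unsigned int half = f->cwnd >> 1;

	f->ssthresh = half > 2U ? half : 2U;
}

/* Fast retransmit path: react only when recovery is newly entered. */
static inline void set_state(struct flow *f, enum ca_state new_state)
{
	assert(f);

	if (new_state == CA_RECOVERY && f->state != CA_RECOVERY)
		react_to_loss(f);
	f->state = new_state;
}

/* RTO path: react once per timeout event. */
static inline void on_rto(struct flow *f)
{
	assert(f);

	react_to_loss(f);
	f->state = CA_LOSS;
}
```

With cwnd at 100 segments, entering recovery sets ssthresh to 50, and
further recovery notifications within the same episode leave it
unchanged.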

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                          |  Link   |   RTT    |    Drop
                 TCP CC   |  speed  | base+AQM | probability
        ==================|=========|==========|============
                    CUBIC |  40Mbps |  7+20ms  |    0.21%
                     RENO |         |          |    0.19%
        DCTCP-CLAMP-ALPHA |         |          |   25.80%
         DCTCP-HALVE-CWND |         |          |    0.22%
        ------------------|---------|----------|------------
                    CUBIC | 100Mbps |  7+20ms  |    0.03%
                     RENO |         |          |    0.02%
        DCTCP-CLAMP-ALPHA |         |          |   23.30%
         DCTCP-HALVE-CWND |         |          |    0.04%
        ------------------|---------|----------|------------
                    CUBIC | 800Mbps |   1+1ms  |    0.04%
                     RENO |         |          |    0.05%
        DCTCP-CLAMP-ALPHA |         |          |   18.70%
         DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all sources of loss,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all sources of loss, it can then be
controlled by the AQM using drop levels similar to those of cubic
or reno.

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 10:51:16 -07:00
bpfilter net: bpfilter: disallow to remove bpfilter module while being used 2019-01-11 18:05:41 -08:00
netfilter netfilter: nf_tables: merge ipv4 and ipv6 nat chain types 2019-03-01 14:36:59 +01:00
af_inet.c gso: validate gso_type on ipip style tunnels 2019-02-20 11:24:27 -08:00
ah4.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
arp.c net: Evict neighbor entries on carrier down 2018-10-12 09:47:39 -07:00
cipso_ipv4.c netlabel: fix out-of-bounds memory accesses 2019-02-27 21:45:24 -08:00
datagram.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
devinet.c net: ignore sysctl_devconf_inherit_init_net without SYSCTL 2019-03-04 13:14:34 -08:00
esp4_offload.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
esp4.c esp: Skip TX bytes accounting when sending from a request socket 2019-01-28 11:20:58 +01:00
fib_frontend.c ipv4: Return error for RTA_VIA attribute 2019-02-26 13:23:17 -08:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv4: fib_rules: Fix possible infinite loop in fib_empty_table 2018-12-30 12:57:04 -08:00
fib_semantics.c ipv4: fib: use struct_size() in kzalloc() 2019-02-01 15:12:29 -08:00
fib_trie.c net: ipv4: Fix memory leak in network namespace dismantle 2019-01-15 13:33:44 -08:00
fou.c fou, fou6: avoid uninit-value in gue_err() and gue6_err() 2019-03-08 15:19:53 -08:00
gre_demux.c net: ip_gre: use erspan key field for tunnel lookup 2019-01-22 11:52:17 -08:00
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-02 12:54:35 -08:00
igmp.c net: remove unneeded switch fall-through 2019-02-21 13:48:00 -08:00
inet_connection_sock.c inet: minor optimization for backlog setting in listen(2) 2018-11-07 22:31:07 -08:00
inet_diag.c inet_diag: fix reporting cgroup classid and fallback to priority 2019-02-12 13:35:57 -05:00
inet_fragment.c net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
inet_hashtables.c net: dccp: fix kernel crash on module load 2018-12-24 15:27:56 -08:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-02-08 21:50:15 -08:00
ip_forward.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
ip_fragment.c net: remove unused struct inet_frag_queue.fragments field 2019-02-26 08:27:05 -08:00
ip_gre.c route: Add multipath_hash in flowi_common to make user-define hash 2019-02-27 12:50:17 -08:00
ip_input.c vrf: check accept_source_route on the original netdevice 2019-04-01 10:44:58 -07:00
ip_options.c vrf: check accept_source_route on the original netdevice 2019-04-01 10:44:58 -07:00
ip_output.c sk_buff: add skb extension infrastructure 2018-12-19 11:21:37 -08:00
ip_sockglue.c ip: on queued skb use skb_header_pointer instead of pskb_may_pull 2019-01-10 09:27:20 -05:00
ip_tunnel_core.c ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel 2019-02-24 22:13:49 -08:00
ip_tunnel.c iptunnel: NULL pointer deref for ip_md_tunnel_xmit 2019-03-06 10:43:06 -08:00
ip_vti.c vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel 2019-01-09 14:00:37 +01:00
ipcomp.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipconfig.c ipconfig: add carrier_timeout kernel parameter 2019-02-01 15:24:13 -08:00
ipip.c ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit 2019-01-26 09:43:03 -08:00
ipmr_base.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
ipmr.c ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs 2019-02-21 13:05:05 -08:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
metrics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
netfilter.c netfilter: ipv4: remove useless export_symbol 2019-01-28 11:32:58 +01:00
netlink.c ipv4: Add ICMPv6 support when parse route ipproto 2019-03-01 16:41:27 -08:00
ping.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
proc.c tcp: implement coalescing on backlog queue 2018-11-30 13:26:54 -08:00
protocol.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
raw_diag.c
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
route.c route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race 2019-03-08 10:50:34 -08:00
syncookies.c tcp: handle inet_csk_reqsk_queue_add() failures 2019-03-08 16:05:10 -08:00
sysctl_net_ipv4.c net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs 2018-11-07 16:12:38 -08:00
tcp_bbr.c tcp_bbr: adapt cwnd based on ack aggregation estimation 2019-01-24 22:27:27 -08:00
tcp_bic.c
tcp_bpf.c bpf: sk_msg, sock{map|hash} redirect through ULP 2018-12-20 23:47:09 +01:00
tcp_cdg.c tcp: cdg: use tcp high resolution clock cache 2018-10-15 22:56:42 -07:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c tcp: Ensure DCTCP reacts to losses 2019-04-04 10:51:16 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: handle inet_csk_reqsk_queue_add() failures 2019-03-08 16:05:10 -08:00
tcp_ipv4.c tcp: fix a potential NULL pointer dereference in tcp_sk_exit 2019-04-01 10:11:41 -07:00
tcp_lp.c
tcp_metrics.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
tcp_minisocks.c tcp: use tcp_md5_needed for timewait sockets 2019-02-26 13:16:03 -08:00
tcp_nv.c
tcp_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
tcp_output.c tcp: remove tcp_queue argument from tso_fragment() 2019-02-26 13:16:03 -08:00
tcp_rate.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c
tcp_timer.c tcp: Refactor pingpong code 2019-01-27 13:29:43 -08:00
tcp_ulp.c tcp, ulp: remove socket lock assertion on ULP cleanup 2018-10-16 12:38:41 -07:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: do not report TCP_CM_INQ of 0 for closed connections 2019-03-06 11:00:50 -08:00
tunnel4.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_diag.c net: diag: document swapped src/dst in udp_dump_one. 2018-10-28 19:27:21 -07:00
udp_impl.h udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c udp: use indirect call wrappers for GRO socket lookup 2018-12-15 13:23:02 -08:00
udp_tunnel.c net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE 2019-01-17 14:55:52 -08:00
udp.c udp: fix possible user after free in error handler 2019-02-22 16:05:11 -08:00
udplite.c udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_state.c
xfrm4_tunnel.c