linux/net
Yuchung Cheng b9f64820fb tcp: track data delivery rate for a TCP connection
This patch generates data delivery rate (throughput) samples on a
per-ACK basis. These rate samples can be used by congestion control
modules, and specifically will be used by TCP BBR in later patches in
this series.

Key state:

tp->delivered: Tracks the total number of data packets (original or not)
	       delivered so far. This is an already-existing field.

tp->delivered_mstamp: the last time tp->delivered was updated.

Algorithm:

A rate sample is calculated as (d1 - d0)/(t1 - t0) on a per-ACK basis:

  d1: the current tp->delivered after processing the ACK
  t1: the current time after processing the ACK

  d0: the prior tp->delivered when the acked skb was transmitted
  t0: the prior tp->delivered_mstamp when the acked skb was transmitted

When an skb is transmitted, we snapshot d0 and t0 in its control
block in tcp_rate_skb_sent().

When an ACK arrives, it may SACK and ACK some skbs. For each SACKed
or ACKed skb, tcp_rate_skb_delivered() updates the rate_sample struct
to reflect the latest (d0, t0).

Finally, tcp_rate_gen() generates a rate sample by storing
(d1 - d0) in rs->delivered and (t1 - t0) in rs->interval_us.

One caveat: if an skb was sent with no packets in flight, then
tp->delivered_mstamp may be either invalid (if the connection is
starting) or outdated (if the connection was idle). In that case,
we'll re-stamp tp->delivered_mstamp.

At first glance it seems t0 should always be the time when an skb was
transmitted, but actually this could over-estimate the rate due to
phase mismatch between transmit and ACK events. To track the delivery
rate, we ensure that if packets are in flight then t0 and and t1 are
times at which packets were marked delivered.

If the initial and final RTTs are different then one may be corrupted
by some sort of noise. The noise we see most often is sending gaps
caused by delayed, compressed, or stretched acks. This either affects
both RTTs equally or artificially reduces the final RTT. We approach
this by recording the info we need to compute the initial RTT
(duration of the "send phase" of the window) when we recorded the
associated inflight. Then, for a filter to avoid bandwidth
overestimates, we generalize the per-sample bandwidth computation
from:

    bw = delivered / ack_phase_rtt

to the following:

    bw = delivered / max(send_phase_rtt, ack_phase_rtt)

In large-scale experiments, this filtering approach incorporating
send_phase_rtt is effective at avoiding bandwidth overestimates due to
ACK compression or stretched ACKs.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:00 -04:00
..
6lowpan 6lowpan: ndisc: no overreact if no short address is available 2016-09-19 20:19:34 +02:00
9p 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() 2016-08-09 13:42:36 +03:00
802
8021q net: remove type_check from dev_get_nest_level() 2016-08-13 15:15:54 -07:00
appletalk appletalk: use IS_ENABLED() instead of checking for built-in or module 2016-09-10 21:19:10 -07:00
atm lec: use IS_ENABLED() instead of checking for built-in or module 2016-09-10 21:19:10 -07:00
ax25 AX.25: Close socket connection on session completion 2016-06-18 20:55:34 -07:00
batman-adv batman: make netlink attributes const 2016-09-01 14:09:00 -07:00
bluetooth Bluetooth: Set appearance only for LE capable controllers 2016-09-19 21:48:22 +03:00
bridge net: bridge: add helper to call /sbin/bridge-stp 2016-09-13 11:21:31 -04:00
caif caif: Remove unneeded header file 2016-06-28 05:26:14 -04:00
can can: only call can_stat_update with procfs 2016-06-23 11:23:49 +02:00
ceph libceph: using kfree_rcu() to simplify the code 2016-08-08 21:41:42 +02:00
core bpf: direct packet write and access for helpers for clsact progs 2016-09-20 23:32:11 -04:00
dcb
dccp Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-07-29 17:38:46 -07:00
decnet net: fix decnet rtnexthop parsing 2016-07-05 14:08:47 -07:00
dns_resolver
dsa net-next: dsa: make the set_addr() operation optional 2016-09-20 04:47:44 -04:00
ethernet
hsr
ieee802154 ieee802154: 6lowpan: fix intra pan id check 2016-07-08 13:23:12 +02:00
ipv4 tcp: track data delivery rate for a TCP connection 2016-09-21 00:23:00 -04:00
ipv6 gso: Support partial splitting at the frag_list pointer 2016-09-19 20:59:34 -04:00
ipx
irda net/irda: remove pointless assignment/check 2016-08-19 18:07:24 -07:00
iucv Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-07-29 17:38:46 -07:00
kcm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
key
l2tp l2tp: constify net_device_ops structures 2016-09-17 10:07:23 -04:00
l3mdev net: ipv6: Remove l3mdev_get_saddr6 2016-09-10 23:12:53 -07:00
lapb
llc llc: switch type to bool as the timeout is only tested versus 0 2016-09-17 10:05:05 -04:00
mac80211 mac80211: Use rhltable instead of rhashtable 2016-09-20 04:43:36 -04:00
mac802154 mac802154: use rate limited warnings for malformed frames 2016-09-19 20:19:34 +02:00
mpls mpls: get rid of trivial returns 2016-09-01 10:13:15 -07:00
ncsi net/ncsi: avoid maybe-uninitialized warning 2016-07-25 10:32:59 -07:00
netfilter net: Add _nf_(un)register_hooks symbols 2016-09-19 01:25:22 -04:00
netlabel netlabel: Implement CALIPSO config functions for SMACK. 2016-06-27 15:06:18 -04:00
netlink netlink: don't forget to release a rhashtable_iter structure 2016-09-07 17:29:38 -07:00
netrom
nfc NFC: digital: Fix RTOX supervisor PDU handling 2016-07-11 02:02:03 +02:00
openvswitch openvswitch: avoid resetting flow key while installing new flow. 2016-09-20 22:54:35 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-07-24 00:53:32 -04:00
phonet
qrtr
rds RDS: add __printf format attribute to error reporting functions 2016-08-08 16:16:21 -07:00
rfkill
rose rose: limit sk_filter trim to payload 2016-07-13 11:53:40 -07:00
rxrpc rxrpc: Add config to inject packet loss 2016-09-17 11:24:04 +01:00
sched net_sched: sch_fq: add low_rate_threshold parameter 2016-09-21 00:23:00 -04:00
sctp sctp: Remove some redundant code 2016-09-19 01:34:01 -04:00
strparser kcm: Remove TCP specific references from kcm and strparser 2016-08-28 23:32:41 -04:00
sunrpc SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use 2016-08-24 22:32:55 -04:00
switchdev rtnetlink: fdb dump: optimize by saving last interface markers 2016-09-01 16:56:15 -07:00
tipc tipc: fix possible memory leak in tipc_udp_enable() 2016-09-13 11:28:32 -04:00
unix af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' 2016-09-04 13:29:29 -07:00
vmw_vsock vhost/vsock: drop space available check for TX vq 2016-08-15 05:05:21 +03:00
wimax
wireless This time we have various things - all across the board: 2016-09-18 22:29:08 -04:00
x25 net: x25: remove null checks on arrays calling_ae and called_ae 2016-09-09 18:13:30 -07:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
compat.c packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
Kconfig strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
Makefile strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
socket.c
sysctl_net.c net: make net namespace sysctls belong to container's owner 2016-08-14 21:08:58 -07:00