linux/net/ipv4
Eric Dumazet 46d3ceabd8 tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), in order to reduce RTT and cwnd bias, which are
part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
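
As an illustration of the limit described above, here is a minimal
user-space sketch. It is not the kernel code: struct toy_tsq_sock,
toy_try_xmit(), toy_tx_complete() and the exact 128 * 1024 value are
assumptions made up for this example, standing in for sk->sk_wmem_alloc,
the transmit path and the completion handler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-socket state TSQ looks at (NOT struct sock). */
struct toy_tsq_sock {
    size_t wmem_alloc;  /* bytes currently held in qdisc/device queues */
    bool   throttled;   /* set when a send was refused because of the limit */
};

/* Assumed limit of 128 KB, matching the "~128KB" figure in the text. */
static const size_t toy_limit_bytes = 128 * 1024;

/* Try to hand one packet to the lower layers; refuse once the limit is hit. */
static bool toy_try_xmit(struct toy_tsq_sock *sk, size_t pkt_bytes)
{
    if (sk->wmem_alloc >= toy_limit_bytes) {
        sk->throttled = true;   /* remember that more data is waiting */
        return false;
    }
    sk->wmem_alloc += pkt_bytes;
    return true;
}

/* Called when the lower layers release a packet, freeing room for more. */
static void toy_tx_complete(struct toy_tsq_sock *sk, size_t pkt_bytes)
{
    sk->wmem_alloc -= pkt_bytes;
}
```

With two 64 KB packets queued, a third toy_try_xmit() call is refused
until toy_tx_complete() has released one of them.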

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2: having smaller TSO packets
can help to reduce the latency of high prio packets.
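
The arithmetic above can be sketched as follows. toy_tso_size_cap() is a
hypothetical helper written for this example, not a kernel function; it
only assumes that a TSO packet is capped at the smaller of the gso max
limit and half the TSQ limit.

```c
#include <stdint.h>

/* Hypothetical helper: cap a TSO packet at half the TSQ limit. */
static uint32_t toy_tso_size_cap(uint32_t gso_max_bytes, uint32_t limit_bytes)
{
    uint32_t half = limit_bytes / 2;  /* two TSO packets fit in the limit */

    return gso_max_bytes < half ? gso_max_bytes : half;
}
```

toy_tso_size_cap(65536, 40000) gives 20000, while a limit of 131072
leaves the 65536 gso max limit unchanged.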

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without any reduction of nominal bandwidth, buffering per bulk sender
is reduced to:
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might
be taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
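
The deferral described in the two paragraphs above can be modelled in
user space roughly like this. Everything here is a toy: struct
toy_defer_sock and the toy_* functions are invented for the example, the
tsq_owned field merely stands in for the TSQ_OWNED flag, and no real
locking or tasklet machinery is involved.

```c
#include <stdbool.h>

/* Toy model of a socket as seen by the tasklet (NOT struct sock). */
struct toy_defer_sock {
    bool owned_by_user;  /* a process currently holds the socket */
    bool tsq_owned;      /* stand-in for the TSQ_OWNED flag */
    int  xmit_calls;     /* how many times new segments were "sent" */
};

/* Stand-in for pushing the next segments to the lower layers. */
static void toy_xmit(struct toy_defer_sock *sk)
{
    sk->xmit_calls++;
}

/* What the tasklet does for one socket: send now, or leave a note. */
static void toy_tasklet_handle(struct toy_defer_sock *sk)
{
    if (sk->owned_by_user)
        sk->tsq_owned = true;  /* defer: the owner will send on release */
    else
        toy_xmit(sk);
}

/* What the release path does: pick up any deferred work. */
static void toy_release_sock(struct toy_defer_sock *sk)
{
    sk->owned_by_user = false;
    if (sk->tsq_owned) {
        sk->tsq_owned = false;
        toy_xmit(sk);
    }
}
```

If toy_tasklet_handle() runs while the socket is owned, nothing is sent
until toy_release_sock() is called, which then sends exactly once.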

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.
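
For reference, the tunable from [1] can be read from user space like any
other procfs file. The helper below is only a sketch: read_limit_bytes()
is a name chosen for this example, and it assumes the file holds a
single decimal integer.

```c
#include <stdio.h>

/*
 * Read a single decimal value from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static long read_limit_bytes(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_limit_bytes("/proc/sys/net/ipv4/tcp_limit_output_bytes")
on a kernel that has this patch should return the current limit; on
kernels without the file it returns -1.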

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 18:12:59 -07:00
..
netfilter Merge branch 'master' of git://1984.lsi.us.es/nf-next 2012-07-07 16:18:50 -07:00
af_inet.c ipv4: Early TCP socket demux. 2012-06-19 21:22:05 -07:00
ah4.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
arp.c Revert "ipv4: tcp: dont cache unconfirmed intput dst" 2012-06-27 17:05:06 -07:00
cipso_ipv4.c ipv4: Convert call_rcu() to kfree_rcu(), drop opt_kfree_rcu() 2012-02-21 09:03:31 -08:00
datagram.c ipv4: Lock socket and use cork flow in ip4_datagram_connect(). 2011-05-08 13:48:57 -07:00
devinet.c ipv4: Add interface option to enable routing of 127.0.0.0/8 2012-06-12 15:25:46 -07:00
esp4.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
fib_frontend.c ipv4: Avoid overhead when no custom FIB rules are installed. 2012-07-05 22:13:13 -07:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c ipv4: Avoid overhead when no custom FIB rules are installed. 2012-07-05 22:13:13 -07:00
fib_semantics.c ipv4: Enforce max MTU metric at route insertion time. 2012-07-10 22:40:15 -07:00
fib_trie.c inet: Add inetpeer tree roots to the FIB tables. 2012-06-11 02:09:16 -07:00
gre.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
icmp.c inet: Minimize use of cached route inetpeer. 2012-07-10 22:40:11 -07:00
igmp.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
inet_connection_sock.c inet: Kill FLOWI_FLAG_PRECOW_METRICS. 2012-07-10 22:40:12 -07:00
inet_diag.c inet_diag: Do not use RTA_PUT() macros 2012-06-27 15:36:44 -07:00
inet_fragment.c inetpeer: add parameter net for inet_getpeer_v4,v6 2012-06-08 14:27:23 -07:00
inet_hashtables.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
inet_lro.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
inet_timewait_sock.c net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debug 2012-05-16 01:01:03 -04:00
inetpeer.c ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip_forward.c snmp: fix OutOctets counter to include forwarded datagrams 2012-06-07 14:50:56 -07:00
ip_fragment.c Revert "ipv4: tcp: dont cache unconfirmed intput dst" 2012-06-27 17:05:06 -07:00
ip_gre.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
ip_input.c ipv4: Kill early demux method return value. 2012-06-27 22:01:22 -07:00
ip_options.c ipv4: defer fib_compute_spec_dst() call 2012-07-05 03:03:32 -07:00
ip_output.c net: Do delayed neigh confirmation. 2012-07-05 01:03:06 -07:00
ip_sockglue.c ipv4: Create and use fib_compute_spec_dst() helper. 2012-06-28 03:59:11 -07:00
ipcomp.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
ipconfig.c net/ipv4/ipconfig: neaten __setup placement 2012-05-20 04:06:16 -04:00
ipip.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
ipmr.c net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
Kconfig net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
Makefile tcp: Move dynamnic metrics handling into seperate file. 2012-07-10 20:31:36 -07:00
netfilter.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
ping.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
proc.c tcp: reduce out_of_order memory use 2012-03-19 16:53:08 -04:00
protocol.c inet: Sanitize inet{,6} protocol demux. 2012-06-19 18:56:21 -07:00
raw.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
route.c ipv4: Remove inetpeer from routes. 2012-07-10 22:40:18 -07:00
syncookies.c tcp: fix syncookie regression 2012-03-11 15:52:12 -07:00
sysctl_net_ipv4.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_cubic.c tcp: fix undo after RTO for CUBIC 2012-01-20 14:17:26 -05:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_illinois.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_input.c tcp: Move dynamnic metrics handling into seperate file. 2012-07-10 20:31:36 -07:00
tcp_ipv4.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_memcontrol.c memcg: decrement static keys at real destroy time 2012-05-29 16:22:28 -07:00
tcp_metrics.c tcp: Fix out of bounds access to tcpm_vals 2012-07-11 17:30:41 -07:00
tcp_minisocks.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tcp_output.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tcp_probe.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c tcp: early retransmit: delayed fast retransmit 2012-05-02 20:56:10 -04:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp_diag.c udp_diag: implement idiag_get_info for udp/udplite to get queue information 2012-04-25 20:43:01 -04:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udp.c net: skb_free_datagram_locked() doesnt drop all packets 2012-06-27 15:40:57 -07:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c Revert "ipv4: tcp: dont cache unconfirmed intput dst" 2012-06-27 17:05:06 -07:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_output.c xfrm4: Don't call icmp_send on local error 2011-07-01 17:33:19 -07:00
xfrm4_policy.c ipv4: Remove inetpeer from routes. 2012-07-10 22:40:18 -07:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00