linux/net/ipv4
Eric Dumazet 605ad7f184 tcp: refine TSO autosizing
Commit 95bd09eb27 ("tcp: TSO packets automatic sizing") tried to
control TSO size, but did this at the wrong place (sendmsg() time)

At sendmsg() time, we might have a pessimistic view of flow rate,
and we end up building very small skbs (with 2 MSS per skb).

This is bad because :

 - It sends small TSO packets even in Slow Start where rate quickly
   increases.
 - It tends to make socket write queue very big, increasing tcp_ack()
   processing time, but also increasing memory needs, not necessarily
   accounted for, as fast clones overhead is currently ignored.
 - Lower GRO efficiency and more ACK packets.

Servers with a lot of small lived connections suffer from this.

Lets instead fill skbs as much as possible (64KB of payload), but split
them at xmit time, when we have a precise idea of the flow rate.
skb split is actually quite efficient.

Patch looks bigger than necessary, because TCP Small Queue decision now
has to take place after the eventual split.

As Neal suggested, introduce a new tcp_tso_autosize() helper, so that
tcp_tso_should_defer() can be synchronized on same goal.

Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable
contains number of mss that we can put in GSO packet, and is not
related to the autosizing goal anymore.

Tested:

40 ms rtt link

nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"

Before patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380 2000000 2000000    0.36         44.22
IpInReceives                    600                0.0
IpOutRequests                   599                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2033249            0.0

After patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000    0.36       44.27
IpInReceives                    221                0.0
IpOutRequests                   232                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2013953            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:39:22 -05:00
..
netfilter netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module 2014-11-27 13:08:42 +01:00
af_inet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
ah4.c ipsec: Remove obsolete MAX_AH_AUTH_LEN 2014-09-18 10:54:36 +02:00
arp.c neigh: remove dynamic neigh table registration support 2014-11-11 15:23:54 -05:00
cipso_ipv4.c cipso: remove NULL assignment on static 2014-11-04 15:09:52 -05:00
datagram.c net: Save TX flow hash in sock and set in skbuf on xmit 2014-07-07 21:14:21 -07:00
devinet.c ipv4: fail early when creating netdev named all or default 2014-07-29 11:43:50 -07:00
esp4.c net: esp: Convert NETDEBUG to pr_info 2014-11-06 15:11:10 -05:00
fib_frontend.c ipv4: Restore accept_local behaviour in fib_validate_source() 2014-08-22 12:23:10 -07:00
fib_lookup.h ipv4: make fib_detect_death static 2013-12-28 17:01:46 -05:00
fib_rules.c ipv4: Fix incorrect error code when adding an unreachable route 2014-11-16 14:11:45 -05:00
fib_semantics.c ipv4: fix nexthop attlen check in fib_nh_match 2014-10-14 15:59:37 -04:00
fib_trie.c list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
fou.c gue: Call remcsum_adjust 2014-11-26 12:25:44 -05:00
geneve.c vlan: introduce *vlan_hwaccel_push_inside helpers 2014-11-21 14:20:17 -05:00
gre_demux.c net: Fix GRE RX to use skb_transport_header for GRE header offset 2014-09-08 15:23:05 -07:00
gre_offload.c gre: Use inner mac length when computing tunnel length 2014-10-30 19:51:56 -04:00
icmp.c icmp: Remove some spurious dropped packet profile hits from the ICMP path 2014-11-18 15:28:28 -05:00
igmp.c ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs 2014-11-16 16:55:06 -05:00
inet_connection_sock.c ipv4: make ip_local_reserved_ports per netns 2014-05-14 15:31:45 -04:00
inet_diag.c inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets 2014-01-13 22:35:46 -08:00
inet_fragment.c net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited 2014-11-11 14:10:31 -05:00
inet_hashtables.c net: use reciprocal_scale() helper 2014-08-23 12:21:21 -07:00
inet_lro.c lro: remove dead code 2013-12-29 16:34:25 -05:00
inet_timewait_sock.c tcp/dccp: remove twchain 2013-10-08 23:19:24 -04:00
inetpeer.c inet: remove dead inetpeer sequence code 2014-09-08 16:42:42 -07:00
ip_forward.c net: rename local_df to ignore_df 2014-05-12 14:03:41 -04:00
ip_fragment.c net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited 2014-11-11 14:10:31 -05:00
ip_gre.c fou: Fix typo in returning flags in netlink 2014-11-05 22:18:20 -05:00
ip_input.c net: Fix memory leak if TPROXY used with TCP early demux 2014-01-27 16:22:11 -08:00
ip_options.c ipv4: rename ip_options_echo to __ip_options_echo() 2014-09-28 16:35:42 -04:00
ip_output.c net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages 2014-11-06 15:11:10 -05:00
ip_sockglue.c net-timestamp: allow reading recv cmsg on errqueue with origin tstamp 2014-12-08 20:20:48 -05:00
ip_tunnel_core.c ipv4: fix a potential use after free in ip_tunnel_core.c 2014-10-17 23:45:26 -04:00
ip_tunnel.c ip_tunnel: Ops registration for secondary encap (fou, gue) 2014-11-12 15:01:35 -05:00
ip_vti.c ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic 2014-11-23 21:11:17 -05:00
ipcomp.c ipcomp4: Use the IPsec protocol multiplexer API 2014-02-25 07:04:17 +01:00
ipconfig.c ipv4: remove 0/NULL assignment on static 2014-11-04 15:09:52 -05:00
ipip.c fou: Fix typo in returning flags in netlink 2014-11-05 22:18:20 -05:00
ipmr.c net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
Kconfig net: Move fou_build_header into fou.c and refactor 2014-11-05 16:30:02 -05:00
Makefile net: Add Geneve tunneling protocol driver 2014-10-06 00:32:20 -04:00
netfilter.c netfilter: remove double colon 2014-02-19 11:41:25 +01:00
ping.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
proc.c tcp_cubic: add SNMP counters to track how effective is Hystart 2014-12-09 14:58:23 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c ipv4: Avoid reading user iov twice after raw_probe_proto_opt 2014-11-10 14:25:35 -05:00
route.c ipv4: Do not cache routing failures due to disabled forwarding. 2014-10-30 19:20:40 -04:00
syncookies.c net: allow setting ecn via routing table 2014-11-04 16:06:09 -05:00
sysctl_net_ipv4.c tcp: allow for bigger reordering level 2014-10-29 15:05:15 -04:00
tcp_bic.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_cong.c tcp: spelling s/plugable/pluggable 2014-11-04 15:09:52 -05:00
tcp_cubic.c tcp_cubic: refine Hystart delay threshold 2014-12-09 14:58:23 -05:00
tcp_dctcp.c net: tcp: add DCTCP congestion control algorithm 2014-09-29 00:13:10 -04:00
tcp_diag.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_fastopen.c tcp: remove unnecessary assignment. 2014-09-29 12:31:12 -04:00
tcp_highspeed.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_htcp.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_hybla.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_illinois.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_input.c new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
tcp_lp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_memcontrol.c percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
tcp_metrics.c tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic 2014-08-14 14:38:54 -07:00
tcp_minisocks.c tcp: change TCP_ECN prefixes to lower case 2014-09-29 14:41:22 -04:00
tcp_offload.c net: Remove MPLS GSO feature. 2014-11-05 23:52:33 -08:00
tcp_output.c tcp: refine TSO autosizing 2014-12-09 16:39:22 -05:00
tcp_probe.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_scalable.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_timer.c net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited 2014-11-11 14:10:31 -05:00
tcp_vegas.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_vegas.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
tcp_veno.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_westwood.c net: tcp: split ack slow/fast events from cwnd_event 2014-09-29 00:13:10 -04:00
tcp_yeah.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp.c tcp: refine TSO autosizing 2014-12-09 16:39:22 -05:00
tunnel4.c
udp_diag.c
udp_impl.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
udp_offload.c net: Remove MPLS GSO feature. 2014-11-05 23:52:33 -08:00
udp_tunnel.c udp-tunnel: Add a few more UDP tunnel APIs 2014-09-19 15:57:15 -04:00
udp.c udp: Neaten and reduce size of compute_score functions 2014-12-08 20:28:47 -05:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c xfrm4: Add IPsec protocol multiplexer 2014-02-25 07:04:16 +01:00
xfrm4_mode_beet.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c inetpeer: get rid of ip_id_count 2014-06-02 11:00:41 -07:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-24 00:32:30 -04:00
xfrm4_policy.c xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly 2014-03-14 07:28:07 +01:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c inet: make no_pmtu_disc per namespace and kill ipv4_config 2013-12-18 16:58:20 -05:00
xfrm4_tunnel.c