linux/net
Eric Dumazet 605ad7f184 tcp: refine TSO autosizing
Commit 95bd09eb27 ("tcp: TSO packets automatic sizing") tried to
control TSO size, but did this at the wrong place (sendmsg() time).

At sendmsg() time, we might have a pessimistic view of flow rate,
and we end up building very small skbs (with 2 MSS per skb).

This is bad because:

 - It sends small TSO packets even in Slow Start, where the rate quickly
   increases.
 - It tends to make the socket write queue very big, increasing tcp_ack()
   processing time, but also increasing memory needs, not necessarily
   accounted for, as fast clones overhead is currently ignored.
 - It lowers GRO efficiency and generates more ACK packets.

Servers with a lot of short-lived connections suffer from this.

Let's instead fill skbs as much as possible (64KB of payload), but split
them at xmit time, when we have a precise idea of the flow rate.
skb split is actually quite efficient.
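
To illustrate the split-at-xmit idea, here is a minimal userspace sketch, not
kernel code. The function name xmit_chunks() and its parameters are
hypothetical; it only counts how many packets would be handed to the device
when one large queued buffer is cut into chunks of at most limit_segs MSS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not taken from the kernel: count the packets handed
 * to the device when a queued buffer of `len` bytes is split at transmit
 * time into chunks of at most `limit_segs` MSS-sized segments.
 */
static size_t xmit_chunks(size_t len, size_t mss, size_t limit_segs)
{
	assert(mss > 0 && limit_segs > 0);

	size_t limit = mss * limit_segs;	/* bytes allowed per chunk */
	size_t chunks = 0;

	while (len > 0) {
		size_t take = len < limit ? len : limit;

		len -= take;			/* the remainder stays queued */
		chunks++;
	}
	return chunks;
}
```

With a 1448 byte MSS, a 64KB buffer leaves as 23 chunks when the limit is
2 segments, and as 2 chunks when the limit is 45 segments. The queue holds
a single large buffer in both cases; only the transmit-time limit changes.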

The patch looks bigger than necessary, because the TCP Small Queue decision
now has to take place after the possible split.

As Neal suggested, introduce a new tcp_tso_autosize() helper, so that
tcp_tso_should_defer() can be synchronized on same goal.
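
A rough userspace sketch of what such an autosizing helper could compute is
shown here. It assumes the goal is about one millisecond of data at the
current pacing rate, clamped between a minimum and a maximum segment count.
The name tso_autosize_sketch(), its parameters, and the one millisecond
budget are assumptions for illustration; the patch itself is the reference.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: every name and the ~1 ms budget are assumptions.
 * pacing_rate is in bytes per second; max_bytes caps the packet payload.
 */
static uint32_t tso_autosize_sketch(uint64_t pacing_rate, uint32_t mss,
				    uint32_t max_bytes, uint32_t min_segs,
				    uint32_t max_segs)
{
	assert(mss > 0);

	/* rate / 1024 approximates the bytes sent in one millisecond */
	uint64_t bytes = pacing_rate >> 10;

	if (bytes > max_bytes)
		bytes = max_bytes;

	uint32_t segs = (uint32_t)(bytes / mss);

	if (segs < min_segs)	/* never go below the configured floor */
		segs = min_segs;
	if (segs > max_segs)	/* respect the device segment limit */
		segs = max_segs;
	return segs;
}
```

A slow flow stays at the floor, while a fast flow grows toward the device
limit, so the packet size follows the measured rate instead of a guess made
at sendmsg() time.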

Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable
contains the number of MSS that we can put in a GSO packet, and is not
related to the autosizing goal anymore.

Tested:

40 ms rtt link

nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"

Before patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000    0.36         44.22
IpInReceives                    600                0.0
IpOutRequests                   599                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2033249            0.0

After patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000    0.36       44.27
IpInReceives                    221                0.0
IpOutRequests                   232                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2013953            0.0
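
The counters above can be reduced to an average number of TCP segments per
outgoing IP packet. The small helper below is a hypothetical sketch (the
name segs_per_packet() is not from the patch) that divides TcpOutSegs by
IpOutRequests.

```c
#include <assert.h>

/* Hypothetical helper: average TCP segments carried per outgoing IP
 * packet, computed from the nstat counters TcpOutSegs and IpOutRequests.
 */
static double segs_per_packet(unsigned long tcp_out_segs,
			      unsigned long ip_out_requests)
{
	assert(ip_out_requests > 0);
	return (double)tcp_out_segs / (double)ip_out_requests;
}
```

Before the patch this gives 1397 / 599, about 2.3 segments per packet.
After the patch it gives 1397 / 232, about 6.0 segments per packet, while
IpInReceives drops from 600 to 221 and throughput is essentially unchanged.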

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:39:22 -05:00
6lowpan 6lowpan: move skb_free from error paths in decompression 2014-11-06 22:09:48 +01:00
9p 9p/trans_virtio: enable VQs early 2014-10-15 10:25:04 +10:30
802
8021q vlan: Pass ethtool get_ts_info queries to real device. 2014-11-21 15:35:53 -05:00
appletalk new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
atm [atm] switch vcc_sendmsg() to copy_from_iter() 2014-11-24 05:16:42 -05:00
ax25 new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
batman-adv batman-adv: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
bluetooth new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2014-12-05 20:56:46 -08:00
caif new helper: memcpy_to_msg() 2014-11-24 04:28:51 -05:00
can new helper: memcpy_to_msg() 2014-11-24 04:28:51 -05:00
ceph libceph: change from BUG to WARN for __remove_osd() asserts 2014-11-13 22:26:34 +03:00
core dst: no need to take reference on DST_NOCACHE dsts 2014-12-09 16:08:17 -05:00
dcb dcbnl : Disable software interrupts before taking dcb_lock 2014-11-16 14:50:52 -05:00
dccp new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
decnet new helper: memcpy_to_msg() 2014-11-24 04:28:51 -05:00
dns_resolver Merge commit 'v3.16' into next 2014-10-01 00:44:04 +10:00
dsa net: dsa: replace count*size kzalloc by kcalloc 2014-11-16 14:41:39 -05:00
ethernet net: Add function for parsing the header length out of linear ethernet frames 2014-09-05 17:47:02 -07:00
hsr
ieee802154 new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
ipv4 tcp: refine TSO autosizing 2014-12-09 16:39:22 -05:00
ipv6 ipv6: remove useless spin_lock/spin_unlock 2014-12-09 13:18:09 -05:00
ipx switch ipxrtr_route_packet() from iovec to msghdr 2014-11-24 04:28:49 -05:00
irda new helper: memcpy_to_msg() 2014-11-24 04:28:51 -05:00
iucv new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
key new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
l2tp new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
lapb lapb: move EXPORT_SYMBOL after functions. 2014-10-24 15:51:42 -04:00
llc new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-21 22:28:24 -05:00
mac802154 ieee802154: fix byteorder for short address and panid 2014-11-17 09:49:17 +01:00
mpls net: Remove MPLS GSO feature. 2014-11-05 23:52:33 -08:00
netfilter dst: no need to take reference on DST_NOCACHE dsts 2014-12-09 16:08:17 -05:00
netlabel netlabel: kernel-doc warning fix 2014-10-09 01:40:05 -04:00
netlink new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
netrom new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
nfc new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
openvswitch openvswitch: set correct protocol on route lookup 2014-12-09 16:01:21 -05:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
phonet new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
rds rds: switch rds_message_copy_from_user() to iov_iter 2014-11-24 05:16:43 -05:00
rfkill net: rfkill: kernel-doc warning fixes 2014-10-09 11:16:15 +02:00
rose new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
rxrpc net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
sched net: sched: cls_basic: fix error path in basic_change() 2014-12-09 15:41:56 -05:00
sctp switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter 2014-11-24 05:16:40 -05:00
sunrpc SUNRPC: Fix locking around callback channel reply receive 2014-11-19 12:03:20 -05:00
switchdev bridge: call netdev_sw_port_stp_update when bridge port STP status changes 2014-12-02 20:01:22 -08:00
tipc tipc: fix missing spinlock init and nullptr oops 2014-12-09 13:41:54 -05:00
unix switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter() 2014-11-24 05:16:39 -05:00
vmw_vsock vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr 2014-11-24 05:16:42 -05:00
wimax wimax: convert printk to pr_foo() 2014-10-07 20:28:44 -04:00
wireless nl80211: Broadcast CMD_NEW_INTERFACE and CMD_DEL_INTERFACE 2014-11-19 19:02:42 +01:00
x25 new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2014-12-08 21:30:21 -05:00
compat.c fold verify_iovec() into copy_msghdr_from_user() 2014-11-19 16:23:49 -05:00
Kconfig net: introduce generic switch devices support 2014-12-02 20:01:20 -08:00
Makefile net: introduce generic switch devices support 2014-12-02 20:01:20 -08:00
nonet.c
socket.c net/socket.c : introduce helper function do_sock_sendmsg to replace reduplicate code 2014-12-09 15:24:26 -05:00
sysctl_net.c