linux/net/ipv4
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

It's done to increase the probability of coalescing small write() calls
into single segments in skbs still in the write queue (not yet sent).

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

It's also quite inefficient for building 64KB TSO packets, because we
need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
the page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, that's order-3 pages on x86)
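
The allocation strategy described above can be illustrated with a small userspace sketch. This is not the kernel code from this commit: `struct frag_cache`, `frag_refill()`, `frag_alloc()`, `sim_alloc_pages()` and the `sim_max_alloc_order` knob are hypothetical names invented for illustration, and `malloc()` stands in for the page allocator. It only shows the idea of one cache per task that first tries a 32768-byte block and falls back to smaller orders when the larger allocation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_PAGE_SIZE 4096u   /* assumed base page size (x86) */
#define SIM_MAX_ORDER 3       /* 4096 << 3 = 32768 bytes per block */

/* Hypothetical per-task cache: one backing block plus a fill offset. */
struct frag_cache {
	char *page;     /* current backing block, or NULL */
	size_t size;    /* size of the backing block in bytes */
	size_t offset;  /* first unused byte in the block */
};

/* Test knob simulating memory pressure: largest order that may succeed. */
static int sim_max_alloc_order = SIM_MAX_ORDER;

/* Stand-in for the page allocator; fails for orders above the knob. */
static char *sim_alloc_pages(int order)
{
	if (order > sim_max_alloc_order)
		return NULL;
	return malloc((size_t)SIM_PAGE_SIZE << order);
}

/* Ensure the cache has at least `need` free bytes, refilling if necessary. */
static bool frag_refill(struct frag_cache *fc, size_t need)
{
	assert(fc != NULL);
	if (fc->page && fc->size - fc->offset >= need)
		return true;

	/*
	 * Simplification: a kernel page is reference counted, so fragments
	 * already handed out would keep it alive. Here the old block is
	 * simply freed, which invalidates earlier fragments.
	 */
	free(fc->page);
	fc->page = NULL;
	fc->size = 0;
	fc->offset = 0;

	/* Try the biggest block first, then fall back to smaller orders. */
	for (int order = SIM_MAX_ORDER; order >= 0; order--) {
		size_t size = (size_t)SIM_PAGE_SIZE << order;

		if (size < need)
			break;  /* smaller orders cannot satisfy the request */
		fc->page = sim_alloc_pages(order);
		if (fc->page) {
			fc->size = size;
			return true;
		}
	}
	return false;
}

/* Carve `len` bytes out of the cache; returns NULL on failure. */
static char *frag_alloc(struct frag_cache *fc, size_t len)
{
	char *p;

	if (!frag_refill(fc, len))
		return NULL;
	p = fc->page + fc->offset;
	fc->offset += len;
	return p;
}
```

With a zero-initialized `struct frag_cache`, a first `frag_alloc(&fc, 100)` leaves `fc.size` at 32768; with `sim_max_alloc_order` set to 0 beforehand, the same call succeeds with a 4096-byte block instead.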

This increases TCP stream performance by 20% on loopback device,
but also benefits other network devices, since 8x fewer frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.
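
The 16-page and 8x figures follow from simple ceiling division. A minimal sketch, assuming a full 65536-byte TSO payload and ignoring headers; `frags_needed()` is a hypothetical helper, not a kernel function:

```c
#include <stddef.h>

/*
 * Number of fragments needed to carry `payload` bytes when each
 * fragment holds at most `frag_size` bytes (ceiling division).
 */
static size_t frags_needed(size_t payload, size_t frag_size)
{
	return (payload + frag_size - 1) / frag_size;
}
```

`frags_needed(65536, 4096)` gives 16 fragments with order-0 pages, while `frags_needed(65536, 32768)` gives 2 with order-3 pages, a factor of 8.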

It's possible some SG-enabled hardware can't cope with bigger fragments,
but their ndo_start_xmit() should already handle this by splitting a
fragment into sub-fragments, since some arches have PAGE_SIZE=65536.

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
netfilter netfilter: combine ipt_REDIRECT and ip6t_REDIRECT 2012-09-21 12:12:05 +02:00
af_inet.c ipv4: Don't add TCP-code in inet_sock_destruct 2012-09-20 17:12:27 -04:00
ah4.c ipv4: Add redirect support to all protocol icmp error handlers. 2012-07-11 21:27:49 -07:00
arp.c ipv4/route: arg delay is useless in rt_cache_flush() 2012-09-07 14:44:08 -04:00
cipso_ipv4.c cipso: don't follow a NULL pointer when setsockopt() is called 2012-07-18 09:01:12 -07:00
datagram.c
devinet.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
esp4.c ipv4: Add redirect support to all protocol icmp error handlers. 2012-07-11 21:27:49 -07:00
fib_frontend.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
fib_lookup.h
fib_rules.c ipv4/route: arg delay is useless in rt_cache_flush() 2012-09-07 14:44:08 -04:00
fib_semantics.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
fib_trie.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
gre.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
icmp.c ipv4: Prepare for change of rt->rt_iif encoding. 2012-07-23 16:36:26 -07:00
igmp.c igmp: avoid drop_monitor false positives 2012-09-07 14:17:10 -04:00
inet_connection_sock.c tcp: fix TFO regression 2012-09-06 14:21:10 -04:00
inet_diag.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
inet_fragment.c ipv6: unify fragment thresh handling code 2012-09-19 17:23:28 -04:00
inet_hashtables.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
inet_lro.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
inet_timewait_sock.c net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debug 2012-05-16 01:01:03 -04:00
inetpeer.c ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip_forward.c snmp: fix OutOctets counter to include forwarded datagrams 2012-06-07 14:50:56 -07:00
ip_fragment.c ipv6: unify fragment thresh handling code 2012-09-19 17:23:28 -04:00
ip_gre.c gre: add GSO support 2012-09-19 15:40:15 -04:00
ip_input.c net: TCP early demux cleanup 2012-07-30 14:53:21 -07:00
ip_options.c ipv4: optimize fib_compute_spec_dst call in ip_options_echo 2012-07-19 08:30:49 -07:00
ip_output.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
ip_sockglue.c ipv4: Prepare for change of rt->rt_iif encoding. 2012-07-23 16:36:26 -07:00
ip_vti.c net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. 2012-07-23 13:00:54 -07:00
ipcomp.c ipv4: Add redirect support to all protocol icmp error handlers. 2012-07-11 21:27:49 -07:00
ipconfig.c ipconfig: add nameserver IPs to kernel-parameter ip= 2012-09-21 14:51:21 -04:00
ipip.c ipv4: Adjust semantics of rt->rt_gateway. 2012-07-20 13:31:20 -07:00
ipmr.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
Kconfig net/ipv4: VTI support new module for ip_vti. 2012-07-18 09:36:12 -07:00
Makefile memcg: rename config variables 2012-07-31 18:42:43 -07:00
netfilter.c netfilter: properly annotate ipv4_netfilter_{init,fini}() 2012-09-03 13:56:04 +02:00
ping.c userns: Use kgids for sysctl_ping_group_range 2012-08-14 21:49:10 -07:00
proc.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
protocol.c inet: Sanitize inet{,6} protocol demux. 2012-06-19 18:56:21 -07:00
raw.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
route.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
syncookies.c tcp: TCP Fast Open Server - support TFO listeners 2012-08-31 20:02:19 -04:00
sysctl_net_ipv4.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c tcp: Apply device TSO segment limit earlier 2012-08-02 00:19:17 -07:00
tcp_cubic.c tcp: fix undo after RTO for CUBIC 2012-01-20 14:17:26 -05:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_fastopen.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_illinois.c
tcp_input.c tcp: TCP Fast Open Server - record retransmits after 3WHS 2012-09-22 23:15:25 -04:00
tcp_ipv4.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
tcp_lp.c
tcp_memcontrol.c memcg: decrement static keys at real destroy time 2012-05-29 16:22:28 -07:00
tcp_metrics.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
tcp_minisocks.c tcp: TCP Fast Open Server - note timestamps and retransmits for SYNACK RTT 2012-09-22 15:47:10 -04:00
tcp_output.c tcp: use PRR to reduce cwin in CWR state 2012-09-03 14:34:02 -04:00
tcp_probe.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
tcp_scalable.c
tcp_timer.c tcp: TCP Fast Open Server - support TFO listeners 2012-08-31 20:02:19 -04:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp_diag.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-09-15 11:43:53 -04:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c ipv4: Fix input route performance regression. 2012-07-26 15:50:39 -07:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel. 2012-07-18 09:36:12 -07:00
xfrm4_output.c
xfrm4_policy.c ipv4: Properly purge netdev references on uncached routes. 2012-07-31 15:06:50 -07:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00