linux/include
Eric Dumazet 3b47d30396 net: gro: add a per device gro flush timer
Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.

To reach big GRO packets on 10Gbe NIC, one can use :

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.

Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.

This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.

To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.

Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
0.00      0.50

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 12:05:59 -05:00
..
acpi ACPI and power management updates for 3.18-rc2 2014-10-24 11:29:31 -07:00
asm-generic fast_hash: avoid indirect function calls 2014-11-05 22:01:21 -05:00
clocksource
crypto crypto: LLVMLinux: Add macro to remove use of VLAIS in crypto code 2014-10-14 10:51:22 +02:00
drm drm/mst: rework payload table allocation to conform better. 2014-10-13 14:40:53 +10:00
dt-bindings ARM: i.MX6: Fix "emi" clock name typo 2014-10-25 20:01:09 +08:00
keys KEYS: Restore partial ID matching functionality for asymmetric keys 2014-10-06 15:21:05 +01:00
kvm arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs 2014-10-16 10:57:41 +02:00
linux net: gro: add a per device gro flush timer 2014-11-10 12:05:59 -05:00
math-emu
media Merge branch 'patchwork' into v4l_for_linus 2014-10-09 14:00:54 -03:00
memory
misc cxl: Add new header for call backs and structs 2014-10-08 20:15:43 +11:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
pcmcia
ras
rdma IB/mlx5, iser, isert: Add Signature API additions 2014-10-09 00:10:53 -07:00
rxrpc
scsi Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2014-10-21 12:53:45 -07:00
soc/tegra
sound Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2014-10-18 18:11:04 -07:00
target target: Add force_pr_aptpl device attribute 2014-10-04 05:41:20 +00:00
trace Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent 2014-10-30 07:37:37 +01:00
uapi rtnetlink: add babel protocol recognition 2014-11-10 12:05:59 -05:00
video fbdev changes for 3.18 2014-10-18 18:03:02 -07:00
xen xen: remove DEFINE_XENBUS_DRIVER() macro 2014-10-06 10:27:57 +01:00
Kbuild