linux/net/ipv6
Eric Dumazet 93ab6cc691 tcp: implement mmap() for zero copy receive
Some networks can ensure that TCP payload exactly fits 4KB pages,
given a well-chosen MSS/MTU and a suitable architecture.

Implement the mmap() system call so that applications can avoid
copying data without complex splice() games.

Note that a successful mmap() of X bytes on a TCP socket consumes
those bytes, as if recvmsg() had been called (tp->copied += X).

Only PROT_READ mappings are accepted, as skb page frags
are fundamentally shared and read-only.

If tcp_mmap() finds data that is not a full page, or a patch of
urgent data, -EINVAL is returned and no bytes are consumed.

The application must then fall back to recvmsg() to read the
problematic sequence.

mmap() won't block, regardless of whether the socket is in blocking
or non-blocking mode. If not enough bytes are in the receive queue,
mmap() returns -EAGAIN, or -EIO if the socket is in a state
where no more bytes can be added to the receive queue.
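
The error handling described above could be driven by a small dispatch
routine. The following is a minimal sketch, assuming the mmap() semantics
of this patch and that the error codes reach user space through errno.
The names `enum rx_action`, `classify_mmap_errno()` and `try_map_chunk()`
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical receiver decisions; these names are illustrative only. */
enum rx_action {
	RX_MAPPED,	/* mmap() succeeded, the bytes are consumed */
	RX_FALLBACK,	/* EINVAL: partial page or urgent data, use recvmsg() */
	RX_WAIT,	/* EAGAIN: not enough bytes queued yet, poll() and retry */
	RX_STOP,	/* EIO: no more bytes can be added to the receive queue */
	RX_ERROR	/* any other failure */
};

/* Map an errno value from a failed mmap() to the next receiver step. */
static enum rx_action classify_mmap_errno(int err)
{
	switch (err) {
	case 0:
		return RX_MAPPED;
	case EINVAL:
		return RX_FALLBACK;
	case EAGAIN:
		return RX_WAIT;
	case EIO:
		return RX_STOP;
	default:
		return RX_ERROR;
	}
}

/* Try to map len bytes of queued payload read-only (PROT_READ only). */
static enum rx_action try_map_chunk(int fd, size_t len, void **addr)
{
	void *p;

	assert(addr != NULL);
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return classify_mmap_errno(errno);
	*addr = p;
	return RX_MAPPED;
}
```

A caller would loop on `try_map_chunk()`, read the offending bytes with
recvmsg() on `RX_FALLBACK`, and go back to poll() on `RX_WAIT`.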

An application might use SO_RCVLOWAT, poll() and/or ioctl(FIONREAD)
to use mmap() efficiently.
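
As an illustration of that advice, the sketch below sets SO_RCVLOWAT so
that poll() only reports the socket readable once a whole chunk is queued,
and uses ioctl(FIONREAD) to count complete pages before attempting a
mapping. The helper names and the `CHUNK_PAGE_SIZE` constant are
assumptions made for this example.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

#define CHUNK_PAGE_SIZE 4096	/* assumption: 4KB pages, as in the text */

/*
 * Ask the kernel to report the socket as readable only once at least
 * `bytes` are queued, so that poll() wakes the caller for whole chunks.
 */
static int set_rcvlowat(int fd, int bytes)
{
	assert(bytes > 0);
	return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
}

/* Number of complete pages currently queued on fd, or -1 on error. */
static int whole_pages_queued(int fd)
{
	int queued = 0;

	if (ioctl(fd, FIONREAD, &queued) < 0)
		return -1;
	return queued / CHUNK_PAGE_SIZE;
}
```

A receiver could call `set_rcvlowat(fd, CHUNK_PAGE_SIZE)` once, wait in
poll(), and then size its mmap() request from `whole_pages_queued(fd)`.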

On the sender side, MSG_EOR might help to clearly separate unaligned
headers and 4K-aligned chunks if necessary.
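
One way a sender might apply this is sketched below: the unaligned header
is sent with MSG_EOR, then the payload follows in a separate call. This
assumes a blocking socket and 4KB chunks. `send_all()` and
`send_header_then_pages()` are hypothetical helpers, and whether the
receiver actually sees page-aligned data still depends on the MSS/MTU
setup described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define CHUNK_SIZE 4096	/* assumption: 4KB chunks, as in the text */

/* Send exactly len bytes, retrying on short writes and EINTR. */
static int send_all(int fd, const char *buf, size_t len, int flags)
{
	while (len > 0) {
		ssize_t n = send(fd, buf, len, flags);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Send an unaligned header flagged with MSG_EOR, then the payload,
 * which must be a multiple of CHUNK_SIZE. Returns 0 or -1 with errno set.
 */
static int send_header_then_pages(int fd, const void *hdr, size_t hdr_len,
				  const void *payload, size_t payload_len)
{
	assert(hdr != NULL && payload != NULL);
	if (payload_len % CHUNK_SIZE != 0) {
		errno = EINVAL;
		return -1;
	}
	if (send_all(fd, hdr, hdr_len, MSG_EOR) < 0)
		return -1;
	return send_all(fd, payload, payload_len, 0);
}
```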

Tested:

mlx4 (cx-3) 40Gbit NIC, with the tcp_mmap program provided in the following patch.
MTU set to 4168 (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)
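
The MTU arithmetic can be checked with a few lines of C. This is a sketch
under one assumption not stated in the text: that the 32-byte TCP header
is the 20-byte base header plus a 12-byte padded timestamps option. The
constant and function names are illustrative.

```c
#include <assert.h>

enum {
	IPV6_HDR_LEN = 40,	/* fixed IPv6 header */
	TCP_BASE_HDR_LEN = 20,	/* TCP header without options */
	TCP_TS_OPT_LEN = 12	/* assumption: timestamps option, padded */
};

/* MTU needed so that each segment carries `payload` bytes of TCP data. */
static int mtu_for_payload(int payload)
{
	assert(payload > 0);
	return payload + IPV6_HDR_LEN + TCP_BASE_HDR_LEN + TCP_TS_OPT_LEN;
}
```

With a 4096-byte payload this yields the 4168-byte MTU used in the test.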

Without mmap() (tcp_mmap -s)

received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
  cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
  cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
  cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
  cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches

With mmap() on receiver (tcp_mmap -s -z)

received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
  cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
  cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
  cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
  cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches
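
The derived columns in these reports can be recomputed from the raw
figures. The sketch below assumes that 1 MB means 2^20 bytes and that
"usec per MB" is user plus sys CPU time divided by the number of MB. Both
assumptions reproduce the first line of each run, but the helper names
are hypothetical and not taken from the tcp_mmap program.

```c
#include <assert.h>

/* Throughput in Gbit/s for `mbytes` MB (2^20 bytes each) in `secs`. */
static double gbit_per_sec(double mbytes, double secs)
{
	assert(secs > 0.0);
	return mbytes * 1048576.0 * 8.0 / secs / 1e9;
}

/* CPU cost in microseconds per MB, from user + sys seconds. */
static double usec_per_mb(double user, double sys, double mbytes)
{
	assert(mbytes > 0.0);
	return (user + sys) * 1e6 / mbytes;
}
```

For example, `usec_per_mb(0.034, 3.778, 32768)` gives about 116.3 for the
first run without mmap(), and `usec_per_mb(0.024, 1.466, 32768)` gives
about 45.5 for the first run with mmap(), at comparable throughput.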

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-16 18:26:37 -04:00
ila net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
netfilter inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
addrconf_core.c net: ipv6: Make inet6addr_validator a blocking notifier 2017-10-20 13:15:07 +01:00
addrconf.c ipv6: remove unnecessary check in addrconf_prefix_rcv_add_addr() 2018-04-16 18:16:16 -04:00
addrlabel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
af_inet6.c tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
ah6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-11-15 11:56:19 -08:00
anycast.c net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
calipso.c net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
datagram.c ipv6: add a wrapper for ip6_dst_store() with flowi6 checks 2018-04-04 11:31:57 -04:00
esp6_offload.c esp: check the NETIF_F_HW_ESP_TX_CSUM bit before segmenting 2018-02-27 10:46:01 +01:00
esp6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
exthdrs_core.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
exthdrs_offload.c
exthdrs.c ipv6: sr: fix TLVs not being copied using setsockopt 2018-01-10 16:03:55 -05:00
fib6_notifier.c net: Add module reference to FIB notifiers 2017-09-01 20:33:42 -07:00
fib6_rules.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
fou6.c fou: make local function static 2017-05-21 13:42:36 -04:00
icmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
inet6_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
inet6_hashtables.c inet: Add a 2nd listener hashtable (port+addr) 2017-12-03 10:18:28 -05:00
ip6_checksum.c udplite: fix partial checksum initialization 2018-02-16 15:57:42 -05:00
ip6_fib.c net/ipv6: Move call_fib6_entry_notifiers up for route adds 2018-03-29 14:10:31 -04:00
ip6_flowlabel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ip6_gre.c ip6_gre: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6_icmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-04-20 10:35:33 -04:00
ip6_offload.c gso: fix payload length when gso_size is zero 2017-10-08 10:12:15 -07:00
ip6_offload.h
ip6_output.c net/ipv6: Increment OUTxxx counters after netfilter hook 2018-04-05 22:23:43 -04:00
ip6_tunnel.c ip6_tunnel: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6_udp_tunnel.c
ip6_vti.c vti6: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6mr.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipcomp6.c
ipv6_sockglue.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mcast_snoop.c
mcast.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
mip6.c
ndisc.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
netfilter.c netfilter: use skb_to_full_sk in ip6_route_me_harder 2018-02-25 20:51:13 +01:00
output_core.c net: accept UFO datagrams from tuntap and packet 2017-11-24 01:37:35 +09:00
ping.c ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow() 2018-04-04 11:31:57 -04:00
proc.c inet: frags: break the 2GB limit for frags storage 2018-03-31 23:25:39 -04:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
reassembly.c inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
route.c ipv6: add a wrapper for ip6_dst_store() with flowi6 checks 2018-04-04 11:31:57 -04:00
seg6_hmac.c ipv6: sr: Use ARRAY_SIZE macro 2017-09-01 18:35:23 -07:00
seg6_iptunnel.c ipv6: sr: fix seg6 encap performances with TSO enabled 2018-03-30 14:14:33 -04:00
seg6_local.c net/ipv6: Pass skb to route lookup 2018-03-04 13:04:22 -05:00
seg6.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
sit.c ipv6: sit: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv6.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_ipv6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
tcpv6_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tunnel6.c
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
udp.c ipv6: udp: set dst cache for a connected sk if current not valid 2018-04-04 11:31:57 -04:00
udplite.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm6_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm6_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm6_mode_ro.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_transport.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm6_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm6_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm6_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm6_state.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
xfrm6_tunnel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00