44543 Commits

Author SHA1 Message Date
Linus Torvalds
ef091b3cef Ceph's ->read_iter() implementation is incompatible with the new
generic_file_splice_read() code that went into -rc1.  Switch to the
 less efficient default_file_splice_read() for now; the proper fix is
 being held for 4.10.
 
 We also have a fix for a 4.8 regression and a trival libceph fixup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJYJdjPAAoJEEp/3jgCEfOLzEoH/A3B1qqiqs2WoMn0O4pnEdcM
 TxaU46VOkYcK2wh/xbYAns2kZEXKgcCySv+kXn4l3Gh6/lXVxv4WexNqWdO1o6yN
 GqEufIH7yQM6QOE/hkwtUciBXmPfQMPxF14vvprYQuyu5Bs96mrphiAa7vX6Vbk5
 VhfE/j0shb8Q2QQj/Om0mWqM6JtOAlr5aFtEcJcodbCk1k8CptUcBsSoQ31PXMC7
 UcaBHh1VHGLvx9WeG1Rw1g9tc2LiUyu+UK0csolp51+amB7HezgfmzDQzHtXzBmm
 n90SQwonf0DrdWUGuQlOpHnREwxLSgN19s68FCjLc0jeMTP4b6TFEIUgFxiqWc4=
 =Ws5s
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client

Pull Ceph fixes from Ilya Dryomov:
 "Ceph's ->read_iter() implementation is incompatible with the new
  generic_file_splice_read() code that went into -rc1.  Switch to the
  less efficient default_file_splice_read() for now; the proper fix is
  being held for 4.10.

  We also have a fix for a 4.8 regression and a trival libceph fixup"

* tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client:
  libceph: initialize last_linger_id with a large integer
  libceph: fix legacy layout decode with pool 0
  ceph: use default file splice read callback
2016-11-11 09:17:10 -08:00
Linus Torvalds
ef5beed998 NFS client bugfixes for Linux 4.9
Bugfixes:
 - Trim extra slashes in v4 nfs_paths to fix tools that use this
 - Fix a -Wmaybe-uninitialized warnings
 - Fix suspicious RCU usages
 - Fix Oops when mounting multiple servers at once
 - Suppress a false-positive pNFS error
 - Fix a DMAR failure in NFS over RDMA
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYJOCbAAoJENfLVL+wpUDrbO0QAIkcxdUu2iQeOrk07VP48kDE
 UEfJTal8vbW/KtKyL9bIeRa1qCvYpSJXnnKcR/Uo5VHE5nMz/5omoJofWf5Zg0UM
 iEHyZfOsuGFieBbl1NBaLjEd6MCoJYpmWFUj+3drZ8zqSdqDTL+JgrP7k3XEU2Mx
 glKb7U0AKoclm3h1MKyCyo5TgDVeI5TOhi+i3VVw2IN79VY2CUp4lHWMY4vloghp
 h+GuJWeVFS1nBpfCF9PpTU6LdHDfg4o/J5+DrP+IjIffD1XGzGEjfFR0BX5HyDcN
 PgOSF3fc7uVOOUIBEAqHUHY/7XiKlv6TEMRPdM8ALVoCXZ6hPSSFxq8JBJSWoVEp
 r11ts66VgYxdQgHbs51Y5AaKudLBwU60KosWuddbdZVb4YPM0cn5WQzVezrpoQYu
 k4rfrpt+LFv23NGfIJa6JaTSFBzM+YXmggEGUI8TI/YUFSN+wEp4uzLB4r19nqAP
 ff32iunzV9Z5edpPQFDCf3/1HAhzrL5KWo7E8EvijpdQKZl5k5CnUJxbG22lh4ct
 QIyYg51LjhCayzbRH8Mu+TKUFT29ORlcSp851BotLjT8ZdUetWXcFab93nAkQI7g
 sMREml4DvcXWy8qFAOzi8mX1ddTBumxBfOD0m3skPg+odxwsl/KiwjLCRwfTrgwS
 jfSXsXmrwTniPCDWgKg3
 =hFod
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Most of these fix regressions in 4.9, and none are going to stable
  this time around.

  Bugfixes:
   - Trim extra slashes in v4 nfs_paths to fix tools that use this
   - Fix a -Wmaybe-uninitialized warnings
   - Fix suspicious RCU usages
   - Fix Oops when mounting multiple servers at once
   - Suppress a false-positive pNFS error
   - Fix a DMAR failure in NFS over RDMA"

* tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
  fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
  NFS: Don't print a pNFS error if we aren't using pNFS
  NFS: Ignore connections that have cl_rpcclient uninitialized
  SUNRPC: Fix suspicious RCU usage
  NFSv4.1: work around -Wmaybe-uninitialized warning
  NFS: Trim extra slash in v4 nfs_path
2016-11-11 09:15:30 -08:00
Ilya Dryomov
264048afab libceph: initialize last_linger_id with a large integer
osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies.  Starting with a large integer should ease the task
of telling apart kernel and userspace clients.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-11-10 20:13:08 +01:00
Yan, Zheng
3890dce1d3 libceph: fix legacy layout decode with pool 0
If your data pool was pool 0, ceph_file_layout_from_legacy()
transform that to -1 unconditionally, which broke upgrades.
We only want do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t.  If any fields
are set, though, we trust the fl_pgpool to be a valid pool.

Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
Link: http://tracker.ceph.com/issues/17825
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-11-10 20:13:08 +01:00
Lance Richardson
0ace81ec71 ipv4: update comment to document GSO fragmentation cases.
This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 12:01:54 -05:00
Chuck Lever
62bdf94a20 xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
When a LOCALINV WR is flushed, the frmr is marked STALE, then
frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
are then recovered when frwr_op_map hunts for an INVALID frmr to
use.

All other cases that need frmr recovery leave that SGL DMA-mapped.
The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.

To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
alter the recovery logic (rather than the hot frwr_op_unmap_sync
path) to distinguish among these cases. This solution also takes
care of the case where multiple LOCAL_INV WRs are issued for the
same rpcrdma_req, some complete successfully, but some are flushed.

Reported-by: Vasco Steinmetz <linux@kyberraum.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Vasco Steinmetz <linux@kyberraum.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-10 11:04:54 -05:00
kbuild test robot
737d387b75 netfilter: ipset: hash: fix boolreturn.cocci warnings
net/netfilter/ipset/ip_set_hash_ipmac.c:70:8-9: WARNING: return of 0/1 in function 'hash_ipmac4_data_list' with return type bool
net/netfilter/ipset/ip_set_hash_ipmac.c:178:8-9: WARNING: return of 0/1 in function 'hash_ipmac6_data_list' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Tomasz Chilinski <tomasz.chilinski@chilan.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:50 +01:00
Jozsef Kadlecsik
fcb58a0332 netfilter: ipset: use setup_timer() and mod_timer().
Use setup_timer() and instead of init_timer(), being the preferred way
of setting up a timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to setup and arm a timer, making the
code compact and easier to read.

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:49 +01:00
Tomasz Chilinski
c0db95c7bc netfilter: ipset: hash:ipmac type support added to ipset
Introduce the hash:ipmac type.

Signed-off-by: Tomasz Chili??ski <tomasz.chilinski@chilan.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:49 +01:00
Jozsef Kadlecsik
a71bdbfa99 netfilter: ipset: Fix reported memory size for hash:* types
The calculation of the full allocated memory did not take
into account the size of the base hash bucket structure at some
places.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:48 +01:00
Jozsef Kadlecsik
9be37d2acd netfilter: ipset: Collapse same condition body to a single one
The set full case (with net_ratelimit()-ed pr_warn()) is already
handled, simply jump there.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:48 +01:00
Jozsef Kadlecsik
21956ab290 netfilter: ipset: Make struct htype per ipset family
Before this patch struct htype created at the first source
of ip_set_hash_gen.h and it is common for both IPv4 and IPv6
set variants.

Make struct htype per ipset family and use NLEN to make
nets array fixed size to simplify struct htype allocation.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:48 +01:00
Jozsef Kadlecsik
961509ac18 netfilter: ipset: Optimize hash creation routine
Exit as easly as possible on error and use RCU_INIT_POINTER()
as set is not seen at creation time.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:47 +01:00
Jozsef Kadlecsik
5a902e6d4b netfilter: ipset: Make sure element data size is a multiple of u32
Data for hashing required to be array of u32. Make sure that
element data always multiple of u32.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:47 +01:00
Jozsef Kadlecsik
cee8b97b6c netfilter: ipset: Make NLEN compile time constant for hash types
Hash types define HOST_MASK before inclusion of ip_set_hash_gen.h
and the only place where NLEN needed to be calculated at runtime
is *_create() method.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:46 +01:00
Jozsef Kadlecsik
509debc975 netfilter: ipset: Simplify mtype_expire() for hash types
Remove one leve of intendation by using continue while
iterating over elements in bucket.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:46 +01:00
Jozsef Kadlecsik
5fdb5f6938 netfilter: ipset: Remove redundant mtype_expire() arguments
Remove redundant parameters nets_length and dsize, because
they can be get from other parameters.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:46 +01:00
Jozsef Kadlecsik
9e41f26a50 netfilter: ipset: Count non-static extension memory for userspace
Non-static (i.e. comment) extension was not counted into the memory
size. A new internal counter is introduced for this. In the case of
the hash types the sizes of the arrays are counted there as well so
that we can avoid to scan the whole set when just the header data
is requested.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:45 +01:00
Jozsef Kadlecsik
702b71e7c6 netfilter: ipset: Add element count to all set types header
It is better to list the set elements for all set types, thus the
header information is uniform. Element counts are therefore added
to the bitmap and list types.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:45 +01:00
Eric B Munson
a54dad51a8 netfilter: ipset: Add element count to hash headers
It would be useful for userspace to query the size of an ipset hash,
however, this data is not exposed to userspace outside of counting the
number of member entries.  This patch uses the attribute
IPSET_ATTR_ELEMENTS to indicate the size in the the header that is
exported to userspace.  This field is then printed by the userspace
tool for hashes.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Josh Hunt <johunt@akamai.com>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:44 +01:00
Jozsef Kadlecsik
722a94519a netfilter: ipset: Separate memsize calculation code into dedicated function
Hash types already has it's memsize calculation code in separate
functions. Clean up and do the same for *bitmap* and *list* sets.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:44 +01:00
Jozsef Kadlecsik
bec810d973 netfilter: ipset: Improve skbinfo get/init helpers
Use struct ip_set_skbinfo in struct ip_set_ext instead of open
coded fields and assign structure members in get/init helpers
instead of copying members one by one. Explicitly note that
struct ip_set_skbinfo must be padded to prevent non-aligned
access in the extension blob.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:42 +01:00
Eric Dumazet
f522a5fcf6 tcp: remove unaligned accesses from tcp_get_info()
After commit 6ed46d1247a5 ("sock_diag: align nlattr properly when
needed"), tcp_get_info() gets 64bit aligned memory, so we can avoid
the unaligned helpers.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 22:51:33 -05:00
David Ahern
9b6c14d51b net: tcp response should set oif only if it is L3 master
Lorenzo noted an Android unit test failed due to e0d56fdd7342:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."

Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.

Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls")
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 22:32:10 -05:00
David S. Miller
d401c1d1e8 This feature and cleanup patchset includes the following changes:
- netlink and code cleanups by Sven Eckelmann (3 patches)
 
  - Cleanup and minor fixes by Linus Luessing (3 patches)
 
  - Speed up multicast update intervals, by Linus Luessing
 
  - Avoid (re)broadcast in meshes for some easy cases,
    by Linus Luessing
 
  - Clean up tx return state handling, by Sven Eckelmann (6 patches)
 
  - Fix some special mac address handling cases, by Sven Eckelmann
    (3 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdBQJYI6GFFhxzd0BzaW1vbnd1bmRlcmxpY2guZGUACgkQoSvjmEKS
 nqHyEw/9GkYNRQJOk1JMuW0cDvj9uWqoendvXRNPVkCvqh4gjX4o+aQaeyumv1/v
 eYqpslQWmSsrIlGQ6UGCegzyzZ7jXo6ZijM7wvz2bWwB2C0NzUGlBBCzOeA6Bui/
 3Fq+Xmx0Xcf5+c82YmrLor/yYp4FIFTao4+a80vHzQeI/Hg8RuJTOFJdtVNV3JPP
 VrfzMAPLLXJPPKHjt1PN3lfANWqX6nWLUMhHBNkMpYB+mMdyaCve6X+MxPF+WYBH
 wBO8spU35chW7dp8HOncof5nRDv2xVHWs6TN2kdJ762YrZ1oL0GXwWXViKhWskSQ
 QEeOLboyj3IuwPsxOQOLQEbAMrp6jqj3L/6lYWRkV2U6Bbi8EYdozW8L3utxMcvA
 Dft8D2U5JAD5ja0VUFyGhwNaBFien2B9JSEwsyOLtUbaQSASNyvym75WrN2Ey/d7
 JhBzUt6Iwh8RNJylY3nG5OkoNnyXYv3VrQLsIW4QTHc8Um9eaiOeFHtuAi6WNBtI
 HgMwPcdErNbmPd3w9OM6kk6aBQ/DTUK/7CNUKYVoGDayGxKYGDwqhoog9zm18wrt
 wc/TtdIY+q95hgm8fDCJefrnkaIxDJrVtChs30N/pJ24MeKcHuibop3HzxIngze2
 zPZTuXRKA2VSt79+EV4KORAutexi1WQIN7nRH1a8zMsYfyMKG8Y=
 =1xrj
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20161108-v2' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
pull request for net-next: batman-adv 2016-11-08 v2

This feature and cleanup patchset includes the following changes:

 - netlink and code cleanups by Sven Eckelmann (3 patches)

 - Cleanup and minor fixes by Linus Luessing (3 patches)

 - Speed up multicast update intervals, by Linus Luessing

 - Avoid (re)broadcast in meshes for some easy cases,
   by Linus Luessing

 - Clean up tx return state handling, by Sven Eckelmann (6 patches)

 - Fix some special mac address handling cases, by Sven Eckelmann
   (3 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 22:15:28 -05:00
Eric Dumazet
149d6ad836 net: napi_hash_add() is no longer exported
There are no more users except from net/core/dev.c
napi_hash_add() can now be static.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 21:16:05 -05:00
David Lebrun
a149e7c7ce ipv6: sr: add support for SRH injection through setsockopt
This patch adds support for per-socket SRH injection with the setsockopt
system call through the IPPROTO_IPV6, IPV6_RTHDR options.
The SRH is pushed through the ipv6_push_nfrag_opts function.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
613fa3ca9e ipv6: add source address argument for ipv6_push_nfrag_opts
This patch prepares for insertion of SRH through setsockopt().
The new source address argument is used when an HMAC field is
present in the SRH, which must be filled. The HMAC signature
process requires the source address as input text.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
9baee83406 ipv6: sr: add calls to verify and insert HMAC signatures
This patch enables the verification of the HMAC signature for transiting
SR-enabled packets, and its insertion on encapsulated/injected SRH.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
4f4853dc1c ipv6: sr: implement API to control SR HMAC structure
This patch provides an implementation of the genetlink commands
to associate a given HMAC key identifier with an hashing algorithm
and a secret.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
bf355b8d2c ipv6: sr: add core files for SR HMAC support
This patch adds the necessary functions to compute and check the HMAC signature
of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and
hmac(sha256).

In order to avoid dynamic memory allocation for each HMAC computation,
a per-cpu ring buffer is allocated for this purpose.

A new per-interface sysctl called seg6_require_hmac is added, allowing a
user-defined policy for processing HMAC-signed SR-enabled packets.
A value of -1 means that the HMAC field will always be ignored.
A value of 0 means that if an HMAC field is present, its validity will
be enforced (the packet is dropped is the signature is incorrect).
Finally, a value of 1 means that any SR-enabled packet that does not
contain an HMAC signature or whose signature is incorrect will be dropped.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
6c8702c60b ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
This patch creates a new type of interfaceless lightweight tunnel (SEG6),
enabling the encapsulation and injection of SRH within locally emitted
packets and forwarded packets.

>From a configuration viewpoint, a seg6 tunnel would be configured as follows:

  ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0

Any packet whose destination address is fc00::1 would thus be encapsulated
within an outer IPv6 header containing the SRH with three segments, and would
actually be routed to the first segment of the list. If `mode inline' was
specified instead of `mode encap', then the SRH would be directly inserted
after the IPv6 header without outer encapsulation.

The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This
feature was made configurable because direct header insertion may break
several mechanisms such as PMTUD or IPSec AH.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
915d7e5e59 ipv6: sr: add code base for control plane support of SR-IPv6
This patch adds the necessary hooks and structures to provide support
for SR-IPv6 control plane, essentially the Generic Netlink commands
that will be used for userspace control over the Segment Routing
kernel structures.

The genetlink commands provide control over two different structures:
tunnel source and HMAC data. The tunnel source is the source address
that will be used by default when encapsulating packets into an
outer IPv6 header + SRH. If the tunnel source is set to :: then an
address of the outgoing interface will be selected as the source.

The HMAC commands currently just return ENOTSUPP and will be implemented
in a future patch.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David Lebrun
1ababeba4a ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)
Implement minimal support for processing of SR-enabled packets
as described in
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.

This patch implements the following operations:
- Intermediate segment endpoint: incrementation of active segment and rerouting.
- Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
  and routing of inner packet.
- Cleanup flag support for SR-inlined packets: removal of SRH if we are the
  penultimate segment endpoint.

A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
packets. Default is deny.

This patch does not provide support for HMAC-signed packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
David S. Miller
9fa684ec86 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically avoid that the
   INVITE in the allow header create bogus expectations by performing
   stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation instead of connlabels, allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
    via CT target, otherwise we cannot use the h.245 helper, from
    Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:38:18 -05:00
Mathias Krause
f567e950bf rtnl: reset calcit fptr in rtnl_unregister()
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.

Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:18:19 -05:00
Asbjørn Sloth Tønnesen
3f9b9770b4 net: l2tp: fix negative assignment to unsigned int
recv_seq, send_seq and lns_mode mode are all defined as
unsigned int foo:1;

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:55:36 -05:00
Asbjørn Sloth Tønnesen
57ceb8611d net: l2tp: cleanup: remove redundant condition
These assignments follow this pattern:

	unsigned int foo:1;
	struct nlattr *nla = info->attrs[bar];

	if (nla)
		foo = nla_get_flag(nla); /* expands to: foo = !!nla */

This could be simplified to: if (nla) foo = 1;
but lets just remove the condition and use the macro,

	foo = nla_get_flag(nla);

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:55:36 -05:00
Asbjørn Sloth Tønnesen
97b7af097e net: l2tp: netlink: l2tp_nl_tunnel_send: set UDP6 checksum flags
This patch causes the proper attribute flags to be set,
in the case that IPv6 UDP checksums are disabled, so that
userspace ie. `ip l2tp show tunnel` knows about it.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:55:36 -05:00
Asbjørn Sloth Tønnesen
7ff516ffe4 net: l2tp: only set L2TP_ATTR_UDP_CSUM if AF_INET
Only set L2TP_ATTR_UDP_CSUM in l2tp_nl_tunnel_send()
when it's running over IPv4.

This prepares the code to also have IPv6 specific attributes.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:55:36 -05:00
David Ahern
9d1a6c4ea4 net: icmp_route_lookup should use rt dev to determine L3 domain
icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:49:39 -05:00
Eric Dumazet
d61d072e87 net-gro: avoid reorders
Receiving a GSO packet in dev_gro_receive() is not uncommon
in stacked devices, or devices partially implementing LRO/GRO
like bnx2x. GRO is implementing the aggregation the device
was not able to do itself.

Current code causes reorders, like in following case :

For a given flow where sender sent 3 packets P1,P2,P3,P4

Receiver might receive P1 as a single packet, stored in GRO engine.

Then P2-P4 are received as a single GSO packet, immediately given to
upper stack, while P1 is held in GRO engine.

This patch will make sure P1 is given to upper stack, then P2-P4
immediately after.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:48:54 -05:00
Arnd Bergmann
56a62e2218 netfilter: conntrack: fix NF_REPEAT handling
gcc correctly identified a theoretical uninitialized variable use:

net/netfilter/nf_conntrack_core.c: In function 'nf_conntrack_in':
net/netfilter/nf_conntrack_core.c:1125:14: error: 'l4proto' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This could only happen when we 'goto out' before looking up l4proto,
and then enter the retry, implying that l3proto->get_l4proto()
returned NF_REPEAT. This does not currently get returned in any
code path and probably won't ever happen, but is not good to
rely on.

Moving the repeat handling up a little should have the same
behavior as today but avoids the warning by making that case
impossible to enter.

[ I have mangled this original patch to remove the check for tmpl, we
  should inconditionally jump back to the repeat label in case we hit
  NF_REPEAT instead. I have also moved the comment that explains this
  where it belongs. --pablo ]

Fixes: 08733a0cb7de ("netfilter: handle NF_REPEAT from nf_conntrack_in()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-10 00:19:33 +01:00
Arnd Bergmann
30f5815848 udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6}
Since commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
the udp6_lib_lookup and udp4_lib_lookup functions are only
provided when it is actually possible to call them.

However, moving the callers now caused a link error:

net/built-in.o: In function `nf_sk_lookup_slow_v6':
(.text+0x131a39): undefined reference to `udp6_lib_lookup'
net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4':
nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup'

This extends the #ifdef so we also provide the functions when
CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively
are set.

Fixes: 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-10 00:05:14 +01:00
Davide Caratti
0e54d2179f netfilter: conntrack: simplify init/uninit of L4 protocol trackers
modify registration and deregistration of layer-4 protocol trackers to
facilitate inclusion of new elements into the current list of builtin
protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
UDPlite) layer-4 protocol trackers usually register/deregister themselves
using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
This sequence is interrupted and rolled back in case of error; in order to
simplify addition of builtin protocols, the input of the above functions
has been modified to allow registering/unregistering multiple protocols.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09 23:49:25 +01:00
Sebastian Andrzej Siewior
a4fc1bfc42 net/flowcache: Convert to hotplug state machine
Install the callbacks via the state machine. Use multi state support to avoid
custom list handling for the multiple instances.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20161103145021.28528-10-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:28 +01:00
Sebastian Andrzej Siewior
f0bf90def3 net/dev: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-9-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:28 +01:00
Liping Zhang
4e24877e61 netfilter: nf_tables: simplify the basic expressions' init routine
Some basic expressions are built into nf_tables.ko, such as nft_cmp,
nft_lookup, nft_range and so on. But these basic expressions' init
routine is a little ugly, too many goto errX labels, and we forget
to call nft_range_module_exit in the exit routine, although it is
harmless.

Acctually, the init and exit routines of these basic expressions
are same, i.e. do nft_register_expr in the init routine and do
nft_unregister_expr in the exit routine.

So it's better to arrange them into an array and deal with them
together.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09 23:42:23 +01:00
Pablo Neira Ayuso
f86dab3aa6 netfilter: nft_hash: get random bytes if seed is not specified
If the user doesn't specify a seed, generate one at configuration time.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09 23:41:09 +01:00
Hadar Hen Zion
75bfbca01e net/sched: act_tunnel_key: Add UDP dst port option
The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:41:55 -05:00