47288 Commits

Author SHA1 Message Date
Ursula Braun
897e1c2457 net/smc: use separate memory regions for RMBs
SMC currently uses the unsafe_global_rkey of the protection domain,
which exposes all memory for remote reads and writes once a connection
is established. This patch introduces separate memory regions with
separate rkeys for every RMB. Now the unsafe_global_rkey of the
protection domain is no longer needed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun
a3fe3d01bd net/smc: introduce sg-logic for RMBs
The follow-on patch makes use of ib_map_mr_sg() when introducing
separate memory regions for RMBs. This function is based on
scatterlists; thus this patch introduces scatterlists for RMBs.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Ursula Braun
c45abf31e7 net/smc: shorten local bufsize variables
Initiate the coming rework of SMC buffer handling with this
small code cleanup. No functional changes here.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Ursula Braun
977bb32440 net/smc: serialize connection creation in all cases
If a link group for a new server connection exists already, the mutex
serializing the determination of link groups is given up early.
The coming registration of memory regions benefits from the serialization
as well, if the mutex is held till connection creation is finished.
This patch postpones the unlocking of the link group creation mutex.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:57 -07:00
Colin Ian King
f25cbb2a6a batman-adv: fix various spelling mistakes
Trivial fix to spelling mistakes in batadv_dbg debug messages and
also in a comment and ensure comment line is not wider than 80
characters

"ourselve" -> "ourselves"
"surpressed" -> "suppressed"
"troughput" -> "throughput"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-29 09:51:28 +02:00
Sven Eckelmann
e04de4861c batman-adv: Remove variable deprecated by skb_put_data
skb_put_data makes it unnecessary to store the skb_put return value to copy
some data to the packet. The returned pointer of skb_put_data should
therefore not stored by functions which previously only used it to copy
some data.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-29 09:51:27 +02:00
Sven Eckelmann
6a04be8d86 batman-adv: Remove too short %pM printk field width
The string representation for a mac address produced by %pM is 17
characters long. Left-aligning the output in a 15 character wide field
width %-15pM is therefore misleading and unnecessary.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-29 09:51:26 +02:00
Joe Perches
cd0edf3ada batman-adv: Remove unnecessary length qualifier in %14pM
It's misleading and unnecessary.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-29 09:51:26 +02:00
Simon Wunderlich
a311abdffb batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-07-29 09:51:25 +02:00
Julia Lawall
d04916a48a l2tp: constify inet6_protocol structures
The inet6_protocol structure is only passed as the first argument to
inet6_add_protocol or inet6_del_protocol, both of which are declared as
const.  Thus the inet6_protocol structure itself can be const.

Also drop __read_mostly on the newly const structure.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-28 18:52:56 -07:00
Julia Lawall
3a3a4e3054 ipv6: constify inet6_protocol structures
The inet6_protocol structure is only passed as the first argument to
inet6_add_protocol or inet6_del_protocol, both of which are declared as
const.  Thus the inet6_protocol structure itself can be const.

Also drop __read_mostly where present on the newly const structures.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-28 18:52:56 -07:00
Xin Long
e90ce2fc27 dccp: fix a memleak for dccp_feat_init err process
In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc
memory for rx.val, it should free tx.val before returning an
error.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:01:05 -07:00
Xin Long
b7953d3c0e dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.

This patch is to fix it for dccp_ipv4.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:01:05 -07:00
Xin Long
0c2232b0a7 dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
In dccp_v6_conn_request, after reqsk gets alloced and hashed into
ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer,
one is for hlist, and the other one is for current using.

The problem is when dccp_v6_conn_request returns and finishes using
reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
reqsk obj never gets freed.

Jianlin found this issue when running dccp_memleak.c in a loop, the
system memory would run out.

dccp_memleak.c:
  int s1 = socket(PF_INET6, 6, IPPROTO_IP);
  bind(s1, &sa1, 0x20);
  listen(s1, 0x9);
  int s2 = socket(PF_INET6, 6, IPPROTO_IP);
  connect(s2, &sa1, 0x20);
  close(s1);
  close(s2);

This patch is to put the reqsk before dccp_v6_conn_request returns,
just as what tcp_conn_request does.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:01:05 -07:00
Matthias Kaehlcke
0c3a8f8b8f netpoll: Fix device name check in netpoll_setup()
Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
when checking if the device name is set:

if (np->dev_name) {
  ...

However the field is a character array, therefore the condition always
yields true. Check instead whether the first byte of the array has a
non-zero value.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-26 17:01:43 -07:00
Paolo Abeni
9688f9b020 udp: unbreak build lacking CONFIG_XFRM
We must use pre-processor conditional block or suitable accessors to
manipulate skb->sp elsewhere builds lacking the CONFIG_XFRM will break.

Fixes: dce4551cb2ad ("udp: preserve head state for IP_CMSG_PASSSEC")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-26 09:35:29 -07:00
Stefano Brivio
afce615aaa ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
RFC 2465 defines ipv6IfStatsOutFragFails as:

	"The number of IPv6 datagrams that have been discarded
	 because they needed to be fragmented at this output
	 interface but could not be."

The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.

This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.

Reported-by: Sabrina Dubroca <sdubroca@redhat.com>
Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 21:17:10 -07:00
stephen hemminger
ce7426ca70 6lowpan: fix set not used warning
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 12:31:37 -07:00
stephen hemminger
614d79c09e socket: fix set not used warning
The variable owned_by_user is always set, but only used
when kernel is configured with LOCKDEP enabled.

Get rid of the warning by moving the code to put the call
to owned_by_user into the the rcu_protected call.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 12:31:37 -07:00
stephen hemminger
3754b87a4e netfilter: remove unused variable
warning: ‘recent_old_fops’ defined but not used

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 12:31:37 -07:00
Paolo Abeni
dce4551cb2 udp: preserve head state for IP_CMSG_PASSSEC
Paul Moore reported a SELinux/IP_PASSSEC regression
caused by missing skb->sp at recvmsg() time. We need to
preserve the skb head state to process the IP_CMSG_PASSSEC
cmsg.

With this commit we avoid releasing the skb head state in the
BH even if a secpath is attached to the current skb, and stores
the skb status (with/without head states) in the scratch area,
so that we can access it at skb deallocation time, without
incurring in cache-miss penalties.

This also avoids misusing the skb CB for ipv6 packets,
as introduced by the commit 0ddf3fb2c43d ("udp: preserve
skb->dst if required for IP options processing").

Clean a bit the scratch area helpers implementation, to
reduce the code differences between 32 and 64 bits build.

Reported-by: Paul Moore <paul@paul-moore.com>
Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 10:00:58 -07:00
Florian Fainelli
9f9e772da2 net: dsa: Initialize ds->cpu_port_mask earlier
The mt7530 driver has its dsa_switch_ops::get_tag_protocol function
check ds->cpu_port_mask to issue a warning in case the configured CPU
port is not capable of supporting tags.

After commit 14be36c2c96c ("net: dsa: Initialize all CPU and enabled
ports masks in dsa_ds_parse()") we slightly re-arranged the
initialization such that this was no longer working. Just make sure that
ds->cpu_port_mask is set prior to the first call to get_tag_protocol,
thus restoring the expected contract. In case of error, the CPU port bit
is cleared.

Fixes: 14be36c2c96c ("net: dsa: Initialize all CPU and enabled ports masks in dsa_ds_parse()")
Reported-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 17:36:27 -07:00
WANG Cong
c800aaf8d8 packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
There are multiple reports showing we have a use-after-free in
the timer prb_retire_rx_blk_timer_expired(), where we use struct
tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
free_pg_vec().

The interesting part is it is not freed via packet_release() but
via packet_setsockopt(), which means we are not closing the socket.
Looking into the big and fat function packet_set_ring(), this could
happen if we satisfy the following conditions:

1. closing == 0, not on packet_release() path
2. req->tp_block_nr == 0, we don't allocate a new pg_vec
3. rx_ring->pg_vec is already set as V3, which means we already called
   packet_set_ring() wtih req->tp_block_nr > 0 previously
4. req->tp_frame_nr == 0, pass sanity check
5. po->mapped == 0, never called mmap()

In this scenario we are clearing the old rx_ring->pg_vec, so we need
to free this pg_vec, but we don't stop the timer on this path because
of closing==0.

The timer has to be stopped as long as we need to free pg_vec, therefore
the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.

Thanks to liujian for testing different fixes.

Reported-by: alexander.levin@verizon.com
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: liujian (CE) <liujian56@huawei.com>
Tested-by: liujian (CE) <liujian56@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 17:33:19 -07:00
Matvejchikov Ilya
e42e24c3cc tcp: remove redundant argument from tcp_rcv_established()
The last (4th) argument of tcp_rcv_established() is redundant as it
always equals to skb->len and the skb itself is always passed as 2th
agrument. There is no reason to have it.

Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 17:28:12 -07:00
Florian Westphal
a28b1b90de skbuff: re-add check for NULL skb->head in kfree_skb path
A null check is needed after all.  netlink skbs can have skb->head be
backed by vmalloc.  The netlink destructor vfree()s head, then sets it to
NULL.  We then panic in skb_release_data with a NULL dereference.

Re-add such a test.

Alternative would be to switch to kvfree to free skb->head memory
and remove the special handling in netlink destructor.

Reported-by: kernel test robot <fengguang.wu@intel.com>
Fixes: 06dc75ab06943 ("net: Revert "net: add function to allocate sk_buff head without data area")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:27:08 -07:00
Liping Zhang
69ec932e36 openvswitch: fix potential out of bound access in parse_ct
Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:25:06 -07:00
David S. Miller
71085745ec RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWXJGrPSw1s6N8H32AQK/5A/5AcdMLnejfV4r4qQvqKmv/M3w6Lt9P1qY
 sQJyWUhUPATptttj0rdtkHh9n5exOkBpPE/pBrwxoSXe0rm8oa1xO96UWsdDQWn1
 DwILipqyTQ9HHNESoY9XaBpPy7bTNRVUXOcVTXLqVuSozkiZgINic4uq/q8pVonB
 NRUULZPdcxmETUhZyBzloV2afY1pv287Rz5vRm8PUnRZmVK26lHylFi75Eywblju
 nw3N+McPe846Tc5qIFyj3b9VdMtzFA/py7GkrWPeHRmVHdZOviH9rQ++KkiBCUAz
 hQl/YaSKCGbTL9KU/B4E2dz3VnL48p3AVxQusCA5BExOO+HIDiCrenti3JMpEXLN
 gt29rD4AEyxBYbocJHpXNRxARxzDmBAmaw4tRC1Aw57MXomV5uMm/jKH/f646sYe
 S7ohnngaeWRwMa4JfxgNdf+NEenUwm/06tTSYrwYynWpjJDanI0xQDLgBYKR8SYp
 YoYLAv1tduMXcX7JjSWq2lPn6WvDnSZzRWOpJPHeFJcaGEcaYer5Qw8Yr/ZgOVxm
 0xz3wgZtckIfi2d6NcSybEvIPv5jI2BLhxqgpvxxfW95NdDXcsyQCycugX69Jg3o
 Zar4qJSFCBtC86KDOkL008X7fv/I27yb+nm5EcerC8stO6GymqfZVo6f9wEIhb6W
 P+rtLI3zYco=
 =8ZR+
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20170721' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Rearrange headers

Here's a pair of patches that rearrange some of the AF_RXRPC header files
that are outside of the net/rxrpc/ directory:

 (1) The bits userspace need are moved to uapi/linux/rxrpc.h.  [Should this
     be af_rxrpc.h instead, I wonder - but there doesn't seem to be
     precedent for that in the other net UAPI headers.]

 (2) For the most part, the contents of rxrpc/packet.h are no longer used
     outside of the AF_RXRPC module, so move them to net/rxrpc/protocol.h
     with the exception of the standard abort codes which are exposed to
     userspace when an abort occurs and the security index values which are
     needed when constructing keys.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:17:10 -07:00
Xin Long
441ae65ae0 sctp: remove the typedef sctp_abort_chunk_t
This patch is to remove the typedef sctp_abort_chunk_t, and
replace with struct sctp_abort_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
38c00f7482 sctp: remove the typedef sctp_heartbeat_chunk_t
This patch is to remove the typedef sctp_heartbeat_chunk_t, and
replace with struct sctp_heartbeat_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
4d2dcdf4e0 sctp: remove the typedef sctp_heartbeathdr_t
This patch is to remove the typedef sctp_heartbeathdr_t, and
replace with struct sctp_heartbeathdr in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
d4d6c61489 sctp: remove the typedef sctp_sack_chunk_t
This patch is to remove the typedef sctp_sack_chunk_t, and
replace with struct sctp_sack_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
787310859d sctp: remove the typedef sctp_sackhdr_t
This patch is to remove the typedef sctp_sackhdr_t, and replace
with struct sctp_sackhdr in the places where it's using this
typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
afd93b7be6 sctp: remove the typedef sctp_sack_variable_t
This patch is to remove the typedef sctp_sack_variable_t, and
replace with union sctp_sack_variable in the places where it's
using this typedef.

It is also to fix some indents in sctp_acked().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:20 -07:00
Xin Long
62e6b7e4ee sctp: remove the typedef sctp_unrecognized_param_t
This patch is to remove the typedef sctp_unrecognized_param_t, and
replace with struct sctp_unrecognized_param in the places where it's
using this typedef.

It is also to fix some indents in sctp_sf_do_unexpected_init() and
sctp_sf_do_5_1B_init().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:19 -07:00
Xin Long
f48ef4c7f7 sctp: remove the typedef sctp_cookie_param_t
This patch is to remove the typedef sctp_cookie_param_t, and
replace with struct sctp_cookie_param in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:19 -07:00
Xin Long
cb1844c472 sctp: remove the typedef sctp_initack_chunk_t
This patch is to remove the typedef sctp_initack_chunk_t, and
replace with struct sctp_initack_chunk in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:01:19 -07:00
Paolo Abeni
864d966424 net/socket: fix type in assignment and trim long line
The commit ffb07550c76f ("copy_msghdr_from_user(): get rid of
field-by-field copyin") introduce a new sparse warning:

net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces)
net/socket.c:1919:27:    expected void *msg_control
net/socket.c:1919:27:    got void [noderef] <asn:1>*[addressable] msg_control

and a line above 80 chars, let's fix them

Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 14:17:01 -07:00
Sabrina Dubroca
ae847f40b6 net: call udp_tunnel_get_rx_info when NETIF_F_RX_UDP_TUNNEL_PORT is toggled
NETIF_F_RX_UDP_TUNNEL_PORT is special, in that we need to do more than
just flip the bit in dev->features. When disabling we must also clear
currently offloaded ports from the device, and when enabling we must
tell the device to offload the ports it can.

Because vxlan stores its sockets in a hashtable and they are inserted at
the head of per-bucket lists, switching the feature off and then on can
result in a different set of ports being offloaded (depending on the
HW's limits).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Sabrina Dubroca
296d8ee37c net: add infrastructure to un-offload UDP tunnel port
This adds a new NETDEV_UDP_TUNNEL_DROP_INFO event, similar to
NETDEV_UDP_TUNNEL_PUSH_INFO, to signal to un-offload ports.

This also adds udp_tunnel_drop_rx_port(), which calls
ndo_udp_tunnel_del.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Sabrina Dubroca
7a27fc6d53 net: check UDP tunnel RX port offload feature before calling tunnel ndo ndo
If NETIF_F_RX_UDP_TUNNEL_PORT was disabled on a given netdevice, skip
the tunnel offload ndo call during tunnel port creation and deletion.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Sabrina Dubroca
d764a122cc net: add new netdevice feature for offload of RX port for UDP tunnels
This adds a new netdevice feature, so that the offloading of RX port for
UDP tunnels can be disabled by the administrator on some netdevices,
using the "rx-udp_tunnel-port-offload" feature in ethtool.

This feature is set for all devices that provide ndo_udp_tunnel_add.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Linus Torvalds
505d5c1119 NFS client bugfixes for 4.13
Stable bugfixes:
 - Fix error reporting regression
 
 Bugfixes:
 - Fix setting filelayout ds address race
 - Fix subtle access bug when using ACLs
 - Fix setting mnt3_counts array size
 - Fix a couple of pNFS commit races
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAllyVYYACgkQ18tUv7Cl
 QOtTwBAA8ek0Ba5wIfwlQe4MvIX1t6v7q7Otrwdombhuw1a6nW410hFwu2SG8kGd
 fQWr1xnqRsMKwu83UuImtlYYvMa271TZgXxPNWx7KJpi/zocnigJRg5sVpkcRqza
 AjjPc245pHCooQoaTAlm5WrO9zDm0s7lRGuTB1cvPRsElnFVWT0jwhTASIDOU1Zy
 9K/hElAnXp/dZv/ydjSePTPPsVQPJWLbJPlSm6vAIQPyeXWUeCgAym0yf2FVvtsd
 AQeozaq9xq6tofVPZhsYWeBKswjTHs3FxL8vhDOF9gF3QMPm43mwsfrKVidWo4vW
 0UJnRZRCFgG0WIxxhA7l1Z9MovAsXlbWmFufgCa4Ev/bC5WuUT4ZEkjBGJw2vXD4
 0/lxkhD41PBhCl/LIod9OT6iJ8koifl50JUC4N67D2illFy9a7Btzx3EPxfDz9uG
 6jEek9x6B1xf7AC4HhxByN/E8gKX08N4Q/afxTFuAwrzKRKqI4Me3qbCyU86bp+T
 wiAxgVPVnmb/VBVLU68i7titdsA6U8ZO12FFqu9QOr9wHMXxa6108h2Nia9jVTdk
 EZhanXa6tJThQQ/QZicuR5hTtoM4BuikaaJSHZ4ODbgrAjsMAyqy4qsf4tf3LLAo
 tEDu5sJDmHJhhdtqzuz+OtDn5iJ2Nga6a4/fMwt9YU2ZQBm7X/8=
 =ZcQv
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfix:
   - Fix error reporting regression

  Bugfixes:
   - Fix setting filelayout ds address race
   - Fix subtle access bug when using ACLs
   - Fix setting mnt3_counts array size
   - Fix a couple of pNFS commit races"

* tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
  NFS: Be more careful about mapping file permissions
  NFS: Store the raw NFS access mask in the inode's access cache
  NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
  NFS: Refactor NFS access to kernel access mask calculation
  net/sunrpc/xprt_sock: fix regression in connection error reporting.
  nfs: count correct array for mnt3_counts array size
  Revert commit 722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")
  pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
  NFS: Fix another COMMIT race in pNFS
  NFS: Fix a COMMIT race in pNFS
  mount: copy the port field into the cloned nfs_server structure.
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations
2017-07-21 16:26:01 -07:00
NeilBrown
3ffbc1d655 net/sunrpc/xprt_sock: fix regression in connection error reporting.
Commit 3d4762639dd3 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.

This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.

As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind.  A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying.  If it saw the ECONNREFUSED, it would abort.

To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().

Fixes: 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-21 08:49:58 -04:00
David Howells
ddc6c70f07 rxrpc: Move the packet.h include file into net/rxrpc/
Move the protocol description header file into net/rxrpc/ and rename it to
protocol.h.  It's no longer necessary to expose it as packets are no longer
exposed to kernel services (such as AFS) that use the facility.

The abort codes are transferred to the UAPI header instead as we pass these
back to userspace and also to kernel services.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-21 11:00:20 +01:00
David S. Miller
7a68ada6ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-21 03:38:43 +01:00
Linus Torvalds
96080f6977 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) BPF verifier signed/unsigned value tracking fix, from Daniel
    Borkmann, Edward Cree, and Josef Bacik.

 2) Fix memory allocation length when setting up calls to
    ->ndo_set_mac_address, from Cong Wang.

 3) Add a new cxgb4 device ID, from Ganesh Goudar.

 4) Fix FIB refcount handling, we have to set it's initial value before
    the configure callback (which can bump it). From David Ahern.

 5) Fix double-free in qcom/emac driver, from Timur Tabi.

 6) A bunch of gcc-7 string format overflow warning fixes from Arnd
    Bergmann.

 7) Fix link level headroom tests in ip_do_fragment(), from Vasily
    Averin.

 8) Fix chunk walking in SCTP when iterating over error and parameter
    headers. From Alexander Potapenko.

 9) TCP BBR congestion control fixes from Neal Cardwell.

10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.

11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
    Wang.

12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
    from Gao Feng.

13) Cannot release skb->dst in UDP if IP options processing needs it.
    From Paolo Abeni.

14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
    Levin and myself.

15) Revert some rtnetlink notification changes that are causing
    regressions, from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  net: bonding: Fix transmit load balancing in balance-alb mode
  rds: Make sure updates to cp_send_gen can be observed
  net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
  ipv4: initialize fib_trie prior to register_netdev_notifier call.
  rtnetlink: allocate more memory for dev_set_mac_address()
  net: dsa: b53: Add missing ARL entries for BCM53125
  bpf: more tests for mixed signed and unsigned bounds checks
  bpf: add test for mixed signed and unsigned bounds checks
  bpf: fix up test cases with mixed signed/unsigned bounds
  bpf: allow to specify log level and reduce it for test_verifier
  bpf: fix mixed signed/unsigned derived min/max value bounds
  ipv6: avoid overflow of offset in ip6_find_1stfragopt
  net: tehuti: don't process data if it has not been copied from userspace
  Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
  net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
  dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
  NET: dwmac: Make dwmac reset unconditional
  net: Zero terminate ifr_name in dev_ifname().
  wireless: wext: terminate ifr name coming from userspace
  netfilter: fix netfilter_net_init() return
  ...
2017-07-20 16:33:39 -07:00
Håkon Bugge
e623a48ee4 rds: Make sure updates to cp_send_gen can be observed
cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.

This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.

Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

Clause 5.1.2.4/25 in the C standard is also relevant.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20 15:33:01 -07:00
Mahesh Bandewar
8799a221f5 ipv4: initialize fib_trie prior to register_netdev_notifier call.
Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.

Fixes following crash

 Call Trace:
  ? alternate_node_alloc+0x76/0xa0
  fib_table_insert+0x1b7/0x4b0
  fib_magic.isra.17+0xea/0x120
  fib_add_ifaddr+0x7b/0x190
  fib_netdev_event+0xc0/0x130
  register_netdevice_notifier+0x1c1/0x1d0
  ip_fib_init+0x72/0x85
  ip_rt_init+0x187/0x1e9
  ip_init+0xe/0x1a
  inet_init+0x171/0x26c
  ? ipv4_offload_init+0x66/0x66
  do_one_initcall+0x43/0x160
  kernel_init_freeable+0x191/0x219
  ? rest_init+0x80/0x80
  kernel_init+0xe/0x150
  ret_from_fork+0x22/0x30
 Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
 RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
 CR2: 0000000000000014

Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization")

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20 15:24:45 -07:00
WANG Cong
153711f942 rtnetlink: allocate more memory for dev_set_mac_address()
virtnet_set_mac_address() interprets mac address as struct
sockaddr, but upper layer only allocates dev->addr_len
which is ETH_ALEN + sizeof(sa_family_t) in this case.

We lack a unified definition for mac address, so just fix
the upper layer, this also allows drivers to interpret it
to struct sockaddr freely.

Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-20 15:23:22 -07:00
Sabrina Dubroca
6399f1fae4 ipv6: avoid overflow of offset in ip6_find_1stfragopt
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.

This problem has been here since before the beginning of git history.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19 22:50:14 -07:00