29245 Commits

Author SHA1 Message Date
Yuchung Cheng
356d7d88e0 netfilter: nf_conntrack: fix tcp_in_window for Fast Open
Currently the conntrack checks if the ending sequence of a packet
falls within the observed receive window. However it does so even
if it has not observe any packet from the remote yet and uses an
uninitialized receive window (td_maxwin).

If a connection uses Fast Open to send a SYN-data packet which is
dropped afterward in the network. The subsequent SYNs retransmits
will all fail this check and be discarded, leading to a connection
timeout. This is because the SYN retransmit does not contain data
payload so

end == initial sequence number (isn) + 1
sender->td_end == isn + syn_data_len
receiver->td_maxwin == 0

The fix is to only apply this check after td_maxwin is initialized.

Reported-by: Michael Chan <mcfchan@stanford.edu>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10 18:36:22 +02:00
Sridhar Samudrala
6453599302 rtnetlink: Fix inverted check in ndo_dflt_fdb_del()
Fix inverted check when deleting an fdb entry.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 01:24:12 -07:00
Eric Dumazet
28d6427109 net: attempt high order allocations in sock_alloc_send_pskb()
Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.

We can instruct sock_alloc_send_pskb() to attempt high order
allocations.

Most of the time, it does a single page allocation instead of 8.

I added an additional parameter to sock_alloc_send_pskb() to
let other users to opt-in for this new feature on followup patches.

Tested:

Before patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    46861.15

After patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    57981.11

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 01:16:44 -07:00
Eric Dumazet
e370a72363 af_unix: improve STREAM behavior with fragmented memory
unix_stream_sendmsg() currently uses order-2 allocations,
and we had numerous reports this can fail.

The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
not helping.

This patch extends the work done in commit eb6a24816b247c
("af_unix: reduce high order page allocations) for
datagram sockets.

This opens the possibility of zero copy IO (splice() and
friends)

The trick is to not use skb_pull() anymore in recvmsg() path,
and instead add a @consumed field in UNIXCB() to track amount
of already read payload in the skb.

There is a performance regression for large sends
because of extra page allocations that will be addressed
in a follow-up patch, allowing sock_alloc_send_pskb()
to attempt high order page allocations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 01:16:44 -07:00
Yuchung Cheng
149479d019 tcp: add server ip to encrypt cookie in fast open
Encrypt the cookie with both server and client IPv4 addresses,
such that multi-homed server will grant different cookies
based on both the source and destination IPs. No client change
is needed since cookie is opaque to the client.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10 00:35:33 -07:00
David S. Miller
71acc0ddd4 Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl"
This reverts commit cda5f98e36576596b9230483ec52bff3cc97eb21.

As per Vlad's request.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 13:09:41 -07:00
John W. Linville
fa5978447c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-08-09 15:08:10 -04:00
John W. Linville
4f05444892 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2013-08-09 15:06:28 -04:00
Eliezer Tamir
288a937637 net: rename busy poll MIB counter
Rename mib counter from "low latency" to "busy poll"

v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.

So v2 just renames the counter to fit the new naming convention.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:39:08 -07:00
Daniel Borkmann
477143e3fe net: sctp: trivial: update bug report in header comment
With the restructuring of the lksctp.org site, we only allow bug
reports through the SCTP mailing list linux-sctp@vger.kernel.org,
not via SF, as SF is only used for web hosting and nothing more.
While at it, also remove the obvious statement that bugs will be
fixed and incooperated into the kernel.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:33:02 -07:00
Daniel Borkmann
cda5f98e36 net: sctp: convert sctp_checksum_disable module param into sctp sysctl
Get rid of the last module parameter for SCTP and make this
configurable via sysctl for SCTP like all the rest of SCTP's
configuration knobs.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:33:02 -07:00
William Manley
2690048c01 net: igmp: Allow user-space configuration of igmp unsolicited report interval
Adds the new procfs knobs:

    /proc/sys/net/ipv4/conf/*/igmpv2_unsolicited_report_interval
    /proc/sys/net/ipv4/conf/*/igmpv3_unsolicited_report_interval

Which will allow userspace configuration of the IGMP unsolicited report
interval (see below) in milliseconds.  The defaults are 10000ms for IGMPv2
and 1000ms for IGMPv3 in accordance with RFC2236 and RFC3376.

Background:

If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent.  The
delay between joins is the "IGMP Unsolicited Report Interval".

Prior to this patch this value was hard coded in the kernel to 10s for
IGMPv2 and 1s for IGMPv3.  10s is unsuitable for some use-cases, such as
IPTV as it can cause channel change to be slow in the presence of packet
loss.

This patch allows the value to be overridden from userspace for both
IGMPv2 and IGMPv3 such that it can be tuned accoding to the network.

Tested with Wireshark and a simple program to join a (non-existent)
multicast group.  The distribution of timings for the second join differ
based upon setting the procfs knobs.

igmpvX_unsolicited_report_interval is intended to follow the pattern
established by force_igmp_version, and while a procfs entry has been added
a corresponding sysctl knob has not as it is my understanding that sysctl
is deprecated[1].

[1]: http://lwn.net/Articles/247243/

Signed-off-by: William Manley <william.manley@youview.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:27:46 -07:00
William Manley
5c6fe01c1f net: igmp: Don't flush routing cache when force_igmp_version is modified
The procfs knob /proc/sys/net/ipv4/conf/*/force_igmp_version allows the
IGMP protocol version to use to be explicitly set.  As a side effect this
caused the routing cache to be flushed as it was declared as a
DEVINET_SYSCTL_FLUSHING_ENTRY.  Flushing is unnecessary and this patch
makes it so flushing does not occur.

Requested by Hannes Frederic Sowa as he was reviewing other patches
adding procfs entries.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: William Manley <william.manley@youview.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:27:46 -07:00
William Manley
cab70040df net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3
If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent.  The
delay between joins is the "IGMP Unsolicited Report Interval".

Previously this value was hard coded to be chosen randomly between 0-10s.
This can be too long for some use-cases, such as IPTV as it can cause
channel change to be slow in the presence of packet loss.

The value 10s has come from IGMPv2 RFC2236, which was reduced to 1s in
IGMPv3 RFC3376.  This patch makes the kernel use the 1s value from the
later RFC if we are operating in IGMPv3 mode.  IGMPv2 behaviour is
unaffected.

Tested with Wireshark and a simple program to join a (non-existent)
multicast group.  The distribution of timings for the second join differ
based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version.

Signed-off-by: William Manley <william.manley@youview.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:27:46 -07:00
Eric Dumazet
e11aada32b net: flow_dissector: add 802.1ad support
Same behavior than 802.1q : finds the encapsulated protocol and
skip 32bit header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:06:23 -07:00
Timo Teräs
77a482bdb2 ip_gre: fix ipgre_header to return correct offset
Fix ipgre_header() (header_ops->create) to return the correct
amount of bytes pushed. Most callers of dev_hard_header() seem
to care only if it was success, but af_packet.c uses it as
offset to the skb to copy from userspace only once. In practice
this fixes packet socket sendto()/sendmsg() to gre tunnels.

Regression introduced in c54419321455631079c7d6e60bc732dd0c5914c5
("GRE: Refactor GRE tunneling code.")

Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:05:24 -07:00
Simon Wunderlich
ab1e8ad3b4 mac80211: fix ieee80211_sta_process_chanswitch for 5/10 MHz channels
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-09 15:16:39 +02:00
Chun-Yeow Yeoh
0670307992 mac80211: allow lowest basic rate for unicast management for mesh
Allow lowest basic rate to be used for unicast management frame in
mesh. Otherwise, the lowest supported rate is used for unicast
management frame, such as 1Mbps for 2.4GHz and 6Mbps for 5GHz. Rename
the rc_send_low_broadcast to re_send_low_basicrate since now it is
also applied to unicast management frame in mesh.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-09 15:11:27 +02:00
Florian Westphal
c655bc6896 netfilter: nf_conntrack: don't send destroy events from iterator
Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09 12:03:33 +02:00
Eric Dumazet
1f07d03e20 net: add SNMP counters tracking incoming ECN bits
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08 22:24:59 -07:00
John W. Linville
1826ff2357 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-08 13:12:42 -04:00
Hannes Frederic Sowa
3e3be27585 ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match
In case a subtree did not match we currently stop backtracking and return
NULL (root table from fib_lookup). This could yield in invalid routing
table lookups when using subtrees.

Instead continue to backtrack until a valid subtree or node is found
and return this match.

Also remove unneeded NULL check.

Reported-by: Teco Boot <teco@inf-net.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Lamparter <equinox@diac24.net>
Cc: <boutier@pps.univ-paris-diderot.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 17:17:19 -07:00
David S. Miller
09effa67a1 packet: Revert recent header parsing changes.
This reverts commits:

0f75b09c798ed00c30d7d5551b896be883bc2aeb
cbd89acb9eb257ed3b2be867142583fdcf7fdc5b
c483e02614551e44ced3fe6eedda8e36d3277ccc

Amongst other things, it's modifies the SKB header
to pull the ethernet headers off via eth_type_trans()
on the output path which is bogus.

It's causing serious regressions for people.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 17:11:00 -07:00
Jason Wang
3d9953a2ef net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec()
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to
be read. Also we can do the skipping inside the zero-copy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
0433547aa7 net: use release_pages() in zerocopy_sg_from_iovec()
To reduce the duplicated codes.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
234a426708 net: remove the useless comment in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:37 -07:00
Jason Wang
0a57ec62df net: use skb_fill_page_desc() in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:35 -07:00
Jason Wang
c3bdeb5c7c net: move zerocopy_sg_from_iovec() to net/core/datagram.c
To let it be reused and reduce code duplication. Also document this function.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:34 -07:00
Jason Wang
b4bf07771f net: move iov_pages() to net/core/iovec.c
To let it be reused and reduce code duplication.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:33 -07:00
stephen hemminger
6261d983f2 ip_tunnel: embed hash list head
The IP tunnel hash heads can be embedded in the per-net structure
since it is a fixed size. Reduce the size so that the total structure
fits in a page size. The original size was overly large, even NETDEV_HASHBITS
is only 8 bits!

Also, add some white space for readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:47:52 -07:00
Trond Myklebust
786615bc1c SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
If rpcbind causes our connection to the AF_LOCAL socket to close after
we've registered a service, then we want to be careful about reconnecting
since the mount namespace may have changed.

By simply refusing to reconnect the AF_LOCAL socket in the case of
unregister, we avoid the need to somehow save the mount namespace. While
this may lead to some services not unregistering properly, it should
be safe.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
2013-08-07 17:07:18 -04:00
Eric Dumazet
cd6b423afd tcp: cubic: fix bug in bictcp_acked()
While investigating about strange increase of retransmit rates
on hosts ~24 days after boot, Van found hystart was disabled
if ca->epoch_start was 0, as following condition is true
when tcp_time_stamp high order bit is set.

(s32)(tcp_time_stamp - ca->epoch_start) < HZ

Quoting Van :

 At initialization & after every loss ca->epoch_start is set to zero so
 I believe that the above line will turn off hystart as soon as the 2^31
 bit is set in tcp_time_stamp & hystart will stay off for 24 days.
 I think we've observed that cubic's restart is too aggressive without
 hystart so this might account for the higher drop rate we observe.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:08 -07:00
Wang Sheng-Hui
15401946f9 bridge: correct the comment for file br_sysfs_br.c
br_sysfs_if.c is for sysfs attributes of bridge ports, while br_sysfs_br.c
is for sysfs attributes of bridge itself. Correct the comment here.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:06 -07:00
Eric Dumazet
2ed0edf909 tcp: cubic: fix overflow error in bictcp_update()
commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an
overflow error in bictcp_update() in following code :

/* change the unit from HZ to bictcp_HZ */
t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
      ca->epoch_start) << BICTCP_HZ) / HZ;

Because msecs_to_jiffies() being unsigned long, compiler does
implicit type promotion.

We really want to constrain (tcp_time_stamp - ca->epoch_start)
to a signed 32bit value, or else 't' has unexpected high values.

This bugs triggers an increase of retransmit rates ~24 days after
boot [1], as the high order bit of tcp_time_stamp flips.

[1] for hosts with HZ=1000

Big thanks to Van Jacobson for spotting this problem.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:05 -07:00
Fan Du
af83fde751 xfrm: Remove rebundant address family checking
present_and_same_family has checked addresses family validness for both
SADB_EXT_ADDRESS_SRC and SADB_EXT_ADDRESS_DST in the beginning.
Thereafter pfkey_sadb_addr2xfrm_addr doesn't need to do the checking again.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-07 10:12:58 +02:00
Daniel Borkmann
54e35cc523 ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL
skb_header_pointer could return NULL, so check for it as we do it
everywhere else in ipvs code. This fixes a coverity warning.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-07 08:57:57 +09:00
Johannes Berg
e7f1935c11 wireless: make TU conversion macros available
A few places in the code (mac80211 and iwlmvm) use the same
TU_TO_JIFFIES() macro and could use TU_TO_EXP_TIME() that
mac80211 has. Make these available to everyone and use them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-06 11:00:59 +02:00
Dragos Foianu
70e3ca79cd ipvs: fixed spacing at for statements
found using checkpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-06 15:09:58 +09:00
Fan Du
0659eea912 xfrm: Delete hold_timer when destroy policy
Both policy timer and hold_timer need to be deleted when destroy policy

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-06 06:59:18 +02:00
Trond Myklebust
00326ed644 SUNRPC: Don't auto-disconnect from the local rpcbind socket
There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
2013-08-05 22:17:00 -04:00
Linus Lüssing
248ba8ec05 bridge: don't try to update timers in case of broken MLD queries
Currently we are reading an uninitialized value for the max_delay
variable when snooping an MLD query message of invalid length and would
update our timers with that.

Fixing this by simply ignoring such broken MLD queries (just like we do
for IGMP already).

This is a regression introduced by:
"bridge: disable snooping if there is no querier" (b00589af3b04)

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:43:39 -07:00
Eric Dumazet
aab515d7c3 fib_trie: remove potential out of bound access
AddressSanitizer [1] dynamic checker pointed a potential
out of bound access in leaf_walk_rcu()

We could allocate one more slot in tnode_new() to leave the prefetch()
in-place but it looks not worth the pain.

Bug added in commit 82cfbb008572b ("[IPV4] fib_trie: iterator recode")

[1] :
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:26:11 -07:00
Veaceslav Falico
fc7f8f5c53 neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
populate them before calling it.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:19:04 -07:00
Daniel Borkmann
7921895a5e net: esp{4,6}: fix potential MTU calculation overflows
Commit 91657eafb ("xfrm: take net hdr len into account for esp payload
size calculation") introduced a possible interger overflow in
esp{4,6}_get_mtu() handlers in case of x->props.mode equals
XFRM_MODE_TUNNEL. Thus, the following expression will overflow

  unsigned int net_adj;
  ...
  <case ipv{4,6} XFRM_MODE_TUNNEL>
         net_adj = 0;
  ...
  return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
           net_adj) & ~(align - 1)) + (net_adj - 2);

where (net_adj - 2) would be evaluated as <foo> + (0 - 2) in an unsigned
context. Fix it by simply removing brackets as those operations here
do not need to have special precedence.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:26:50 -07:00
nikolay@redhat.com
07ce76aa9b net_sched: make dev_trans_start return vlan's real dev trans_start
Vlan devices are LLTX and don't update their own trans_start, so if
dev_trans_start has to be called with a vlan device then 0 or a stale
value will be returned. Currently the bonding is the only such user, and
it's needed for proper arp monitoring when the slaves are vlans.
Fix this by extracting the vlan's real device trans_start.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:17:42 -07:00
nikolay@redhat.com
0369722f02 vlan: make vlan_dev_real_dev work over stacked vlans
Sometimes we might have stacked vlans on top of each other, and we're
interested in the first non-vlan real device on the path, so transform
vlan_dev_real_dev to go over the stacked vlans and extract the first
non-vlan device.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:17:42 -07:00
Julia Lawall
d9af2d67e4 net/vmw_vsock/af_vsock.c: drop unneeded semicolon
Drop the semicolon at the end of the list_for_each_entry loop header.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 11:07:44 -07:00
Dan Carpenter
e4d091d7bf netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message
These structs have a "_pad" member.  Also the "phw" structs have an 8
byte "hw_addr[]" array but sometimes only the first 6 bytes are
initialized.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 17:36:04 +02:00
Florian Westphal
d8b3bfc253 netfilter: tproxy: fix build with IP6_NF_IPTABLES=n
after commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h)

CONFIG_IPV6=y
CONFIG_IP6_NF_IPTABLES=n

gives us:

net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6':
net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 12:57:38 +02:00
Mathias Krause
8603b9556e af_key: constify lookup tables
The lookup tables for minimum sizes of extensions and for the pfkey
handler functions are read only, therefore can be const.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-05 11:14:00 +02:00