36750 Commits

Author SHA1 Message Date
Eric Dumazet
6c09fa09d4 tcp: align tcp_xmit_size_goal() on tcp_tso_autosize()
With some mss values, it is possible tcp_xmit_size_goal() puts
one segment more in TSO packet than tcp_tso_autosize().

We send then one TSO packet followed by one single MSS.

It is not a serious bug, but we can do slightly better, especially
for drivers using netif_set_gso_max_size() to lower gso_max_size.

Using same formula avoids these corner cases and makes
tcp_xmit_size_goal() a bit faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 605ad7f184b6 ("tcp: refine TSO autosizing")
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 22:31:12 -05:00
Erik Hugne
d0f91938be tipc: add ip/udp media type
The ip/udp bearer can be configured in a point-to-point
mode by specifying both local and remote ip/hostname,
or it can be enabled in multicast mode, where links are
established to all tipc nodes that have joined the same
multicast group. The multicast IP address is generated
based on the TIPC network ID, but can be overridden by
using another multicast address as remote ip.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 22:08:42 -05:00
Erik Hugne
948fa2d115 tipc: increase size of tipc discovery messages
The payload area following the TIPC discovery message header is an
opaque area defined by the media. INT_H_SIZE was enough for
Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing
information.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 22:08:42 -05:00
David S. Miller
9d73b42bbf Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Don't truncate ethernet protocol type to u8 in nft_compat, from
   Arturo Borrero.

2) Fix several problems in the addition/deletion of elements in nf_tables.

3) Fix module refcount leak in ip_vs_sync, from Julian Anastasov.

4) Fix a race condition in the abort path in the nf_tables transaction
   infrastructure. Basically aborted rules can show up as active rules
   until changes are unrolled, oneliner from Patrick McHardy.

5) Check for overflows in the data area of the rule, also from Patrick.

6) Fix off-by-one in the per-rule user data size field. This introduces
   a new nft_userdata structure that is placed at the beginning of the
   user data area that contains the length to save some bits from the
   rule and we only need one bit to indicate its presence, from Patrick.

7) Fix rule replacement error path, the replaced rule is deleted on
   error instead of leaving it in place. This has been fixed by relying
   on the abort path to undo the incomplete replacement.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:51:07 -05:00
Alexander Drozdov
3e32e733d1 ipv4: ip_check_defrag should not assume that skb_network_offset is zero
ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:43:48 -05:00
WANG Cong
33f8b9ecdb net_sched: move tp->root allocation into fw_init()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:30:44 -05:00
WANG Cong
a05c2d112c net_sched: move tp->root allocation into route4_init()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:30:44 -05:00
Stephen Rothwell
4b5edb2f4a mpls: using vzalloc requires including vmalloc.h
Fixes this build error:

net/mpls/af_mpls.c: In function 'resize_platform_label_table':
net/mpls/af_mpls.c:767:4: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
    labels = vzalloc(size);
    ^

Fixes: 7720c01f3f59 ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 21:01:33 -05:00
Pablo Neira Ayuso
1cae565e8b netfilter: nf_tables: limit maximum table name length to 32 bytes
Set the same as we use for chain names, it should be enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:21 +01:00
Pablo Neira Ayuso
f04e599e20 netfilter: nf_tables: consolidate Kconfig options
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:15 +01:00
Patrick McHardy
354bf5a0d7 netfilter: nf_tables: consolidate tracing invocations
* JUMP and GOTO are equivalent except for JUMP pushing the current
  context to the stack

* RETURN and implicit RETURN (CONTINUE) are equivalent except that
  the logged rule number differs

Result:

  nft_do_chain              | -112
 1 function changed, 112 bytes removed, diff: -112

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:12 +01:00
Patrick McHardy
01ef16c2dd netfilter: nf_tables: minor tracing cleanups
The tracing code is squeezed between multiple related parts of the
evaluation code, move it out. Also add an inline wrapper for the
reoccuring test for skb->nf_trace.

Small code savings in nft_do_chain():

  nft_trace_packet          | -137
  nft_do_chain              |   -8
 2 functions changed, 145 bytes removed, diff: -145

net/netfilter/nf_tables_core.c:
  __nft_trace_packet | +137
 1 function changed, 137 bytes added, diff: +137

net/netfilter/nf_tables_core.o:
 3 functions changed, 137 bytes added, 145 bytes removed, diff: -8

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:11 +01:00
Pablo Neira Ayuso
43270b1bc5 netfilter: ipt_CLUSTERIP: deprecate it in favour of xt_cluster
xt_cluster supersedes ipt_CLUSTERIP since it can be also used in
gateway configurations (not only from the backend side).

ipt_CLUSTER is also known to leak the netdev that it uses on
device removal, which requires a rather large fix to workaround
the problem: http://patchwork.ozlabs.org/patch/358629/

So let's deprecate this so we can probably kill code this in the
future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-06 01:21:05 +01:00
Jouni Malinen
842a9ae08a bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi
This extends the design in commit 958501163ddd ("bridge: Add support for
IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
previously added BR_PROXYARP behavior is left as-is and a new
BR_PROXYARP_WIFI alternative is added so that this behavior can be
configured from user space when required.

In addition, this enables proxyarp functionality for unicast ARP
requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
to use unicast as well as broadcast for these frames.

The key differences in functionality:

BR_PROXYARP:
- uses the flag on the bridge port on which the request frame was
  received to determine whether to reply
- block bridge port flooding completely on ports that enable proxy ARP

BR_PROXYARP_WIFI:
- uses the flag on the bridge port to which the target device of the
  request belongs
- block bridge port flooding selectively based on whether the proxyarp
  functionality replied

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 14:52:23 -05:00
kbuild test robot
787fb2bd42 ax25: Fix the build when CONFIG_INET is disabled
>
> >> net/ax25/ax25_ip.c:225:26: error: unknown type name 'sturct'
>     netdev_tx_t ax25_ip_xmit(sturct sk_buff *skb)
>                              ^
>
> vim +/sturct +225 net/ax25/ax25_ip.c
>
>    219				    unsigned short type, const void *daddr,
>    220				    const void *saddr, unsigned int len)
>    221	{
>    222		return -AX25_HEADER_LEN;
>    223	}
>    224
>  > 225	netdev_tx_t ax25_ip_xmit(sturct sk_buff *skb)
>    226	{
>    227		kfree_skb(skb);
>    228		return NETDEV_TX_OK;

Ooops I misspelled struct...

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 13:17:39 -05:00
Jakub Pawlowski
82f8b651a9 Bluetooth: fix service discovery behaviour for empty uuids filter
This patch fixes service discovery behaviour, when provided uuid filter
is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this
patch, empty uuid filter was unable to trigger scan restart, and that
caused inconsistent behaviour in applications.

Example: two DBus clients call BlueZ, one to find all devices with
service abcd, second to find all devices with rssi smaller than -90.
Sum of those filters, that is passed to mgmt_service_scan is empty
filter, with no rssi or uuids set.
That caused kernel not to restart scan when quirk was set.
That was inconsistent with what happen when there's only one of those
two filters set (scan is restarted and reports devices).

To fix that, new variable hdev->discovery.result_filtering was
introduced. It can indicate that filtered scan is running, no matter
what uuid or rssi filter is set.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Jakub Pawlowski
2976cdeb27 Bluetooth: Refactor service discovery filter logic
This patch refactor code responsible for filtering when service
discovery method is used. Previously this code was mixed with
mgmt_device found logic. Now when it's in one place whole logic can
be greatly simplified. That includes removing no longer necessary
length field and merging checks for eir and scan_rsp.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Jakub Pawlowski
48f86b7f26 Bluetooth: Move Service Discovery logic before refactoring
This patch moves whole packet filering logic of service discovery
into new function is_filter_match. It's done because logic inside
mgmt_device_found is very complicated and needs some
simplification.

Also having whole logic in one place will allow to simplify it in
the future.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Alexander Duyck
1de3d87bcd fib_trie: Prevent allocating tnode if bits is too big for size_t
This patch adds code to prevent us from attempting to allocate a tnode with
a size larger than what can be represented by size_t.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Alexander Duyck
71e8b67d0f fib_trie: Update last spot w/ idx >> n->bits code and explanation
This change updates the fib_table_lookup function so that it is in sync
with the fib_find_node function in terms of the explanation for the index
check based on the bits value.

I have also updated it from doing a mask to just doing a compare as I have
found that seems to provide more options to the compiler as I have seen it
turn this into a shift of the value and test under some circumstances.

In addition I addressed one minor issue in which we kept computing the key
^ n->key when checking the fib aliases.  I pulled the xor out of the loop
in order to reduce the number of memory reads in the lookup.  As a result
we should save a couple cycles since the xor is only done once much earlier
in the lookup.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Alexander Duyck
a7e5353123 fib_trie: Make fib_table rcu safe
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections.  This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Alexander Duyck
41b489fd6c fib_trie: move leaf and tnode to occupy the same spot in the key vector
If we are going to compact the leaf and tnode we first need to make sure
the fields are all in the same place.  In that regard I am moving the leaf
pointer which represents the fib_alias hash list to occupy what is
currently the first key_vector pointer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Alexander Duyck
d5d6487cb8 fib_trie: Update insert and delete to make use of tp from find_node
This change makes it so that the insert and delete functions make use of
the tnode pointer returned in the fib_find_node call.  By doing this we
will not have to rely on the parent pointer in the leaf which will be going
away soon.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:18 -05:00
Alexander Duyck
d4a975e83f fib_trie: Fib find node should return parent
This change makes it so that the parent pointer is returned by reference in
fib_find_node.  By doing this I can use it to find the parent node when I
am performing an insertion and I don't have to look for it again in
fib_insert_node.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:17 -05:00
Alexander Duyck
8be33e955c fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf
This change makes it so that leaf_walk_rcu takes a tnode and a key instead
of the trie and a leaf.

The main idea behind this is to avoid using the leaf parent pointer as that
can have additional overhead in the future as I am trying to reduce the
size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:17 -05:00
Alexander Duyck
7289e6ddb6 fib_trie: Only resize tnodes once instead of on each leaf removal in fib_table_flush
This change makes it so that we only call resize on the tnodes, instead of
from each of the leaves.  By doing this we can significantly reduce the
amount of time spent resizing as we can update all of the leaves in the
tnode first before we make any determinations about resizing.  As a result
we can simply free the tnode in the case that all of the leaves from a
given tnode are flushed instead of resizing with each leaf removed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 23:35:17 -05:00
Wu Fengguang
f0126539c7 mpls: rtm_mpls_policy[] can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 16:39:45 -05:00
Lorenzo Colitti
9145736d48 net: ping: Return EAFNOSUPPORT when appropriate.
1. For an IPv4 ping socket, ping_check_bind_addr does not check
   the family of the socket address that's passed in. Instead,
   make it behave like inet_bind, which enforces either that the
   address family is AF_INET, or that the family is AF_UNSPEC and
   the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
   if the socket family is not AF_INET6. Return EAFNOSUPPORT
   instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
   instead of EINVAL if an incorrect socket address structure is
   passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
   IPv4, and it cannot easily be made to support IPv4 because
   the protocol numbers for ICMP and ICMPv6 are different. This
   makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
   of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    struct sockaddr_in6 sin6 = {
        .sin6_family = AF_INET6,
        .sin6_addr = in6addr_any,
    };
    bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 15:46:51 -05:00
Pablo Neira Ayuso
59900e0a01 netfilter: nf_tables: fix error handling of rule replacement
In general, if a transaction object is added to the list successfully,
we can rely on the abort path to undo what we've done. This allows us to
simplify the error handling of the rule replacement path in
nf_tables_newrule().

This implicitly fixes an unnecessary removal of the old rule, which
needs to be left in place if we fail to replace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:08 +01:00
Patrick McHardy
86f1ec3231 netfilter: nf_tables: fix userdata length overflow
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.

The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:06 +01:00
Patrick McHardy
9889840f59 netfilter: nf_tables: check for overflow of rule dlen field
Check that the space required for the expressions doesn't exceed the
size of the dlen field, which would lead to the iterators crashing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:05 +01:00
Patrick McHardy
8670c3a55e netfilter: nf_tables: fix transaction race condition
A race condition exists in the rule transaction code for rules that
get added and removed within the same transaction.

The new rule starts out as inactive in the current and active in the
next generation and is inserted into the ruleset. When it is deleted,
it is additionally set to inactive in the next generation as well.

On commit the next generation is begun, then the actions are finalized.
For the new rule this would mean clearing out the inactive bit for
the previously current, now next generation.

However nft_rule_clear() clears out the bits for *both* generations,
activating the rule in the current generation, where it should be
deactivated due to being deleted. The rule will thus be active until
the deletion is finalized, removing the rule from the ruleset.

Similarly, when aborting a transaction for the same case, the undo
of insertion will remove it from the RCU protected rule list, the
deletion will clear out all bits. However until the next RCU
synchronization after all operations have been undone, the rule is
active on CPUs which can still see the rule on the list.

Generally, there may never be any modifications of the current
generations' inactive bit since this defeats the entire purpose of
atomicity. Change nft_rule_clear() to only touch the next generations
bit to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-04 18:46:04 +01:00
Eric W. Biederman
8de147dc8e mpls: Multicast route table change notifications
Unlike IPv4 this code notifies on all cases where mpls routes
are added or removed and it never automatically removes routes.
Avoiding both the userspace confusion that is caused by omitting
route updates and the possibility of a flood of netlink traffic
when an interface goes doew.

For now reserved labels are handled automatically and userspace
is not notified.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
03c0566542 mpls: Netlink commands to add, remove, and dump routes
This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.

RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY.  RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to.  Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET.  For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table.  For AF_PACKET the
destination mac_address is specified in the netlink configuration.

I think raw destination mac address support with the family AF_PACKET
will prove useful.  There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor.  Further
seem to be corner cases where it can be useful.  At this point
I don't care much either way.

RTA_NEWDST specifies the destination address to forward the packet
with.  MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.

Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.

rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.

I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
966bae3349 mpls: Functions for reading and wrinting mpls labels over netlink
Reading and writing addresses in network byte order in netlink is
traditional and I see no reason to change that.  MPLS is interesting
as effectively it has variabely length addresses (the MPLS label
stack).  To represent these variable length addresses in netlink
I use a valid MPLS label stack (complete with stop bit).

This achieves two things: a well defined existing format is used,
and the data can be interpreted without looking at it's length.

Not needed to look at the length to decode the variable length
network representation allows existing userspace functions
such as inet_ntop to be used without needed to change their
prototype.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
a2519929ab mpls: Basic support for adding and removing routes
mpls_route_add and mpls_route_del implement the basic logic for adding
and removing Next Hop Label Forwarding Entries from the MPLS input
label map.  The addition and subtraction is done in a way that is
consistent with how the existing routing table in Linux are
maintained.  Thus all of the work to deal with NLM_F_APPEND,
NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.

Cases that are not clearly defined such as changing the interpretation
of the mpls reserved labels is not allowed.

Because it seems like the right thing to do adding an MPLS route without
specifying an input label and allowing the kernel to pick a free label
table entry is supported.   The implementation is currently less than optimal
but that can be changed.

As I don't have anything else to test with only ethernet and the loopback
device are the only two device types currently supported for forwarding
MPLS over.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
7720c01f3f mpls: Add a sysctl to control the size of the mpls label table
This sysctl gives two benefits.  By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets.  This prevents unpleasant surprises for users.

The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.

This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
0189197f44 mpls: Basic routing support
This change adds a new Kconfig option MPLS_ROUTING.

The core of this change is the code to look at an mpls packet received
from another machine.  Look that packet up in a routing table and
forward the packet on.

Support of MPLS over ATM is not considered or attempted here.  This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.

What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[].  What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
label fordwarding information base (lfib) might also be valid.

Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path.  In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.

Quite a few optional features are not implemented here.  Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error).  The traffic
class field is always set to 0.  The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.

Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value).  I was lazy and implemented a next hop mac
address instead.  The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.

Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces.  There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte.  Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.

For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early.  Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped.  Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
cec9166ca4 mpls: Refactor how the mpls module is built
This refactoring is needed to allow more than just mpls gso
support to be built into the mpls moddule.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:26:06 -05:00
Eric W. Biederman
4fd3d7d9e8 neigh: Add helper function neigh_xmit
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.

The kind of next hop we have is indicated by the neighbour table
pointer.  A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.

The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
Eric W. Biederman
60395a20ff neigh: Factor out ___neigh_lookup_noref
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:23:23 -05:00
Johannes Berg
2f56f6be47 bridge: fix bridge netlink RCU usage
When the STP timer fires, it can call br_ifinfo_notify(),
which in turn ends up in the new br_get_link_af_size().
This function is annotated to be using RTNL locking, which
clearly isn't the case here, and thus lockdep warns:

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.19.0+ #569 Not tainted
  -------------------------------
  net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage!

Fix this by doing RCU locking here.

Fixes: b7853d73e39b ("bridge: add vlan info to bridge setlink and dellink notification messages")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-04 00:20:22 -05:00
David S. Miller
71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Linus Torvalds
a6c5170d1e Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Three miscellaneous bugfixes, most importantly the clp->cl_revoked
  bug, which we've seen several reports of people hitting"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: integer underflow in rsc_parse()
  nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
  svcrpc: fix memory leak in gssp_accept_sec_context_upcall
2015-03-03 15:52:50 -08:00
Linus Torvalds
789d7f60cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If an IPVS tunnel is created with a mixed-family destination
    address, it cannot be removed.  Fix from Alexey Andriyanov.

 2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
    Neira Ayuso.

 3) Generic statistics infrastructure can reference variables sitting on
    a released function stack, therefore use dynamic allocation always.
    Fix from Ignacy Gawędzki.

 4) skb_copy_bits() return value test is inverted in ip_check_defrag().

 5) Fix network namespace exit in openvswitch, we have to release all of
    the per-net vports.  From Pravin B Shelar.

 6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.

 7) Fix rhashtable grow/shrink behavior, only expand during inserts and
    shrink during deletes.  From Daniel Borkmann.

 8) Netdevice names with semicolons should never be allowed, because
    they serve as a separator.  From Matthew Thode.

 9) Use {,__}set_current_state() where appropriate, from Fabian
    Frederick.

10) Revert byte queue limits support in r8169 driver, it's causing
    regressions we can't figure out.

11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
    measure packets in flight, properly use tcp_packets_in_flight()
    instead.  From Neal Cardwell.

12) Fix accidental removal of support for bluetooth in CSR based Intel
    wireless cards.  From Marcel Holtmann.

13) We accidently added a behavioral change between native and compat
    tasks, wrt testing the MSG_CMSG_COMPAT bit.  Just ignore it if the
    user happened to set it in a native binary as that was always the
    behavior we had.  From Catalin Marinas.

14) Check genlmsg_unicast() return valud in hwsim netlink tx frame
    handling, from Bob Copeland.

15) Fix stale ->radar_required setting in mac80211 that can prevent
    starting new scans, from Eliad Peller.

16) Fix memory leak in nl80211 monitor, from Johannes Berg.

17) Fix race in TX index handling in xen-netback, from David Vrabel.

18) Don't enable interrupts in amx-xgbe driver until all software et al.
    state is ready for the interrupt handler to run.  From Thomas
    Lendacky.

19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
    W Biederman.

20) The amount of header space needed in macvtap was not calculated
    properly, fix it otherwise we splat past the beginning of the
    packet.  From Eric Dumazet.

21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.

22) Don't raw initialize or mod timers, use setup_timer() and
    mod_timer() instead.  From Vaishali Thakkar.

23) Fix software maintained statistics in bcmgenet and systemport
    drivers, from Florian Fainelli.

24) DMA descriptor updates in sh_eth need proper memory barriers, from
    Ben Hutchings.

25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
    Kubecek.

26) Openvswitch's non-masked set actions aren't constructed properly
    into netlink messages, fix from Joe Stringer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  openvswitch: Fix serialization of non-masked set actions.
  gianfar: Reduce logging noise seen due to phy polling if link is down
  ibmveth: Add function to enable live MAC address changes
  net: bridge: add compile-time assert for cb struct size
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  sh_eth: Really fix padding of short frames on TX
  Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
  sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
  sh_eth: Ensure proper ordering of descriptor active bit write/read
  net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
  net/mlx4_core: Fix wrong mask and error flow for the update-qp command
  net: systemport: fix software maintained statistics
  net: bcmgenet: fix software maintained statistics
  rxrpc: don't multiply with HZ twice
  rxrpc: terminate retrans loop when sending of skb fails
  net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
  net: pasemi: Use setup_timer and mod_timer
  net: stmmac: Use setup_timer and mod_timer
  net: 8390: axnet_cs: Use setup_timer and mod_timer
  net: 8390: pcnet_cs: Use setup_timer and mod_timer
  ...
2015-03-03 15:30:07 -08:00
Joe Perches
1cea7e2c9f l2tp: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:38 -05:00
Joe Perches
d2beae1078 wireless: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:38 -05:00
Joe Perches
c84a67a2fc mac80211: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:38 -05:00
Joe Perches
afc130dd39 ethernet: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:38 -05:00
Joe Perches
211b85349c bluetooth: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:37 -05:00