32413 Commits

Author SHA1 Message Date
Eric W. Biederman
080b3c19a4 netpoll: Remove strong unnecessary assumptions about skbs
Remove the assumption that the skbs that make it to
netpoll_send_skb_on_dev are allocated with find_skb, such that
skb->users == 1 and nothing is attached that would prevent the skbs from
being freed from hard irq context.

Remove this assumption by replacing __kfree_skb on error paths with
dev_kfree_skb_irq (in hard irq context) and kfree_skb (in process
context).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
66b5552fc2 netpoll: Rename netpoll_rx_enable/disable to netpoll_poll_disable/enable
The netpoll_rx_enable and netpoll_rx_disable functions have always
controlled polling the network drivers transmit and receive queues.

Rename them to netpoll_poll_enable and netpoll_poll_disable to make
their functionality clear.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
3f4df2066b netpoll: Move rx enable/disable into __dev_close_many
Today netpoll_rx_enable and netpoll_rx_disable are called from
dev_close and and __dev_close, and not from dev_close_many.

Move the calls into __dev_close_many so that we have a single call
site to maintain, and so that dev_close_many gains this protection as
well.  Which importantly makes batched network device deletes safe.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
944e294857 netpoll: Only call ndo_start_xmit from a single place
Factor out the code that needs to surround ndo_start_xmit
from netpoll_send_skb_on_dev into netpoll_start_xmit.

It is an unfortunate fact that as the netpoll code has been maintained
the primary call site ndo_start_xmit learned how to handle vlans
and timestamps but the second call of ndo_start_xmit in queue_process
did not.

With the introduction of netpoll_start_xmit this associated logic now
happens at both call sites of ndo_start_xmit and should make it easy
for that to continue into the future.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
a8779ec1c5 netpoll: Remove gfp parameter from __netpoll_setup
The gfp parameter was added in:
commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af
Author: Amerigo Wang <amwang@redhat.com>
Date:   Fri Aug 10 01:24:37 2012 +0000

    netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

    slave_enable_netpoll() and __netpoll_setup() may be called
    with read_lock() held, so should use GFP_ATOMIC to allocate
    memory. Eric suggested to pass gfp flags to __netpoll_setup().

    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:
commit c4cdef9b7183159c23c7302aaf270d64c549f557
Author: dingtianhong <dingtianhong@huawei.com>
Date:   Tue Jul 23 15:25:27 2013 +0800

    bonding: don't call slave_xxx_netpoll under spinlocks

    The slave_xxx_netpoll will call synchronize_rcu_bh(),
    so the function may schedule and sleep, it should't be
    called under spinlocks.

    bond_netpoll_setup() and bond_netpoll_cleanup() are always
    protected by rtnl lock, it is no need to take the read lock,
    as the slave list couldn't be changed outside rtnl lock.

    Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
    Cc: Jay Vosburgh <fubar@us.ibm.com>
    Cc: Andy Gospodarek <andy@greyhouse.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup
requires a gfp paramter, so remove the gfp parameter from both
of these functions making the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
ZhangZhen
59ff3eb6d6 workqueue: remove deprecated WQ_NON_REENTRANT
Tejun Heo has made WQ_NON_REENTRANT useless in the dbf2576e37
("workqueue: make all workqueues non-reentrant"). So remove its
usages and definition.

This patch doesn't introduce any behavior changes.

tj: minor description updates.

Signed-off-by: ZhangZhen <zhenzhang.zhang@huawei.com>
Sigend-off-by: Tejun Heo <tj@kernel.org>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-03-29 09:33:03 -04:00
J. Bruce Fields
de4aee2e0a rpc: Allow xdr_buf_subsegment to operate in-place
Allow

	xdr_buf_subsegment(&buf, &buf, base, len)

to modify an xdr_buf in-place.

Also, none of the callers need the iov_base of head or tail to be zeroed
out.

Also add documentation.

(As it turns out, I'm not really using this new guarantee, but it seems
a simple way to make this function a bit more robust.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:49 -04:00
Kinglong Mee
315f3812db SUNRPC: fix memory leak of peer addresses in XPRT
Creating xprt failed after xs_format_peer_addresses,
sunrpc must free those memory of peer addresses in xprt.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:23:39 -04:00
Jeff Layton
3cbe01a94c svcrdma: fix offset calculation for non-page aligned sge entries
The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the
offset into the page to be mapped. This calculation does not correctly
take into account the case where the data starts at some offset into
the page. Increment the xdr_off by the page_base to ensure that it is
respected.

Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:13 -04:00
Jeff Layton
2e8c12e1b7 xprtrdma: add separate Kconfig options for NFSoRDMA client and server support
There are two entirely separate modules under xprtrdma/ and there's no
reason that enabling one should automatically enable the other. Add
config options for each one so they can be enabled/disabled separately.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:12 -04:00
Tom Tucker
7e4359e261 Fix regression in NFSRDMA server
The server regression was caused by the addition of rq_next_page
(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
were missed with the update of the rq_respages array.

Signed-off-by: Tom Tucker <tom@ogc.us>
Tested-by: Steve Wise <swise@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:11 -04:00
Vlad Yasevich
2adb956b08 vlan: Warn the user if lowerdev has bad vlan features.
Some drivers incorrectly assign vlan acceleration features to
vlan_features thus causing issues for Q-in-Q vlan configurations.
Warn the user of such cases.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:16:51 -04:00
Vlad Yasevich
fc92f745f8 bridge: Fix crash with vlan filtering and tcpdump
When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference.  The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set.  It will then try
to get the table pointer and process the packet based
on the table.  Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().

Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:14:02 -04:00
Vlad Yasevich
53d6471cef net: Account for all vlan headers in skb_mac_gso_segment
skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb.  However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers.  That may not
always be the case.  If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.

A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated.  This way skb_mac_gso_segment
will correctly skip all mac headers.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:10:36 -04:00
Hannes Frederic Sowa
c15b1ccadb ipv6: move DAD and addrconf_verify processing to workqueue
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:54:50 -04:00
Eric Dumazet
015f0688f5 net: net: add a core netdev->tx_dropped counter
Dropping packets in __dev_queue_xmit() when transmit queue
is stopped (NIC TX ring buffer full or BQL limit reached) currently
outputs a syslog message.

It would be better to get a precise count of such events available in
netdevice stats so that monitoring tools can have a clue.

This extends the work done in caf586e5f23ce
("net: add a core netdev->rx_dropped counter")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Daniel Borkmann
43279500de packet: respect devices with LLTX flag in direct xmit
Quite often it can be useful to test with dummy or similar
devices as a blackhole sink for skbs. Such devices are only
equipped with a single txq, but marked as NETIF_F_LLTX as
they do not require locking their internal queues on xmit
(or implement locking themselves). Therefore, rather use
HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected.

trafgen mmap/TX_RING example against dummy device with config
foo: { fill(0xff, 64) } results in the following performance
improvements for such scenarios on an ordinary Core i7/2.80GHz:

Before:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
   293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
       192,501,104 branch-misses:k                                               ( +-  1.63% )
               831 context-switches:k                                            ( +-  9.18% )
                 7 cpu-migrations:k                                              ( +-  7.40% )
            69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
       671,552,021 cache-references:k                                            ( +-  1.29% )

      22.856401569 seconds time elapsed                                          ( +-  0.33% )

After:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
   145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
        59,867,100 branch-misses:k                                               ( +-  4.72% )
               384 context-switches:k                                            ( +-  3.76% )
                 6 cpu-migrations:k                                              ( +-  6.28% )
            70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
        90,879,408 cache-references:k                                            ( +-  1.35% )

      11.719372413 seconds time elapsed                                          ( +-  0.24% )

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Eric Dumazet
e2a1d3e47b tcp: fix get_timewait4_sock() delay computation on 64bit
It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.

This could result in wrong output in /proc/net/tcp for tm->when field.

Fixes: 96f817fedec4 ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:43:14 -04:00
Flavio Leitner
4f647e0a3c openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:41:53 -04:00
Toshiaki Makita
99b192da9c bridge: Fix handling stacked vlan tags
If a bridge with vlan_filtering enabled receives frames with stacked
vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not
only the outer tag but also the inner tag.

br_vlan_untag() is called only from br_handle_vlan(), and in this case,
it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already
been set before calling br_handle_vlan().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:33:09 -04:00
Toshiaki Makita
12464bb8de bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci
if they are tagged, but if vlan tx offload is manually disabled on bridge
device and frames are sent from vlan device on the bridge device, the tags
are embedded in skb->data and they break this assumption.
Extract embedded vlan tags and move them to vlan_tci at ingress.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:33:09 -04:00
Erik Hugne
16470111ed tipc: make discovery domain a bearer attribute
The node discovery domain is assigned when a bearer is enabled.
In the previous commit we reflect this attribute directly in the
bearer structure since it's needed to reinitialize the node
discovery mechanism after a hardware address change.

There's no need to replicate this attribute anywhere else, so we
remove it from the tipc_link_req structure.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:46:29 -04:00
Erik Hugne
a21a584d67 tipc: fix neighbor detection problem after hw address change
If the hardware address of a underlying netdevice is changed, it is
not enough to simply reset the bearer/links over this device. We
also need to reflect this change in the TIPC bearer and node
discovery structures aswell.

This patch adds the necessary reinitialization of the node disovery
mechanism following a hardware address change so that the correct
originating media address is advertised in the discovery messages.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dong Liu <dliu.cn@gmail.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:46:29 -04:00
Rashika Kheria
4548120100 net: Mark functions as static in net/sunrpc/svc_xprt.c
Mark functions as static in net/sunrpc/svc_xprt.c because they are not
used outside this file.

This eliminates the following warning in net/sunrpc/svc_xprt.c:
net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes]

Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:57 -04:00
Jeff Layton
c42a01eee7 svcrdma: fix printk when memory allocation fails
It retries in 1s, not 1000 jiffies.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:56 -04:00
Zoltan Kiss
36d5fe6a00 core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:29:38 -04:00
Julia Lawall
02f2d5a066 hsr: replace del_timer by del_timer_sync
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exists.

This change was suggested by Thomas Gleixner.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
declarer name module_exit;
identifier ex;
@@

module_exit(ex);

@@
identifier r.ex;
@@

ex(...) {
  <...
- del_timer
+ del_timer_sync
    (...)
  ...>
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:28:06 -04:00
Julia Lawall
84275593ac atm: replace del_timer by del_timer_sync
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exists.

This change was suggested by Thomas Gleixner.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
declarer name module_exit;
identifier ex;
@@

module_exit(ex);

@@
identifier r.ex;
@@

ex(...) {
  <...
- del_timer
+ del_timer_sync
    (...)
  ...>
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:28:06 -04:00
Eric Dumazet
a0b8486caf tcp: tcp_make_synack() minor changes
There is no need to allocate 15 bytes in excess for a SYNACK packet,
as it contains no data, only headers.

SYNACK are always generated in softirq context, and contain a single
segment, we can use TCP_INC_STATS_BH()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:10:10 -04:00
Michal Kubeček
e5fd387ad5 ipv6: do not overwrite inetpeer metrics prematurely
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1.5s

This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.

Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  hoplimit 10 rto_min lock 1s

Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.

v5: use a flag in _metrics member rather than one in flags

v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)

v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:09:07 -04:00
Vlad Yasevich
fc0d48b8fb vlan: Set hard_header_len according to available acceleration
Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan.  This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:00:37 -04:00
John W. Linville
2de21e5899 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-27 14:16:50 -04:00
Ying Xue
dde2026608 tipc: use node list lock to protect tipc_num_links variable
Without properly implicit or explicit read memory barrier, it's
unsafe to read an atomic variable with atomic_read() from another
thread which is different with the thread of changing the atomic
variable with atomic_inc() or atomic_dec(). So a stale tipc_num_links
may be got with atomic_read() in tipc_node_get_links(). If the
tipc_num_links variable type is converted from atomic to unsigned
integer and node list lock is used to protect it, the issue would
be avoided.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:38 -04:00
Ying Xue
2220646a53 tipc: use node_list_lock to protect tipc_num_nodes variable
As tipc_node_list is protected by rcu read lock on read side, it's
unnecessary to hold node_list_lock to protect tipc_node_list in
tipc_node_get_links(). Instead, node_list_lock should just protects
tipc_num_nodes in the function.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
6c7a762e70 tipc: tipc: convert node list and node hlist to RCU lists
Convert tipc_node_list list and node_htable hash list to RCU lists.
On read side, the two lists are protected with RCU read lock, and
on update side, node_list_lock is applied to them.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
46651c59c4 tipc: rename node create lock to protect node list and hlist
When a node is created, tipc_net_lock read lock is first held and
then node_create_lock is grabbed in order to prevent the same node
from being created and inserted into both node list and hlist twice.
But when we query node from the two node lists, we only hold
tipc_net_lock read lock without grabbing node_create_lock. Obviously
this locking policy is unable to guarantee that the two node lists
are always synchronized especially when the operation of changing
and accessing them occurs in different contexts like currently doing.

Therefore, rename node_create_lock to node_list_lock to protect the
two node lists, that is, whenever node is inserted into them or node
is queried from them, the node_list_lock should be always held. As a
result, tipc_net_lock read lock becomes redundant and then can be
removed from the node query functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
987b58be37 tipc: make broadcast bearer store in bearer_list array
Now unicast bearer is dynamically allocated and placed into its
identity specified slot of bearer_list array. When we search
bearer_list array with a bearer identity, the corresponding bearer
instance can be found. But broadcast bearer is statically allocated
and it is not located in the bearer_list array yet. So we decide to
enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last
slot stores the broadcast bearer so that the broadcast bearer can
be found from bearer_list array with MAX_BEARERS as index. The
change will help us reduce the complex relationship between bearer
and link in the future.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
f47de12b06 tipc: remove active flag from tipc_bearer structure
After the allocation of tipc_bearer structure instance is converted
from statical way to dynamical way, we identify whether a certain
tipc_bearer structure pointer is valid by checking whether the pointer
is NULL or not. So the active flag in tipc_bearer structure becomes
redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
3874ccbba8 tipc: convert tipc_bearers array to pointer list
As part of the effort to introduce RCU protection for the bearer
list, we first need to change it to a list of pointers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
78dfb789b6 tipc: acquire necessary locks in named_cluster_distribute routine
The 'tipc_node_list' is guarded by tipc_net_lock and 'links' array
defined in 'tipc_node' structure is protected by node lock as well.
Without acquiring the two locks in named_cluster_distribute() a fatal
oops may happen in case that a destroyed link might be got and then
accessed. Therefore, above mentioned two locks must be held in
named_cluster_distribute() to prevent the issue from happening
accidentally.

As 'links' array in node struct must be protected by node lock,
we have to move the code of selecting an active link from
tipc_link_xmit() to named_cluster_distribute() and then call
__tipc_link_xmit() with the selected link to deliver name messages.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Ying Xue
5902385a24 tipc: obsolete the remote management feature
Due to the lacking of any credential, it's allowed to accept commands
requested from remote nodes to query the local node status, which is
prone to involve potential security risks. Instead, if we login to
a remote node with ssh command, this approach is not only more safe
than the remote management feature, but also it can give us more
permissions like changing the remote node configuration. So it's
reasonable for us to obsolete the remote management feature now.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Ying Xue
76d7882420 tipc: remove unnecessary checking for node object
tipc_node_create routine doesn't need to check whether a node
object specified with a node address exists or not because its
caller(ie, tipc_disc_recv_msg routine) has checked this before
calling it.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Monam Agarwal
fcb144b5df net/core: Use RCU_INIT_POINTER(x, NULL) in netpoll.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 00:18:09 -04:00
Monam Agarwal
cd18721e52 net/bridge: Use RCU_INIT_POINTER(x, NULL) in br_vlan.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 00:18:09 -04:00
Eric Dumazet
de14439167 net: unix: non blocking recvmsg() should not return -EINTR
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").

To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.

Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 17:05:40 -04:00
dingtianhong
71e415e44c vlan: make a new function vlan_dev_vlan_proto() and export
The vlan support 2 proto: 802.1q and 802.1ad, so make a new function
called vlan_dev_vlan_proto() which could return the vlan proto for
input dev.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:41:28 -04:00
Tom Herbert
61b905da33 net: Rename skb->rxhash to skb->hash
The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:58:20 -04:00
Peter Pan(潘卫平)
cc93fc51f3 tcp: delete unused parameter in tcp_nagle_check()
After commit d4589926d7a9 (tcp: refine TSO splits), tcp_nagle_check() does
not use parameter mss_now anymore.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:43:40 -04:00
Pravin B Shelar
fbd02dd405 ip_tunnel: Fix dst ref-count.
Commit 10ddceb22ba (ip_tunnel:multicast process cause panic due
to skb->_skb_refdst NULL pointer) removed dst-drop call from
ip-tunnel-recv.

Following commit reintroduce dst-drop and fix the original bug by
checking loopback packet before releasing dst.
Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:18:40 -04:00
Johan Hedberg
e8b1ab9e6d Bluetooth: Fix returning peer address in pending connect state
We should let user space request the peer address also in the pending
connect states, i.e. BT_CONNECT and BT_CONNECT2. There is existing user
space code that tries to do this and will fail without extending the set
of allowed states for the peer address information.

This patch adds the two states to the allowed ones in the L2CAP and
RFCOMM sock_getname functions, thereby preventing ENOTCONN from being
returned.

Reported-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:33 -07:00