35551 Commits

Author SHA1 Message Date
Jiri Pirko
6ea3b446b9 net: sched: cls: use nla_nest_cancel instead of nlmsg_trim
To cancel nesting, this function is more convenient.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 21:49:58 -05:00
Ying Xue
023160bc8f tipc: avoid double lock 'spin_lock:&seq->lock'
Commit fb9962f3ce ("tipc: ensure all name sequences are properly
protected with its lock") introduced the following error:

net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 18:27:03 -05:00
Roopa Prabhu
1d460b988d rocker: remove swdev mode
Remove use of 'swdev' mode in rocker. rocker dev offloads
can use the BRIDGE_FLAGS_SELF to indicate offload to hardware.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 18:24:47 -05:00
David S. Miller
b5f185f33d Merge tag 'master-2014-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-12-08

Please pull this last batch of pending wireless updates for the 3.19 tree...

For the wireless bits, Johannes says:

"This time I have Felix's no-status rate control work, which will allow
drivers to work better with rate control even if they don't have perfect
status reporting. In addition to this, a small hwsim fix from Patrik,
one of the regulatory patches from Arik, and a number of cleanups and
fixes I did myself.

Of note is a patch where I disable CFG80211_WEXT so that compatibility
is no longer selectable - this is intended as a wake-up call for anyone
who's still using it, and is still easily worked around (it's a one-line
patch) before we fully remove the code as well in the future."

For the Bluetooth bits, Johan says:

"Here's one more bluetooth-next pull request for 3.19:

 - Minor cleanups for ieee802154 & mac802154
 - Fix for the kernel warning with !TASK_RUNNING reported by Kirill A.
   Shutemov
 - Support for another ath3k device
 - Fix for tracking link key based security level
 - Device tree bindings for btmrvl + a state update fix
 - Fix for wrong ACL flags on LE links"

And...

"In addition to the previous one this contains two more cleanups to
mac802154 as well as support for some new HCI features from the
Bluetooth 4.2 specification.

From the original request:

'Here's what should be the last bluetooth-next pull request for 3.19.
It's rather large but the majority of it is the Low Energy Secure
Connections feature that's part of the Bluetooth 4.2 specification. The
specification went public only this week so we couldn't publish the
corresponding code before that. The code itself can nevertheless be
considered fairly mature as it's been in development for over 6 months
and gone through several interoperability test events.

Besides LE SC the pull request contains an important fix for command
complete events for mgmt sockets which also fixes some leaks of hci_conn
objects when powering off or unplugging Bluetooth adapters.

A smaller feature that's part of the pull request is service discovery
support. This is like normal device discovery except that devices not
matching specific UUIDs or strong enough RSSI are filtered out.

Other changes that the pull request contains are firmware dump support
to the btmrvl driver, firmware download support for Broadcom BCM20702A0
variants, as well as some coding style cleanups in 6lowpan &
ieee802154/mac802154 code.'"

For the NFC bits, Samuel says:

"With this one we get:

- NFC digital improvements for DEP support: Chaining, NACK and ATN
  support added.

- NCI improvements: Support for p2p target, SE IO operand addition,
  SE operands extensions to support proprietary implementations, and
  a few fixes.

- NFC HCI improvements: OPEN_PIPE and NOTIFY_ALL_CLEARED support,
  and SE IO operand addition.

- A bunch of minor improvements and fixes for STMicro st21nfcb and
  st21nfca"

For the iwlwifi bits, Emmanuel says:

"Major works are CSA and TDLS. On top of that I have a new
firmware API for scan and a few rate control improvements.
Johannes found a few tricks to improve our CPU utilization
and added support for a new spin of 7265 called 7265D.
Along with this a few random things that don't stand out."

And...

"I deprecate here -8.ucode since -9 has been published long ago.
Along with that I have a new activity: we now have a better
infrastructure for firmware debugging. This will allow us to
have configurable probes inside the firmware.
Luca continues his work on NetDetect, this feature is now
complete. All the rest is minor fixes here and there."

For the Atheros bits, Kalle says:

"Only ath10k changes this time and no major changes. Most visible are:

o new debugfs interface for runtime firmware debugging (Yanbo)

o fix shared WEP (Sujith)

o don't rebuild whenever kernel version changes (Johannes)

o lots of refactoring to make it easier to add new hw support (Michal)

There are also smaller fixes and improvements that are not worth listing
here."

In addition, there are a few last minute updates to ath5k,
ath9k, brcmfmac, brcmsmac, mwifiex, rt2x00, rtlwifi, and wil6210.
Also included is a pull of the wireless tree to pick-up the fixes
originally included in "pull request: wireless 2014-12-03"...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 18:12:03 -05:00
Li RongQing
e008f3f07f net: avoid to call skb_queue_len again
The queue length of sd->input_pkt_queue has already been stored in qlen
and cannot change while the lock is held.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 17:03:19 -05:00
David S. Miller
602de7ead5 linux-can-next-for-3.19-20141207
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUhNqoAAoJECte4hHFiupU57IP/1ioNl+EkM8ZXCH+pZCsMuoF
 S33lLQjJ2WEh2WZXDEJqGWdv7FRh5dUyRB67TpMCzQa8lsyPykapFAy4s1DEEZ46
 EbsRHjkJdw+fg3dRGp33XPD55t2xXz9CYB7OuVGLjBEWdFb5a2by+JYCctTynqum
 xI+qGo6IKcAvlyYAmiopZ+FOBUMhRo30GkkzVnoIsQn+Z1HYEdJ+QGryL1rOY01D
 Gt4d+hZ6bT08yy+4ZB3Sr6/H3w4e8saUCS8H+JyLVYR+quM0T/uV4drqk/21kUNU
 954LPu5GY5l6gYDEaki96Rc6DpuqsWlgy7oh1E3p9XN0vZFPEjmFXkic28hHpvKm
 nDThB9qllwYUu9hmALaMuxkbRmJK/NvFwlzdtp0uZIiiENGGQrD368wiWxyzD3aP
 HvthWTNM2E+T15gmmzUNnGPbaTWgxjp4G4wEucX/yLiZDTu0ftoFBvnRy3emWhI0
 3N1Lf3ZBGYuHQvyUMWMgQ53nwuPuDgcVy/wYEUu11rI4zFcP7OmrznPhtnwfwQmz
 lMppDC0d3L0PGjI4/oKXJAXrCuAVldv+eLFOpHaJuXU+VuglEOpetjUDMv2A0hbQ
 23HcX+rIRd+8M8H+RtAYrqmocAOw70/cy0NzuLfI8a7kOW9H55dHADx4IFTae2E+
 X1dBTj1EHrIlyw6lkC9e
 =icE1
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-3.19-20141207' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2014-12-07

this is a pull request of 8 patches for net-next/master.

Andri Yngvason contributes 4 patches in which the CAN state change
handling is consolidated and unified among the sja1000, mscan and
flexcan driver. The three patches by Jeremiah Mahler fix spelling
mistakes and eliminate the banner[] variable in various parts. And a
patch by me that switches on sparse endianness checking by default.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:49:00 -05:00
Eric Dumazet
605ad7f184 tcp: refine TSO autosizing
Commit 95bd09eb27 ("tcp: TSO packets automatic sizing") tried to
control TSO size, but did this at the wrong place (sendmsg() time)

At sendmsg() time, we might have a pessimistic view of flow rate,
and we end up building very small skbs (with 2 MSS per skb).

This is bad because :

 - It sends small TSO packets even in Slow Start where rate quickly
   increases.
 - It tends to make socket write queue very big, increasing tcp_ack()
   processing time, but also increasing memory needs, not necessarily
   accounted for, as fast clones overhead is currently ignored.
 - Lower GRO efficiency and more ACK packets.

Servers with a lot of short-lived connections suffer from this.

Let's instead fill skbs as much as possible (64KB of payload), but split
them at xmit time, when we have a precise idea of the flow rate.
skb split is actually quite efficient.

Patch looks bigger than necessary, because TCP Small Queue decision now
has to take place after the eventual split.

As Neal suggested, introduce a new tcp_tso_autosize() helper, so that
tcp_tso_should_defer() can be synchronized on same goal.

Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable
contains number of mss that we can put in GSO packet, and is not
related to the autosizing goal anymore.

Tested:

40 ms rtt link

nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"

Before patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380 2000000 2000000    0.36         44.22
IpInReceives                    600                0.0
IpOutRequests                   599                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2033249            0.0

After patch :

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000    0.36       44.27
IpInReceives                    221                0.0
IpOutRequests                   232                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2013953            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:39:22 -05:00
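
One way to read the counters quoted in 605ad7f184 is as an average number of TCP segments per transmitted IP packet. The sketch below is only an illustration: the function name is made up, the numbers are copied from the "Before patch" and "After patch" nstat output, and it assumes that nearly all IpOutRequests in the test window belong to the netperf flow.

```python
def segments_per_packet(tcp_out_segs: int, ip_out_requests: int) -> float:
    """Average number of TCP segments carried per transmitted IP packet."""
    return tcp_out_segs / ip_out_requests


# Counters copied from the commit message (TcpOutSegs, IpOutRequests).
before = segments_per_packet(1397, 599)
after = segments_per_packet(1397, 232)

print(f"before patch: {before:.2f} segments per packet")
print(f"after patch:  {after:.2f} segments per packet")
```

The same number of segments is sent in both runs, but the patched kernel hands them to the IP layer in fewer, larger packets, which matches the drop in IpInReceives (fewer ACKs) reported in the message.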
Hannes Frederic Sowa
dbfc4fb7d5 dst: no need to take reference on DST_NOCACHE dsts
Since commit f886497212 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:08:17 -05:00
Jiri Benc
d2b2a13245 openvswitch: set correct protocol on route lookup
Respect what the caller passed to ovs_tunnel_get_egress_info.

Fixes: 8f0aad6f35 ("openvswitch: Extend packet attribute for egress tunnel info")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:01:21 -05:00
Jiri Pirko
bd42b78860 net: sched: cls_basic: fix error path in basic_change()
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 15:41:56 -05:00
Gu Zheng
0cf00c6f36 net/socket.c : introduce helper function do_sock_sendmsg to replace reduplicate code
Introduce helper function do_sock_sendmsg() to simplify sock_sendmsg{_nosec},
and replace duplicated code.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 15:24:26 -05:00
Eric Dumazet
42eef7a0bb tcp_cubic: refine Hystart delay threshold
In commit 2b4636a5f8 ("tcp_cubic: make the delay threshold of HyStart
less sensitive"), HYSTART_DELAY_MIN was changed to 4 ms.

The remaining problem is that using delay_min + (delay_min/16) as the
threshold is too sensitive.

6.25 % of variation is too small for rtt above 60 ms, which are not
uncommon.

Let's use 12.5 % instead (delay_min + (delay_min/8))

Tested:
 80 ms RTT between peers, FQ/pacing packet scheduler on sender.
 10 bulk transfers of 10 seconds :

nstat >/dev/null
for i in `seq 1 10`
 do
   netperf -H remote -- -k THROUGHPUT | grep THROUGHPUT
 done
nstat | grep Hystart

With the 6.25 % threshold :

THROUGHPUT=20.66
THROUGHPUT=249.38
THROUGHPUT=254.10
THROUGHPUT=14.94
THROUGHPUT=251.92
THROUGHPUT=237.73
THROUGHPUT=19.18
THROUGHPUT=252.89
THROUGHPUT=21.32
THROUGHPUT=15.58
TcpExtTCPHystartTrainDetect     2                  0.0
TcpExtTCPHystartTrainCwnd       4756               0.0
TcpExtTCPHystartDelayDetect     5                  0.0
TcpExtTCPHystartDelayCwnd       180                0.0

With the 12.5 % threshold
THROUGHPUT=251.09
THROUGHPUT=247.46
THROUGHPUT=250.92
THROUGHPUT=248.91
THROUGHPUT=250.88
THROUGHPUT=249.84
THROUGHPUT=250.51
THROUGHPUT=254.15
THROUGHPUT=250.62
THROUGHPUT=250.89
TcpExtTCPHystartTrainDetect     1                  0.0
TcpExtTCPHystartTrainCwnd       3175               0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:58:23 -05:00
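
The threshold arithmetic described in 42eef7a0bb can be sketched in a few lines of Python. This is a simplified model, not the kernel code: the function name and the microsecond unit are assumptions, and any clamping the kernel applies to the margin (the message mentions HYSTART_DELAY_MIN) is deliberately left out.

```python
def hystart_delay_threshold(delay_min_us: int, shift: int) -> int:
    """Delay above which the delay-based Hystart check would end slow start.

    shift=4 models delay_min + delay_min/16 (6.25 %),
    shift=3 models delay_min + delay_min/8  (12.5 %).
    """
    return delay_min_us + (delay_min_us >> shift)


rtt_us = 80_000  # the 80 ms RTT used in the test described above

old_threshold = hystart_delay_threshold(rtt_us, 4)
new_threshold = hystart_delay_threshold(rtt_us, 3)

# Margin above the minimum RTT that a sample may reach before triggering.
print(old_threshold - rtt_us, new_threshold - rtt_us)
```

Under these assumptions the tolerated delay variation on an 80 ms path doubles from 5 ms to 10 ms.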
Eric Dumazet
6e3a8a937c tcp_cubic: add SNMP counters to track how effective is Hystart
When deploying FQ pacing, one thing we noticed is that CUBIC Hystart
triggers too soon.

Having SNMP counters to have an idea of how often the various Hystart
methods trigger is useful prior to any modifications.

This patch adds SNMP counters tracking how many times the "ack train" or
"Delay" based Hystart triggers, and the cumulative sum of cwnd at the time
Hystart decided to end SS (Slow Start)

myhost:~# nstat -a | grep Hystart
TcpExtTCPHystartTrainDetect     9                  0.0
TcpExtTCPHystartTrainCwnd       20650              0.0
TcpExtTCPHystartDelayDetect     10                 0.0
TcpExtTCPHystartDelayCwnd       360                0.0

->
 Train detection was triggered 9 times, and average cwnd was
 20650/9=2294,
 Delay detection was triggered 10 times and average cwnd was 36

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:58:23 -05:00
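
The averages quoted in 6e3a8a937c can be derived mechanically from the nstat output. The helper below is a hypothetical sketch (its name and return shape are not part of any tool); it only relies on the four counter names shown in the commit message and uses integer division, as the message does.

```python
def average_hystart_cwnd(nstat_text: str) -> dict:
    """Return the average cwnd per Hystart method from nstat-style output."""
    counters = {}
    for line in nstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("TcpExtTCPHystart"):
            counters[fields[0]] = int(fields[1])

    averages = {}
    for kind in ("Train", "Delay"):
        detections = counters.get(f"TcpExtTCPHystart{kind}Detect", 0)
        cwnd_sum = counters.get(f"TcpExtTCPHystart{kind}Cwnd", 0)
        if detections:  # skip methods that never triggered
            averages[kind] = cwnd_sum // detections
    return averages


# Sample copied from the commit message.
SAMPLE = """\
TcpExtTCPHystartTrainDetect     9                  0.0
TcpExtTCPHystartTrainCwnd       20650              0.0
TcpExtTCPHystartDelayDetect     10                 0.0
TcpExtTCPHystartDelayCwnd       360                0.0
"""

print(average_hystart_cwnd(SAMPLE))
```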
Jiri Pirko
57d743a3de net: sched: cls: remove unused op put from tcf_proto_ops
It is never called and implementations are void. So just remove it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 14:49:02 -05:00
Erik Hugne
4988bb4a3f tipc: fix missing spinlock init and nullptr oops
commit 908344cdda ("tipc: fix bug in multicast congestion
handling") introduced two bugs with the bclink wakeup
function. This commit fixes the missing spinlock init for the
waiting_sks list. We also eliminate the race condition
between the waiting_sks length check/dequeue operations in
tipc_bclink_wakeup_users by simply removing the redundant
length check.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Tero Aho <Tero.Aho@coriant.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:41:54 -05:00
Eric Dumazet
6ffe75eb53 net: avoid two atomic operations in fast clones
Commit ce1a4ea3f1 ("net: avoid one atomic operation in skb_clone()")
took the wrong way to save one atomic operation.

It is actually possible to avoid two atomic operations, if we
do not change skb->fclone values, and only rely on clone_ref
content to signal if the clone is available or not.

skb_clone() can simply use the fast clone if clone_ref is 1.

kfree_skbmem() can avoid the atomic_dec_and_test() if clone_ref is 1.

Note that because we usually free the clone before the original skb,
this particular attempt is only done for the original skb to have better
branch prediction.

SKB_FCLONE_FREE is removed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:40:20 -05:00
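
The clone_ref rule described in 6ffe75eb53 can be modelled with a toy Python class. This is only an illustration of the counting logic under simplifying assumptions: the class and method names are invented, there is no real concurrency, and the `atomic_rmw_ops` field merely counts where the kernel would need an atomic read-modify-write such as atomic_dec_and_test().

```python
class FclonePair:
    """Toy model of an original skb and its companion fast clone."""

    def __init__(self):
        self.clone_ref = 1       # 1 means the fast clone slot is available
        self.atomic_rmw_ops = 0  # simulated atomic read-modify-write count
        self.freed = False

    def clone(self) -> bool:
        """Hand out the fast clone if clone_ref is 1, else report failure."""
        if self.clone_ref == 1:
            self.clone_ref = 2
            return True
        return False  # a caller would fall back to a regular allocation

    def _put(self):
        self.atomic_rmw_ops += 1  # stands in for atomic_dec_and_test()
        self.clone_ref -= 1
        if self.clone_ref == 0:
            self.freed = True

    def free_clone(self):
        self._put()

    def free_original(self):
        if self.clone_ref == 1:  # clone already released or never used
            self.freed = True    # no atomic decrement needed
            return
        self._put()


pair = FclonePair()
pair.clone()          # fast clone handed out
pair.free_clone()     # clone is usually freed first
pair.free_original()  # original takes the shortcut
print(pair.freed, pair.atomic_rmw_ops)
```

In this model the common sequence (clone, free the clone, free the original) needs a single simulated atomic operation, and an skb that is never cloned needs none.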
Mahesh Bandewar
395eea6ccf rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
Commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this earlier, but in doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed the call to fill_info(). This translated into asking the driver
to remove its private state and then querying that private state, which
could have catastrophic consequences.

This change breaks rtmsg_ifinfo() into two parts: the first takes a
precise snapshot of the device by calling fill_info() before calling
ndo_uninit(), and the second sends the notification using the
collected snapshot.

The problem was noticed when the last link was deleted from an ipvlan device:
the device had already freed the port, and the subsequent .fill_info() call
tried to get the info from the port.

kernel: [  255.139429] ------------[ cut here ]------------
kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [  255.139520] Call Trace:
kernel: [  255.139527]  [<ffffffff815d87f4>] dump_stack+0x46/0x58
kernel: [  255.139531]  [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
kernel: [  255.139540]  [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
kernel: [  255.139544]  [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
kernel: [  255.139547]  [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
kernel: [  255.139549]  [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
kernel: [  255.139551]  [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
kernel: [  255.139553]  [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
kernel: [  255.139557]  [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
kernel: [  255.139558]  [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
kernel: [  255.139562]  [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
kernel: [  255.139563]  [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
kernel: [  255.139565]  [<ffffffff8152c398>] netlink_unicast+0x178/0x230
kernel: [  255.139567]  [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
kernel: [  255.139571]  [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
kernel: [  255.139575]  [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
kernel: [  255.139577]  [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [  255.139578]  [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
kernel: [  255.139581]  [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
kernel: [  255.139585]  [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
kernel: [  255.139589]  [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
kernel: [  255.139597]  [<ffffffff811e8336>] ? dput+0xb6/0x190
kernel: [  255.139606]  [<ffffffff811f05f6>] ? mntput+0x26/0x40
kernel: [  255.139611]  [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
kernel: [  255.139613]  [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
kernel: [  255.139615]  [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
kernel: [  255.139617]  [<ffffffff815df092>] system_call_fastpath+0x12/0x17
kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:36:57 -05:00
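
The ordering problem fixed by 395eea6ccf can be illustrated with a small Python model. Everything here is hypothetical scaffolding (the `Device` class, the `port` field, the two `unregister_*` functions); it only mirrors the sequence described in the message: build the snapshot first, tear down private state second, send the notification last.

```python
class Device:
    """Toy device whose fill_info() depends on driver-private state."""

    def __init__(self, name: str):
        self.name = name
        self.priv = {"port": "port0"}

    def fill_info(self) -> dict:
        # Fails with TypeError once the private state has been torn down.
        return {"name": self.name, "port": self.priv["port"]}

    def uninit(self) -> None:
        self.priv = None  # the driver releases its private state


def unregister_buggy(dev: Device, send) -> None:
    dev.uninit()
    send(dev.fill_info())  # queries state that no longer exists


def unregister_fixed(dev: Device, send) -> None:
    snapshot = dev.fill_info()  # part one: snapshot before uninit
    dev.uninit()
    send(snapshot)              # part two: notify with the snapshot


notifications = []
unregister_fixed(Device("ipvl0"), notifications.append)
print(notifications)
```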
Erik Hugne
88b17b6a22 tipc: drop tx side permission checks
Part of the old remote management feature is a piece of code
that checked permissions on the local system to see if a certain
operation was permitted, and if so pass the command to a remote
node. This serves no purpose after the removal of remote management
with commit 5902385a24 ("tipc: obsolete the remote management
feature") so we remove it.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:30:13 -05:00
Duan Jiong
86fe8f8920 ipv6: remove useless spin_lock/spin_unlock
xchg is atomic, so there is no need to use spin_lock/spin_unlock
to protect it. Also remove the redundant
opt = xchg(&inet6_sk(sk)->opt, opt); statement.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 13:18:09 -05:00
David S. Miller
6db70e3e1d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-12-03

1) Fix a set but not used warning. From Fabian Frederick.

2) Currently we make sequence number values available to userspace
   only if we use ESN. Make the sequence number values also available
   for non ESN states. From Zhi Ding.

3) Remove socket policy hashing. We don't need it because socket
   policies are always looked up via a linked list. From Herbert Xu.

4) After removing socket policy hashing, we can use __xfrm_policy_link
   in xfrm_policy_insert. From Herbert Xu.

5) Add a lookup method for vti6 tunnels with wildcard endpoints.
   I forgot this when I initially implemented vti6.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:30:21 -05:00
Eyal Perry
892311f66f ethtool: Support for configurable RSS hash function
This patch extends the set/get_rxfh ethtool-options for getting or
setting the RSS hash function.

It modifies drivers implementation of set/get_rxfh accordingly.

This change also delegates the responsibility of checking whether a
modification to a certain RX flow hash parameter is supported to the
driver implementation of set_rxfh.

The user-kernel API is provided through the new hfunc bitmask field in the
ethtool_rxfh struct. A bit set in the hfunc field corresponds to an
index in the new string-set ETH_SS_RSS_HASH_FUNCS.

Most of the relevant driver maintainers confirmed that their
driver uses Toeplitz; for the few that didn't answer, we also
assumed it is Toeplitz.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:07:10 -05:00
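
The bit-to-index mapping described in 892311f66f can be sketched as follows. The decoder name is invented, and the contents of the string set are an assumption for illustration: the commit message only confirms that Toeplitz is in use, so the second entry is a placeholder.

```python
from typing import List


def decode_hfunc(hfunc: int, string_set: List[str]) -> List[str]:
    """Map each set bit in hfunc to the name at the same string-set index."""
    return [name for bit, name in enumerate(string_set) if hfunc & (1 << bit)]


# Assumed contents of ETH_SS_RSS_HASH_FUNCS, for illustration only.
RSS_HASH_FUNCS = ["toeplitz", "xor"]

print(decode_hfunc(0b01, RSS_HASH_FUNCS))
```

A bitmask rather than a plain index lets one field describe several functions at once, for example when a driver reports everything it supports.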
Jiri Pirko
18b5427ae1 net_sched: cls_cgroup: remove unnecessary if
since head->handle == handle (checked before), just assign handle.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:41 -05:00
Jiri Pirko
2f8a2965da net_sched: cls_flow: remove duplicate assignments
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:41 -05:00
Jiri Pirko
6a659cd061 net_sched: cls_flow: remove faulty use of list_for_each_entry_rcu
rcu variant is not correct here. The code is called by updater (rtnl
lock is held), not by reader (no rcu_read_lock is held).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:40 -05:00
Jiri Pirko
3fe6b49e2f net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
rcu variant is not correct here. The code is called by updater (rtnl
lock is held), not by reader (no rcu_read_lock is held).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:40 -05:00
Jiri Pirko
472f583701 net_sched: cls_bpf: remove unnecessary iteration and use passed arg
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:40 -05:00
Jiri Pirko
e4386456ae net_sched: cls_basic: remove unnecessary iteration and use passed arg
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:53:40 -05:00
Ying Xue
97ede29e80 tipc: convert name table read-write lock to RCU
Convert tipc name table read-write lock to RCU. After this change,
a new spin lock is used to protect name table on write side while
RCU is applied on read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue
834caafa3e tipc: remove unnecessary INIT_LIST_HEAD
When a list_head variable is seen as a new entry to be added to a
list head, it's unnecessary to be initialized with INIT_LIST_HEAD().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue
5492390a94 tipc: simplify relationship between name table lock and node lock
When tipc name sequence is published, name table lock is released
before name sequence buffer is delivered to remote nodes through its
underlying unicast links. However, when name sequence is withdrawn,
the name table lock is held until the transmission of the removal
message of name sequence is finished. During the process, node lock
is nested in name table lock. To prevent node lock from being nested
in name table lock, while withdrawing name, we should adopt the same
locking policy of publishing name sequence: name table lock should
be released before message is sent.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue
3493d25cfb tipc: any name table member must be protected under name table lock
As tipc_nametbl_lock is used to protect name_table structure, the lock
must be held while all members of name_table structure are accessed.
However, the lock is not obtained while a member of name_table
structure - local_publ_count is read in tipc_nametbl_publish(), as
a consequence, an inconsistent value of local_publ_count might be read.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue
fb9962f3ce tipc: ensure all name sequences are properly protected with its lock
TIPC internally created a name table which is used to store name
sequences. Now there is a read-write lock - tipc_nametbl_lock to
protect the table, and each name sequence saved in the table is
protected with its private lock. When a name sequence is inserted
or removed to or from the table, its members might need to change.
Therefore, in normal case, the two locks must be held while TIPC
operates the table. However, there are still several places where
we only hold tipc_nametbl_lock without properly obtaining name
sequence lock, which might cause the corruption of name sequence.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue
38622f4195 tipc: ensure all name sequences are released when name table is stopped
As TIPC subscriber server is terminated before name table, no user
depends on subscription list of name sequence when name table is
stopped. Therefore, all name sequences stored in name table should
be released whether their subscription lists are empty or not,
otherwise, memory leak might happen.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue
993bfe5daf tipc: make name table allocated dynamically
Name table locking policy is going to be adjusted from read-write
lock protection to RCU lock protection in the future commits. But
its essential precondition is to convert the allocation way of name
table from static to dynamic mode.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue
1b61e70ad1 tipc: remove size variable from publ_list struct
The size variable is introduced in publ_list struct to help us exactly
calculate SKB buffer sizes needed by publications when all publications
in name table are delivered in bulk in named_distribute(). But if
publication SKB buffer size is assumed to be the MTU, the size variable in
publ_list struct can be completely eliminated at the cost of wasting
a bit of memory space for the last SKB.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tero Aho <tero.aho@coriant.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:55 -05:00
Joe Perches
60c04aecd8 udp: Neaten and reduce size of compute_score functions
The compute_score functions are a bit difficult to read.

Neaten them a bit to reduce object sizes and make them a
bit more intelligible.

Return early to avoid indentation and avoid unnecessary
initializations.

(allyesconfig, but w/ -O2 and no profiling)

$ size net/ipv[46]/udp.o.*
   text    data     bss     dec     hex filename
  28680    1184      25   29889    74c1 net/ipv4/udp.o.new
  28756    1184      25   29965    750d net/ipv4/udp.o.old
  17600    1010       2   18612    48b4 net/ipv6/udp.o.new
  17632    1010       2   18644    48d4 net/ipv6/udp.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:28:47 -05:00
Willem de Bruijn
829ae9d611 net-timestamp: allow reading recv cmsg on errqueue with origin tstamp
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
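
From userspace, opting in might look like the sketch below. The helper name
enable_tstamp_cmsg() is hypothetical, the choice of software TX timestamps is
only an example, and the headers are assumed to be recent enough to define
SOF_TIMESTAMPING_OPT_CMSG.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/net_tstamp.h>

/* Request software TX timestamps and opt in to receiving regular
 * recv cmsg (here IP_PKTINFO) alongside them on the error queue.
 * Returns 0 on success, -1 with errno set on failure. */
static int enable_tstamp_cmsg(int fd)
{
    unsigned int flags = SOF_TIMESTAMPING_TX_SOFTWARE |
                         SOF_TIMESTAMPING_SOFTWARE |
                         SOF_TIMESTAMPING_OPT_CMSG;
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                   &flags, sizeof(flags)) < 0)
        return -1;

    /* IP_PKTINFO is the cmsg used as the example in this message. */
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}
```

The timestamps themselves would then be read with recvmsg() and MSG_ERRQUEUE;
that part is omitted here.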

On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. The only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Willem de Bruijn
7ce875e5ec ipv4: warn once on passing AF_INET6 socket to ip_recv_error
One-line change, in response to catching an occurrence of this bug.
See also fix f4713a3dfa ("net-timestamp: make tcp_recvmsg call ...")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
John W. Linville
81c412600f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2014-12-08 13:58:58 -05:00
Jeremiah Mahler
069f8457ae can: fix spelling errors
Fix various spelling errors in the comments of the CAN modules.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-12-07 21:22:05 +01:00
Jeremiah Mahler
b111b78c6e can: eliminate banner[] variable and switch to pr_info()
Several CAN modules use a design pattern with a banner[] variable at the
top which defines a string that is used once during init to print the
banner.  The string is also embedded with KERN_INFO which makes it
printk() specific.

Improve the code by eliminating the banner[] variable and moving the
string to where it is printed.  Then switch from printk(KERN_INFO to
pr_info() for the lines that were changed.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-12-07 21:22:01 +01:00
Alexei Starovoitov
89aa075832 net: sock: allow eBPF programs to be attached to sockets
Introduce a new setsockopt() command:

setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

setsockopt() calls bpf_prog_get(), which increments the refcnt of the
program, so it doesn't get unloaded while a socket is using the program.

The same eBPF program can be attached to multiple sockets.

User task exit automatically closes the socket, which calls
sk_filter_uncharge(), which decrements the refcnt of the eBPF program.
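
A minimal userspace wrapper around the call might look like this. The helper
name attach_bpf_filter() is hypothetical, and the fallback value 50 for
SO_ATTACH_BPF is an assumption that holds for the generic socket header but
not necessarily for every architecture. Loading the program with
bpf(BPF_PROG_LOAD, ...) is left out.

```c
#include <sys/socket.h>

/* Fallback for userspace headers that predate this option.
 * Assumption: 50 is the asm-generic value; some architectures differ. */
#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50
#endif

/* Attach an already-loaded eBPF socket-filter program to sock.
 * Returns 0 on success, -1 with errno set on failure. */
static int attach_bpf_filter(int sock, int prog_fd)
{
    return setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));
}
```

Because the program is reference counted, the same prog_fd can be passed for
several sockets, and the caller may close prog_fd after attaching.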

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
David S. Miller
244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Raise the maximum number per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering
   using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking the ipset RCU conversion, which should land soon
if it reaches the merge window in time.
====================
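
Item 10 above can be pictured with a small sketch. The struct below is
hypothetical and is not the real ipset element layout; it only shows the
general idea of naming the padding bytes explicitly so that the element size
is a whole number of 32-bit words, which a word-oriented hash function would
require.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical element, NOT the real ipset layout. */
struct net_port_elem {
    uint32_t ip;
    uint16_t port;
    uint8_t  proto;
    uint8_t  cidr;
    uint8_t  nomatch;
    uint8_t  padding[3];    /* explicit, so these bytes are named and
                             * get zeroed together with the rest */
};

/* Fail the build if the element is not a multiple of 32 bits. */
_Static_assert(sizeof(struct net_port_elem) % sizeof(uint32_t) == 0,
               "element must be u32 sized");

/* Number of 32-bit words a word-oriented hash would walk over. */
static size_t elem_words(void)
{
    return sizeof(struct net_port_elem) / sizeof(uint32_t);
}
```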

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00
John W. Linville
f700076a9d Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2014-12-05 14:12:24 -05:00
Marcel Holtmann
5a34bd5f5d Bluetooth: Enable events for P-256 Public Key and DHKey commands
When the LE Read Local P-256 Public Key command is supported, then
enable its corresponding complete event. And when the LE Generate DHKey
command is supported, enable its corresponding complete event as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:17:49 +02:00
Marcel Holtmann
4efbb2ce8b Bluetooth: Add support for enabling Extended Scanner Filter Policies
The new Extended Scanner Filter Policies feature has to be enabled by
selecting the correct filter policy for the scan parameters. This
patch does that when the controller has been enabled to use LE Privacy.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:17:19 +02:00
Marcel Holtmann
2f010b5588 Bluetooth: Add support for handling LE Direct Advertising Report events
When the controller sends a LE Direct Advertising Report event, the host
must confirm that the resolvable random address provided matches with
its own identity resolving key. If it does, then that advertising report
needs to be processed. If it does not match, the report needs to be
ignored.

This patch adds full support for handling these new reports and using
them for device discovery and connection handling. This means when a
Bluetooth controller supports the Extended Scanner Filter Policies, it
is possible to use directed advertising with LE privacy.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:16:41 +02:00
Marcel Holtmann
4b71bba45c Bluetooth: Enabled LE Direct Advertising Report event if supported
When the controller supports the Extended Scanner Filter Policies, it
supports the LE Direct Advertising Report event. However by default
that event is blocked by the LE event mask. It is required to enable
it during controller setup.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 18:15:33 +02:00
Varka Bhadram
bcb47aabf4 mac802154: use goto label on failure
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Reviewed-by: Stefan Schmidt <s.schmidt@samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-05 14:18:42 +01:00
Marcel Holtmann
da25cf6a98 Bluetooth: Report invalid RSSI for service discovery and background scan
When using Start Service Discovery, and when background scanning is used
to report devices, the RSSI is reported, or the value 127 is provided in
case the RSSI is unavailable.

For Start Discovery the value 0 is reported to keep backwards
compatibility with the existing users.
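
The rule can be sketched as a small selection function. The names
RSSI_INVALID and reported_rssi() are hypothetical and only restate the two
paragraphs above; they are not taken from the Bluetooth sources.

```c
#include <stdint.h>
#include <stdbool.h>

#define RSSI_INVALID 127    /* "unavailable" marker named in this message */

/* Pick the RSSI value to report to userspace.
 * report_invalid is true for Start Service Discovery and background
 * scanning, false for the legacy Start Discovery path. */
static int8_t reported_rssi(bool rssi_known, int8_t rssi, bool report_invalid)
{
    if (rssi_known)
        return rssi;
    return report_invalid ? RSSI_INVALID : 0;
}
```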

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-12-05 14:14:28 +02:00