535495 Commits

Author SHA1 Message Date
Pravin B Shelar
9f57c67c37 gre: Remove support for sharing GRE protocol hook.
Support for sharing GREPROTO_CISCO port was added so that
OVS gre port and kernel GRE devices can co-exist. After
flow-based tunneling patches OVS GRE protocol processing
is completely moved to ip_gre module. so there is no need
for GRE protocol hook. Following patch consolidates
GRE protocol related functions into ip_gre module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar
b2acd1dc39 openvswitch: Use regular GRE net_device instead of vport
Using GRE tunnel meta data collection feature, we can implement
OVS GRE vport. This patch removes all of the OVS
specific GRE code and make OVS use a ip_gre net_device.
Minimal GRE vport is kept to handle compatibility with
current userspace application.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar
2e15ea390e ip_gre: Add support to collect tunnel metadata.
Following patch create new tunnel flag which enable
tunnel metadata collection on given device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar
a9020fde67 openvswitch: Move tunnel destroy function to oppenvswitch module.
This function will be used in gre and geneve vport implementations.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Rick Jones
fb811395cd net: add explicit logging and stat for neighbour table overflow
Add an explicit neighbour table overflow message (ratelimited) and
statistic to make diagnosing neighbour table overflows tractable in
the wild.

Diagnosing a neighbour table overflow can be quite difficult in the wild
because there is no explicit dmesg logged.  Callers to neighbour code
seem to use net_dbg_ratelimit when the neighbour call fails which means
the "base message" is not emitted and the callback suppressed messages
from the ratelimiting can end-up juxtaposed with unrelated messages.
Further, a forced garbage collection will increment a stat on each call
whether it was successful in freeing-up a table entry or not, so that
statistic is only a hint.  So, add a net_info_ratelimited message and
explicit statistic to the neighbour code.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:46:21 -07:00
Nikolay Aleksandrov
a7854037da bridge: netlink: add support for vlan_filtering attribute
This patch adds the ability to toggle the vlan filtering support via
netlink. Since we're already running with rtnl in .changelink() we don't
need to take any additional locks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:36:43 -07:00
Benjamin Poirier
7a76a021cd net-timestamp: Update skb_complete_tx_timestamp comment
After "62bccb8 net-timestamp: Make the clone operation stand-alone from phy
timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no
longer be NULL.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:35:37 -07:00
David S. Miller
e72ee3ed51 Merge branch 'qlcnic-enhancements'
Shahed Shaikh says:

====================
qlcnic: enhancements

This series adds few enhancements.

  o Patch from Harish reorders the sequence of header files inclusion,
    keeping kernel's header files on top.

  o Firmware introduced a new feature which allows driver to increases
    the size of firmware dump of iSCSI function which is being collected
    by NIC driver.

  o Print buffer address which is holding a firmware dump.

  o Use vzalloc() instead kzalloc() for allocating large chunk of memory
    which will avoid potential memory allocation failure.

  o Add new device ID for 0x8C30 which is a 83xx series based VF function.

Please apply this series to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:29 -07:00
Shahed Shaikh
02509f171d qlcnic: Update version to 5.3.63
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:29 -07:00
Shahed Shaikh
0e90ad9bfd qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory
Driver allocates a large chunk of temporary buffer using kzalloc
to copy FW image. As there is no real need of this memory to be
physically contiguous, use vzalloc instead.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh
da286a6fd1 qlcnic: Add new VF device ID 0x8C30
This is a 83xx series based VF device

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh
642de51025 qlcnic: Print firmware minidump buffer and template header addresses
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh
d01a6d3c8a qlcnic: Add support to enable capability to extend minidump for iSCSI
In some cases it is required to capture minidump for iSCSI functions
as part of default minidump collection process. To enable this, firmware
exports it's capability and driver need to enable that capability
by issuing a mailbox command.

With this feature, firmware can provide additional iSCSI function's
minidump with smaller minidump capture mask (0x1f).

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Harish Patil
a930a4639d qlcnic: Rearrange ordering of header files inclusion
Include local headers files after kernel's header files.

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Madalin Bucur
07151bc9f7 net: phy: select copper mode when Marvel 88e1111 in SGMII
For the Marvel 88e1111 PHY only two SGMII modes are available, both
allowing only SGMII to copper mode (with or without clock). SGMII
to fiber mode is not supported. Make sure the fiber/copper registers
selector bits are cleared for selecting copper mode.

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:31:18 -07:00
Florian Westphal
330567b71d ipv6: don't reject link-local nexthop on other interface
48ed7b26faa7 ("ipv6: reject locally assigned nexthop addresses") is too
strict; it rejects following corner-case:

ip -6 route add default via fe80::1:2:3 dev eth1

[ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]

Fix this by restricting search to given device if nh is linklocal.

Joint work with Hannes Frederic Sowa.

Fixes: 48ed7b26faa7 ("ipv6: reject locally assigned nexthop addresses")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:29:22 -07:00
Kevin Hao
c4bc44c65b net: fec: fix the race between xmit and bdp reclaiming path
When we transmit a fragmented skb, we may run into a race like the
following scenario (assume txq->cur_tx is next to txq->dirty_tx):
           cpu 0                                          cpu 1
  fec_enet_txq_submit_skb
    reserve a bdp for the first fragment
    fec_enet_txq_submit_frag_skb
       update the bdp for the other fragment
       update txq->cur_tx
                                                   fec_enet_tx_queue
                                                     bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id);
                                                     This bdp is the bdp reserved for the first segment. Given
                                                     that this bdp BD_ENET_TX_READY bit is not set and txq->cur_tx
                                                     is already pointed to a bdp beyond this one. We think this is a
                                                     completed bdp and try to reclaim it.
    update the bdp for the first segment
    update txq->cur_tx

So we shouldn't update the txq->cur_tx until all the update to the
bdps used for fragments are performed. Also add the corresponding
memory barrier to guarantee that the update to the bdps, dirty_tx and
cur_tx performed in the proper order.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:28:14 -07:00
Daniel Borkmann
4e7c133068 netlink: make sure -EBUSY won't escape from netlink_insert
Linus reports the following deadlock on rtnl_mutex; triggered only
once so far (extract):

[12236.694209] NetworkManager  D 0000000000013b80     0  1047      1 0x00000000
[12236.694218]  ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
[12236.694224]  ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
[12236.694230]  ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
[12236.694235] Call Trace:
[12236.694250]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
[12236.694257]  [<ffffffff815d133a>] ? schedule+0x2a/0x70
[12236.694263]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
[12236.694271]  [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
[12236.694280]  [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
[12236.694291]  [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
[12236.694299]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
[12236.694309]  [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
[12236.694319]  [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
[12236.694331]  [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
[12236.694339]  [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
[12236.694346]  [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
[12236.694354]  [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
[12236.694360]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
[12236.694367]  [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
[12236.694376]  [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
[12236.694387]  [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
[12236.694396]  [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
[12236.694405]  [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
[12236.694415]  [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
[12236.694423]  [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
[12236.694429]  [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
[12236.694434]  [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
[12236.694440]  [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a

It seems so far plausible that the recursive call into rtnetlink_rcv()
looks suspicious. One way, where this could trigger is that the senders
NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
the rtnl_getlink() request's answer would be sent to the kernel instead
to the actual user process, thus grabbing rtnl_mutex() twice.

One theory would be that netlink_autobind() triggered via netlink_sendmsg()
internally overwrites the -EBUSY error to 0, but where it is wrongly
originating from __netlink_insert() instead. That would reset the
socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.")
also puts it, -EBUSY should not be propagated from netlink_insert().

It looks like it's very unlikely to reproduce. We need to trigger the
rhashtable_insert_rehash() handler under a situation where rehashing
currently occurs (one /rare/ way would be to hit ht->elasticity limits
while not filled enough to expand the hashtable, but that would rather
require a specifically crafted bind() sequence with knowledge about
destination slots, seems unlikely). It probably makes sense to guard
__netlink_insert() in any case and remap that error. It was suggested
that EOVERFLOW might be better than an already overloaded ENOMEM.

Reference: http://thread.gmane.org/gmane.linux.network/372676
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:59:10 -07:00
Ivan Vecera
ade4dc3e61 bna: fix interrupts storm caused by erroneous packets
The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter
increment from the beginning of the NAPI processing loop after the check
for erroneous packets so they are never accounted. This counter is used
to inform firmware about number of processed completions (packets).
As these packets are never acked the firmware fires IRQs for them again
and again.

Fixes: e29aa33 ("bna: Enable Multi Buffer RX")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:58:37 -07:00
David S. Miller
ea70858423 Merge branch 'mvpp2-fixes'
Marcin Wojtas says:

====================
Fixes for the network driver of Marvell Armada 375 SoC

This is a set of three patches that fix long-lasting problems implemented in
the initial support for the Armada 375 network controller.

Due to an inappropriate concept of handling the per-CPU sent packets'
processing on TX path the driver numerous problems occured, such as RCU
stalls. Those have been fixed, of which details you can find in the commit
logs. The patches were intensively tested on top of v4.2-rc5.

I'm looking forward to any comments or remarks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:57:01 -07:00
Marcin Wojtas
edc660fa09 net: mvpp2: replace TX coalescing interrupts with hrtimer
The PP2 controller is capable of per-CPU TX processing, which means there are
per-CPU banked register sets and queues. Current version of the driver supports
TX packet coalescing - once on given CPU sent packets amount reaches a threshold
value, an IRQ occurs. However, there is a single interrupt line responsible for
CPU0/1 TX and RX events (the latter is not per-CPU, the hardware does not
support RSS).

When the top-half executes the interrupt cause is not known. This is why in
NAPI poll function, along with RX processing, IRQ cause register on both
CPU's is accessed in order to determine on which of them the TX coalescing
threshold might have been reached. Thus the egress processing and releasing the
buffers is able to take place on the corresponding CPU. Hitherto approach lead
to an illegal usage of on_each_cpu function in softirq context.

The problem is solved by resigning from TX coalescing interrupts and separating
egress finalization from NAPI processing. For that purpose a method of using
hrtimer is introduced. In main transmit function (mvpp2_tx) buffers are released
once a software coalescing threshold is reached. In case not all the data is
processed a timer is set on this CPU - in its interrupt context a tasklet is
scheduled in which all queues are processed. At once only one timer per-CPU can
be running, which is controlled by a dedicated flag.

This commit removes TX processing from NAPI polling function, disables hardware
coalescing and enables hrtimer with tasklet, using new per-CPU port structure
(mvpp2_port_pcpu).

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:57:00 -07:00
Marcin Wojtas
71ce391dfb net: mvpp2: enable proper per-CPU TX buffers unmapping
mvpp2 driver allows usage of per-CPU TX processing. Once the packets are
prepared independetly on each CPU, the hardware enqueues the descriptors in
common TX queue. After they are sent, the buffers and associated sk_buffs
should be released on the corresponding CPU.

This is why a special index is maintained in order to point to the right data to
be released after transmission takes place. Each per-CPU TX queue comprise an
array of sent sk_buffs, freed in mvpp2_txq_bufs_free function. However, the
index was used there also for obtaining a descriptor (and therefore a buffer to
be DMA-unmapped) from common TX queue, which was wrong, because it was not
referring to the current CPU.

This commit enables proper unmapping of sent data buffers by indexing them in
per-CPU queues using a dedicated array for keeping their physical addresses.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:57:00 -07:00
Marcin Wojtas
d53793c5d6 net: mvpp2: remove excessive spinlocks from driver initialization
Using spinlocks protection during one-time driver initialization is not
necessary. Moreover it resulted in invalid GFP_KERNEL allocation under the lock.

This commit removes redundant spinlocks from buffer manager part of mvpp2
initialization.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reported-by: Alexandre Fournier <alexandre.fournier@wisp-e.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 10:57:00 -07:00
Linus Torvalds
2b9bea035a - Fix dependency issues on ChromeOS platforms
- Fix runtime PM issues on Arizona
 - Fix IRQ/Suspend race on Arizona
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVyMu3AAoJEFGvii+H/Hdhvp4QAIBWKyet/MbWQY5rptvKZmTV
 zQeTjQb6Qyr/hUJQ1Ptedu/Df5eX8L4HVVLuQDTTMjVMd+4KSAD18PdqrjMsQ7U3
 fTfkYs4w4o71aSg3yKVQ8Y+OekwmumVPWOTlb2PSFwEciEhBS1Fgl0BIOWy3Or3g
 rEG19B0/rlL2CE448C9ENqtUcEmt4MylyyoDFUz/27zLY74Jw7trHpXBY9d1tXhG
 BJD9qQNDdO1o5zBvvVQ8ETr+QK60GahayuwhXtnbJbvL46wGrxhOALwS9UtyhF/w
 XgEvZkX7Ns6uZrZkLhz251GNPxqE7CEe2HZ2x8EUj8TxbIXA/QbNm4sid6xe74IQ
 YGRTrVql5By+LYemEqrOIw2T4i+7HI7hOf+LI6kVWvn49/zr6v7Vk2deSZ+aIcEs
 9rUTa7upIoLA5msDgk1SILdxKjStkgbz/BDUdAH8hbIJe8hzk29Kr2A9/D1QwIif
 k41+KWDaobUFiKLUQUavPdGTVus05kolsmgikmlg7lB3WSB496uhsQ9DxddH7Vqo
 WoJ4BHXn4O++uSAuGe8MQGBYviK32pEpSc5k+U1dauIqNmrRy1W7Tt5DsKtNNWDn
 F64KVNNLfbEhHp9Tgnd+7axvhuNpll6FaSGEvFC+TsA4Xdvk692k5HmDA2L4/Dyx
 OCHLOve/vkNqPi8mmD5K
 =/Rnq
 -----END PGP SIGNATURE-----

Merge tag 'mfd-fixes-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD fixes from Lee Jones:
 - fix dependency issues on ChromeOS platforms
 - fix runtime PM issues on Arizona
 - fix IRQ/suspend race on Arizona

* tag 'mfd-fixes-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: Remove MFD_CROS_EC_SPI depends on OF
  platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM
  mfd: arizona: Fix initialisation of the PM runtime
  mfd: arizona: Fix race between runtime suspend and IRQs
2015-08-10 10:48:11 -07:00
Linus Torvalds
016a9f50e2 NTB bug fixes to address transport receive issues, stats, link
negotiation issues, and string formatting.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVx78FAAoJEG5mS6x6i9IjczkP/2kVMQUdGTOCbxX7tqDUO4Fc
 nL2oa6uyUTv7/YaEIG2qdk3XTJB7xQI75LROySjMrG1hxMpR7HRz4pvskB3AB3SJ
 AyyX5SoD067YyZ7CX85S0LYN+vGo7eimdzbDMm1lZn9184cuKPxvkZr2sVgwEP0c
 AqtiZwHySmUIJ/twfG3W3Mb1Xx2b5Y9F26h94ewDe5kXsVxWF1+ROS6iTF4hK5PT
 WE/x7Y0Db0Dq4xEFZ1crzUSsln7qY+u2bUzujPZbpPrs4dR81PK48EGJtag9fLa1
 kuyXSE2vZEOIgAchS3P1JkN5Fj3mJNZkPcchWdkCqxzr4TqC4dydE/P1Etp7ZAT/
 7L8xIlHdl4EBClmqdFnoOlA2EgaLwbwOyKny1vDPO+s4WoLjy7KrgsttEDRfRe0I
 d2yJZyqQohXWUoW9bgoyHRy4c6r0i4heOq4ImSggHCFqdp5yhKzXR7xUQxvcPakh
 Bqu/7wfpEov9R2gOej/pt5j2ogE1cahpv9jhHgxvxKSHyXLvwkE/xW3eGARP4diK
 nuybsHXeA5KaFHfcKfnJXiGVNTXNeDgqx2Ia3t9+DC4EnnKQKC6UVVUq8VsV/IXe
 LCudolCza1PKNFAuXEWO10UPJeHLe+97eojXcdOFy/qPkiIz6YPt+bE73nBDit8w
 4EUQYPAxWjZqCf+bfLSb
 =zMAG
 -----END PGP SIGNATURE-----

Merge tag 'ntb-4.2-rc7' of git://github.com/jonmason/ntb

Pull NTB bugfixes from Jon Mason:
 "NTB bug fixes to address transport receive issues, stats, link
  negotiation issues, and string formatting"

* tag 'ntb-4.2-rc7' of git://github.com/jonmason/ntb:
  ntb: avoid format string in dev_set_name
  NTB: Fix dereference before check
  NTB: Fix zero size or integer overflow in ntb_set_mw
  NTB: Schedule to receive on QP link up
  NTB: Fix oops in debugfs when transport is half-up
  NTB: ntb_netdev not covering all receive errors
  NTB: Fix transport stats for multiple devices
  NTB: Fix ntb_transport out-of-order RX update
2015-08-10 10:38:42 -07:00
Linus Torvalds
a3ca013d88 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU pathwalk fix from Al Viro:
 "Another racy use of nd->path.dentry in RCU mode"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  may_follow_link() should use nd->inode
2015-08-10 10:04:47 -07:00
Nathan Lynch
878854a374 arm64: VDSO: fix coarse clock monotonicity regression
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") it has become possible on arm64 to:

- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
  via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
  predates the first timestamp (by one jiffy).

This is because arm64's update_vsyscall is deriving the coarse time
using the __current_kernel_time interface, when it should really be
using the timekeeper object provided to it by the timekeeping core.
It happened to work before only because __current_kernel_time would
access the same timekeeper object which had been passed to
update_vsyscall.  This is no longer the case.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-08-10 15:37:45 +01:00
Jason A. Donenfeld
fc5fee86bd x86/xen: build "Xen PV" APIC driver for domU as well
It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
 #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 #1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
 #2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
 #3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
 #4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
 #5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
 #6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
 #7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
 #8  0xffffffffc00013ea in ?? ()
 #9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-08-10 15:33:10 +01:00
Scot Doyle
2a17d7e80f fbcon: unconditionally initialize cursor blink interval
A sun7i-a20-olinuxino-micro fails to boot when kernel parameter
vt.global_cursor_default=0. The value is copied to vc->vc_deccm
causing the initialization of ops->cur_blink_jiffies to be skipped.
Unconditionally initialize it.

Reported-and-tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 17:20:32 +03:00
Christian Engelmayer
37b617f9be video: Fix possible leak in of_get_videomode()
In case videomode_from_timings() fails in function of_get_videomode(), the
allocated display timing data is not freed in the exit path. Make sure that
display_timings_release() is called in any case. Detected by Coverity CID
1309681.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 15:11:12 +03:00
Phil Sutter
3c16241c44 netfilter: SYNPROXY: fix sending window update to client
Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
finish the server handshake, then calls nf_ct_seqadj_init() to initiate
sequence number adjustment of forwarded packets to the client and finally sends
a window update to the client to unblock it's TX queue.

Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
parameter, no sequence number adjustment happens and the client receives the
window update with incorrect sequence number. Depending on client TCP
implementation, this leads to a significant delay (until a window probe is
being sent).

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-10 13:55:07 +02:00
Phil Sutter
96fffb4f23 netfilter: ip6t_SYNPROXY: fix NULL pointer dereference
This happens when networking namespaces are enabled.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-10 13:54:44 +02:00
Robert Jarzmik
9e6e35edb3 video: fbdev: pxa3xx_gcu: prepare the clocks
The clocks need to be prepared before being enabled. Without it a
warning appears in the drivers probe path :

WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:707 clk_core_enable+0x84/0xa0()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-cm-x300+ #804
Hardware name: CM-X300 module
[<c000ed50>] (unwind_backtrace) from [<c000ce08>] (show_stack+0x10/0x14)
[<c000ce08>] (show_stack) from [<c0017eb4>] (warn_slowpath_common+0x7c/0xb4)
[<c0017eb4>] (warn_slowpath_common) from [<c0017f88>] (warn_slowpath_null+0x1c/0x24)
[<c0017f88>] (warn_slowpath_null) from [<c02d30dc>] (clk_core_enable+0x84/0xa0)
[<c02d30dc>] (clk_core_enable) from [<c02d3118>] (clk_enable+0x20/0x34)
[<c02d3118>] (clk_enable) from [<c0200dfc>] (pxa3xx_gcu_probe+0x148/0x338)
[<c0200dfc>] (pxa3xx_gcu_probe) from
[<c022eccc>] (platform_drv_probe+0x30/0x94)

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 12:25:43 +03:00
Jyri Sarha
6266f4b19d OMAPDSS: Fix omap_dss_find_output_by_port_node() port refcount decrement
Fix omap_dss_find_output_by_port_node() port parameter refcount
decrementation. The only user of dss_of_port_get_parent_device()
function is omap_dss_find_output_by_port_node() and it assumes the
refcount of the port parameter is not decremented by the call.

Signed-off-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 12:22:40 +03:00
Jyri Sarha
2b55cb3b04 OMAPDSS: Fix node refcount leak in omapdss_of_get_next_port()
Fix node refcount leak in omapdss_of_get_next_port().

Signed-off-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 12:22:35 +03:00
Linus Walleij
2701fa0864 fbdev: select versatile helpers for the integrator
Commit 11c32d7b6274cb0f554943d65bd4a126c4a86dcd
"video: move Versatile CLCD helpers" missed the fact
that the Integrator/CP is also using the helper, and
as a result the platform got only stubs and no graphics.
Add this as a default selection to Kconfig so we have
graphics again.

Fixes: 11c32d7b6274 (video: move Versatile CLCD helpers)
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-08-10 12:20:38 +03:00
David S. Miller
2cf1b5ce16 Merge branch 'mlxsw-fixes'
Jiri Pirko says:

====================
mlxsw: Couple of fixes/adjustments

Ido Schimmel (5):
  mlxsw: Call free_netdev when removing port
  mlxsw: Make system port to local port mapping explicit
  mlxsw: Simplify mlxsw_sx_port_xmit function
  mlxsw: Use correct skb length when dumping payload
  mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit

Jiri Pirko (2):
  mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
  mlxsw: Strip FCS from incoming packets
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:31 -07:00
Ido Schimmel
e577516b9d mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit
Store the length of the skb before transmitting it and use it for stats
instead of skb->len, since skb might have been freed already.

This issue was discovered using the Kernel Address sanitizer (KASan).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Ido Schimmel
3bfcd34764 mlxsw: Use correct skb length when dumping payload
Do not use the length of the transmitted skb (which was freed), but
that of the response skb.

This issue was discovered using the Kernel Address sanitizer (KASan).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Ido Schimmel
d003462a50 mlxsw: Simplify mlxsw_sx_port_xmit function
Previously we only checked if the transmission queue is not full in the
middle of the xmit function. This lead to complex logic due to the fact
that sometimes we need to reallocate the headroom for our Tx header.

Allow the switch driver to know if the transmission queue is not full
before sending the packet and remove this complex logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Jiri Pirko
7b7b9cff74 mlxsw: Strip FCS from incoming packets
FCS of incoming packets is already checked by HW. Just strip it out.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Jiri Pirko
74ed207e2a mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
This resolves compile errors on um-allyesconfig.

Note that there are many other drivers which have the same issue.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Ido Schimmel
e61011b5e0 mlxsw: Make system port to local port mapping explicit
System ports are unique identifiers in a multi-ASIC environment that
represent all the available ports in the system. Local ports on the
other hand, are unique only within the local ASIC.

Since system port to local port mapping is not part of the HW-SW
contract and since only single-ASIC configurations are currently
supported, set an explicit 1:1 mapping by configuring the Switch System
Port Record (SSPR) register.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Ido Schimmel
26a80f6e54 mlxsw: Call free_netdev when removing port
When removing a port's netdevice we should also free the memory
allocated by alloc_etherdev(). Do this by calling free_netdev() at the
end of the teardown sequence.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Masanari Iida
ecea49914b net: ethernet: Fix double word "the the" in eth.c
This patch fix double word "the the" in
Documentation/DocBook/networking/API-eth-get-headlen.html
Documentation/DocBook/networking/netdev.html
Documentation/DocBook/networking.xml

These files are generated from comment in source,
so I have to fix comment in net/ethernet/eth.c.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:53:00 -07:00
Shaohui Xie
0024f89200 net: phy: add RealTek RTL8211DN phy id
RTL8211DN is compatible with RTL8211E.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:52:15 -07:00
Robert Shearman
118d523463 mpls: Enforce payload type of traffic sent using explicit NULL
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as a IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:51:42 -07:00
David S. Miller
d74a790d52 Merge branch 'bpf-perf'
Kaixu Xia says:

====================
bpf: Introduce the new ability of eBPF programs to access hardware PMU counter

This patchset is base on the net-next:
 git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
commit 9dc20a649609c95ce7c5ac4282656ba627b67d49.

Previous patch v6 url:
https://lkml.org/lkml/2015/8/4/188

changes in V7:
 - rebase the whole patch set to net-next tree(9dc20a64);
 - split out the core perf APIs into Patch 1/5;
 - change the return value of function perf_event_attrs()
   from struct perf_event * to const struct perf_event * in
   Patch 1/5;
 - rename the function perf_event_read_internal() to perf_event_
   read_local() and rewrite it in Patch 1/5;
 - rename the function check_func_limit() to check_map_func
   _compatibility() and remove the unnecessary pass pointer to
   a pointer in Patch 4/5;

changes in V6:
 - make the Patch 1/4 commit message more meaning and readable;
 - remove the unnecessary comment in Patch 2/4 and make it clean;
 - declare the function perf_event_release_kernel() in include/
   linux/perf_event.h to fix the build error when CONFIG_PERF_EVENTS
   isn't configured in Patch 2/4;
 - add function perf_event_attrs() to get the struct perf_event_attr
   in Patch 2/4.
 - move the related code from kernel/trace/bpf_trace.c to kernel/
   events/core.c and add function perf_event_read_internal() to
   avoid poking inside of the event outside of perf code in Patch 3/4;
 - generial the func & map match-pair with an array in Patch 3/4;

changes in V5:
 - move struct fd_array_map_ops* fd_ops to bpf_map;
 - move array perf event decrement refcnt function to
   map_free;
 - fix the NULL ptr of perf_event_get();
 - move bpf_perf_event_read() to kernel/bpf/bpf_trace.c;
 - get rid of the remaining struct bpf_prog;
 - move the unnecessay cast on void *;

changes in V4:
 - make the bpf_prog_array_map more generic;
 - fix the bug of event refcnt leak;
 - use more useful errno in bpf_perf_event_read();

changes in V3:
 - collapse V2 patches 1-3 into one;
 - drop the function map->ops->map_traverse_elem() and release
   the struct perf_event in map_free;
 - only allow to access bpf_perf_event_read() from programs;
 - update the perf_event_array_map elem via xchg();
 - pass index directly to bpf_perf_event_read() instead of
   MAP_KEY;

changes in V2:
 - put atomic_long_inc_not_zero() between fdget() and fdput();
 - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
 - Only read the event counter on current CPU or on current
   process;
 - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
   pointer to the struct perf_event;
 - according to the perf_event_map_fd and key, the function
   bpf_perf_event_read() can get the Hardware PMU counter value;

Patch 5/5 is a simple example and shows how to use this new eBPF
programs ability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU
value when 'kprobe/sys_write' sampling)

  $ cat /sys/kernel/debug/tracing/trace_pipe
  $ ./tracex6
       ...
       syslog-ng-548   [000] d..1    76.905673: : CPU-0   681765271
       syslog-ng-548   [000] d..1    76.905690: : CPU-0   681787855
       syslog-ng-548   [000] d..1    76.905707: : CPU-0   681810504
       syslog-ng-548   [000] d..1    76.905725: : CPU-0   681834771
       syslog-ng-548   [000] d..1    76.905745: : CPU-0   681859519
       syslog-ng-548   [000] d..1    76.905766: : CPU-0   681890419
       syslog-ng-548   [000] d..1    76.905783: : CPU-0   681914045
       syslog-ng-548   [000] d..1    76.905800: : CPU-0   681935950
       syslog-ng-548   [000] d..1    76.905816: : CPU-0   681958299
              ls-690   [005] d..1    82.241308: : CPU-5   3138451
              sh-691   [004] d..1    82.244570: : CPU-4   7324988
           <...>-699   [007] d..1    99.961387: : CPU-7   3194027
           <...>-695   [003] d..1    99.961474: : CPU-3   288901
           <...>-695   [003] d..1    99.961541: : CPU-3   383145
           <...>-695   [003] d..1    99.961591: : CPU-3   450365
           <...>-695   [003] d..1    99.961639: : CPU-3   515751
           <...>-695   [003] d..1    99.961686: : CPU-3   579047
       ...

The detail of patches is as follow:

Patch 1/5 add the necessary core perf APIs perf_event_attrs(),
perf_event_get(),perf_event_read_local() when accessing events
counters in eBPF programs

Patch 2/5 rewrites part of the bpf_prog_array map code and make it
more generic;

Patch 3/5 introduces a new bpf map type. This map only stores the
pointer to struct perf_event;

Patch 4/5 implements function bpf_perf_event_read() that get the
selected hardware PMU conuter;

Patch 5/5 gives a simple example.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00
Kaixu Xia
47efb30274 samples/bpf: example of get selected PMU counter value
This is a simple example and shows how to use the new ability
to get the selected Hardware PMU counter value.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00
Kaixu Xia
35578d7984 bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter
According to the perf_event_map_fd and index, the function
bpf_perf_event_read() can convert the corresponding map
value to the pointer to struct perf_event and return the
Hardware PMU counter value.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00