636789 Commits

Author SHA1 Message Date
Gavin Shan
4f77c55c3a drivers/pci/hotplug: Fix initial state for empty slot
commit d0c424971f70501ec0a0364117b9934db039c9cc upstream.

In PowerNV PCI hotplug driver, the initial PCI slot's state is set
to PNV_PHP_STATE_POPULATED if no PCI devices are connected to the
slot. The PCI devices that are hot added to the slot won't be probed
and populated because of the check in pnv_php_enable():

        /* Check if the slot has been configured */
        if (php_slot->state != PNV_PHP_STATE_REGISTERED)
                return 0;

This fixes the issue by leaving the slot in PNV_PHP_STATE_REGISTERED
state initially if nothing is connected to the slot.

Fixes: 360aebd85a4 ("drivers/pci/hotplug: Support surprise hotplug in powernv driver")
Reported-by: Hank Chang <hankmax0000@gmail.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Willie Liauw <williel@supermicro.com.tw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:46 +08:00
Gavin Shan
1afe7b4ac3 drivers/pci/hotplug: Handle presence detection change properly
commit d7d55536c6cd1f80295b6d7483ad0587b148bde4 upstream.

The surprise hotplug is driven by interrupt in PowerNV PCI hotplug
driver. In the interrupt handler, pnv_php_interrupt(), we bail when
pnv_pci_get_presence_state() returns zero wrongly. It causes the
presence change event is always ignored incorrectly.

This fixes the issue by bailing on error (non-zero value) returned
from pnv_pci_get_presence_state().

Fixes: 360aebd85a4 ("drivers/pci/hotplug: Support surprise hotplug in powernv driver")
Reported-by: Hank Chang <hankmax0000@gmail.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Willie Liauw <williel@supermicro.com.tw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:46 +08:00
Nicholas Bellinger
17ea11d553 target: Fix NULL dereference during LUN lookup + active I/O shutdown
commit bd4e2d2907fa23a11d46217064ecf80470ddae10 upstream.

When transport_clear_lun_ref() is shutting down a se_lun via
configfs with new I/O in-flight, it's possible to trigger a
NULL pointer dereference in transport_lookup_cmd_lun() due
to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
checking before incrementing lun->lun_ref.count after
lun->lun_ref has switched to atomic_t mode.

This results in a NULL pointer dereference as LUN shutdown
code in core_tpg_remove_lun() continues running after the
existing ->release() -> core_tpg_lun_ref_release() callback
completes, and clears the RCU protected se_lun->lun_se_dev
pointer.

During the OOPs, the state of lun->lun_ref in the process
which triggered the NULL pointer dereference looks like
the following on v4.1.y stable code:

struct se_lun {
  lun_link_magic = 4294932337,
  lun_status = TRANSPORT_LUN_STATUS_FREE,

  .....

  lun_se_dev = 0x0,
  lun_sep = 0x0,

  .....

  lun_ref = {
    count = {
      counter = 1
    },
    percpu_count_ptr = 3,
    release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
    confirm_switch = 0x0,
    force_atomic = false,
    rcu = {
      next = 0xffff88154fa1a5d0,
      func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
    }
  }
}

To address this bug, use percpu_ref_tryget_live() to ensure
once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref
has switched to atomic_t, all new I/Os will fail to obtain
a new lun->lun_ref reference.

Also use an explicit percpu_ref_kill_and_confirm() callback
to block on ->lun_ref_comp to allow the first stage and
associated RCU grace period to complete, and then block on
->lun_ref_shutdown waiting for the final percpu_ref_put()
to drop the last reference via transport_lun_remove_cmd()
before continuing with core_tpg_remove_lun() shutdown.

Reported-by: Rob Millner <rlm@daterainc.com>
Tested-by: Rob Millner <rlm@daterainc.com>
Cc: Rob Millner <rlm@daterainc.com>
Tested-by: Vaibhav Tandon <vst@datera.io>
Cc: Vaibhav Tandon <vst@datera.io>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:46 +08:00
Gavin Shan
54eff720c9 pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts
commit 303529d6ef1293513c2c73c9ab86489eebb37d08 upstream.

The root port or PCIe switch downstream port might have been associated
with driver other than pnv-php. The MSI or MSIx might also have been
enabled by that driver (e.g. pcieport_drv). Attempt to enable MSI incurs
below backtrace:

 PowerPC PowerNV PCI Hotplug Driver version: 0.1
 ------------[ cut here ]------------
 WARNING: CPU: 19 PID: 1004 at drivers/pci/msi.c:1071 \
                              __pci_enable_msi_range+0x84/0x4e0
 NIP [c000000000665c34] __pci_enable_msi_range+0x84/0x4e0
 LR [c000000000665c24] __pci_enable_msi_range+0x74/0x4e0
 Call Trace:
 [c000000384d67600] [c000000000665c24] __pci_enable_msi_range+0x74/0x4e0
 [c000000384d676e0] [d00000000aa31b04] pnv_php_register+0x564/0x5a0 [pnv_php]
 [c000000384d677c0] [d00000000aa31658] pnv_php_register+0xb8/0x5a0 [pnv_php]
 [c000000384d678a0] [d00000000aa31658] pnv_php_register+0xb8/0x5a0 [pnv_php]
 [c000000384d67980] [d00000000aa31dfc] pnv_php_init+0x60/0x98 [pnv_php]
 [c000000384d679f0] [c00000000000cfdc] do_one_initcall+0x6c/0x1d0
 [c000000384d67ab0] [c000000000b92354] do_init_module+0x94/0x254
 [c000000384d67b40] [c00000000019719c] load_module+0x258c/0x2c60
 [c000000384d67d30] [c000000000197bb0] SyS_finit_module+0xf0/0x170
 [c000000384d67e30] [c00000000000b184] system_call+0x38/0xe0

This fixes the issue by skipping enabling the surprise hotplug
capability if the MSI or MSIx on the PCI slot's upstream port has
been enabled by other driver.

Fixes: 360aebd85a4c ("drivers/pci/hotplug: Support surprise hotplug in powernv driver")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Gavin Shan
4ee3508f7a pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()
commit 36c7c9da40c408a71e5e6bfe12e57dcf549a296d upstream.

The WARN_ON() causes unnecessary backtrace when putting the parent
slot, which is likely to be NULL.

 WARNING: CPU: 2 PID: 1071 at drivers/pci/hotplug/pnv_php.c:85 \
                              pnv_php_release+0xcc/0x150 [pnv_php]
    :
 Call Trace:
 [c0000003bc007c10] [d00000000ad613c4] pnv_php_release+0x144/0x150 [pnv_php]
 [c0000003bc007c40] [c0000000006641d8] pci_hp_deregister+0x238/0x330
 [c0000003bc007cd0] [d00000000ad61440] pnv_php_unregister_one+0x70/0xa0 [pnv_php]
 [c0000003bc007d10] [d00000000ad614c0] pnv_php_unregister+0x50/0x80 [pnv_php]
 [c0000003bc007d40] [d00000000ad61e84] pnv_php_exit+0x50/0xcb4 [pnv_php]
 [c0000003bc007d70] [c00000000019499c] SyS_delete_module+0x1fc/0x2a0
 [c0000003bc007e30] [c00000000000b184] system_call+0x38/0xe0

Fixes: 66725152fb9f ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Jeff Layton
5da90d0018 ceph: remove req from unsafe list when unregistering it
commit df963ea8a082d31521a120e8e31a29ad8a1dc215 upstream.

There's no reason a request should ever be on a s_unsafe list but not
in the request tree.

Link: http://tracker.ceph.com/issues/18474
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Steven Rostedt (VMware)
ff61e0123b ktest: Fix child exit code processing
commit 32677207dcc5e594254b7fb4fb2352b1755b1d5b upstream.

The child_exit errno needs to be shifted by 8 bits to compare against the
return values for the bisect variables.

Fixes: c5dacb88f0a64 ("ktest: Allow overriding bisect test results")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Boris Brezillon
1f2ca141ec memory/atmel-ebi: Fix ns <-> cycles conversions
commit ee194289502a6901cc77dc9a893bf2afd351ac5e upstream.

at91sam9_ebi_get_config() is incorrectly converting timings in clock
cycles into timings in nanoseconds by multiplying the cycle values by
the clk rate instead of the clk period.

at91sam9_ebi_xslate_config() has the same problem for the
tdf_ns -> tdf_cycles conversion.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Chris Leahy <leahycm@gmail.com>
Fixes: 6a4ec4cd0888 ("memory: add Atmel EBI (External Bus Interface) driver")
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:45 +08:00
Peter Zijlstra
b2b0f6ffd3 orangefs: Use RCU for destroy_inode
commit 0695d7dc1d9f19b82ec2cae24856bddce278cfe6 upstream.

freeing of inodes must be RCU-delayed on all filesystems

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:44 +08:00
Eric W. Biederman
d3381fab77 fs: Better permission checking for submounts
commit 93faccbbfa958a9668d3ab4e30f38dd205cee8d8 upstream.

To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Fixes: 069d5ac9ae0d ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:44 +08:00
Bart Van Assche
48e2181b0b IB/srp: Fix race conditions related to task management
commit 0a6fdbdeb1c25e31763c1fb333fa2723a7d2aba6 upstream.

Avoid that srp_process_rsp() overwrites the status information
in ch if the SRP target response timed out and processing of
another task management function has already started. Avoid that
issuing multiple task management functions concurrently triggers
list corruption. This patch prevents that the following stack
trace appears in the system log:

WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x4a/0x50
 __list_del_entry_valid+0xbc/0xc0
 wait_for_completion_timeout+0x12e/0x170
 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
 srp_reset_device+0x5b/0x110 [ib_srp]
 scsi_ioctl_reset+0x1c7/0x290
 scsi_ioctl+0x12a/0x420
 sd_ioctl+0x9d/0x100
 blkdev_ioctl+0x51e/0x9f0
 block_ioctl+0x38/0x40
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:44 +08:00
Bart Van Assche
d5d1d2cc4b IB/srp: Avoid that duplicate responses trigger a kernel bug
commit 6cb72bc1b40bb2c1750ee7a5ebade93bed49a5fb upstream.

After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:44 +08:00
Bart Van Assche
516a12ab11 IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
commit d6c58dc40fec35ff6cdb350b53bce0fcf9143709 upstream.

Tests have shown that the following error message is reported when
using SG-GAPS registration with an mlx5 adapter:

scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 2500002a ad9fafd1
scsi host1: ib_srp: reconnect succeeded
mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000032 00105dd0
scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138

Hence avoid using SG-GAPS memory registrations. Additionally,
always configure the blk_queue_virt_boundary() to avoid to trigger
a mapping failure when using adapters that support SG-GAPS (e.g.
mlx5).

Fixes: commit ad8e66b4a801 ("IB/srp: fix mr allocation when the device supports sg gaps")
Fixes: commit 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mark Bloch <markb@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:44 +08:00
Leon Romanovsky
04f16db056 IB/mlx5: Fix out-of-bound access
commit 0fd27a88c2e4f548937fd7d93fc6e65c4ad7c278 upstream.

When we initialize buffer to create SRQ in kernel,
the number of pages was less than actually used in
following mlx5_fill_page_array().

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Erez Shitrit
2e539fa49e IB/IPoIB: Add destination address when re-queue packet
commit 2b0841766a898aba84630fb723989a77a9d3b4e6 upstream.

When sending packet to destination that was not resolved yet
via path query, the driver keeps the skb and tries to re-send it
again when the path is resolved.

But when re-sending via dev_queue_xmit the kernel doesn't call
to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb
and to put the destination address inside them.

In that way the dev_start_xmit will have the correct destination,
and the driver won't take the destination from the skb->data, while
nothing exists there, which causes to packet be be dropped.

The test flow is:
1. Run the SM on remote node,
2. Restart the driver.
4. Ping some destination,
3. Observe that first ICMP request will be dropped.

Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Feras Daoud
1626076b8e IB/ipoib: Fix deadlock between rmmod and set_mode
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream.

When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Eric W. Biederman
808e83e5ad mnt: Tuck mounts under others instead of creating shadow/side mounts.
commit 1064f874abc0d05eeed8993815f584d847b72486 upstream.

Ever since mount propagation was introduced in cases where a mount in
propagated to parent mount mountpoint pair that is already in use the
code has placed the new mount behind the old mount in the mount hash
table.

This implementation detail is problematic as it allows creating
arbitrary length mount hash chains.

Furthermore it invalidates the constraint maintained elsewhere in the
mount code that a parent mount and a mountpoint pair will have exactly
one mount upon them.  Making it hard to deal with and to talk about
this special case in the mount code.

Modify mount propagation to notice when there is already a mount at
the parent mount and mountpoint where a new mount is propagating to
and place that preexisting mount on top of the new mount.

Modify unmount propagation to notice when a mount that is being
unmounted has another mount on top of it (and no other children), and
to replace the unmounted mount with the mount on top of it.

Move the MNT_UMUONT test from __lookup_mnt_last into
__propagate_umount as that is the only call of __lookup_mnt_last where
MNT_UMOUNT may be set on any mount visible in the mount hash table.

These modifications allow:
 - __lookup_mnt_last to be removed.
 - attach_shadows to be renamed __attach_mnt and its shadow
   handling to be removed.
 - commit_tree to be simplified
 - copy_tree to be simplified

The result is an easier to understand tree of mounts that does not
allow creation of arbitrary length hash chains in the mount hash table.

The result is also a very slight userspace visible difference in semantics.
The following two cases now behave identically, where before order
mattered:

case 1: (explicit user action)
	B is a slave of A
	mount something on A/a , it will propagate to B/a
	and than mount something on B/a

case 2: (tucked mount)
	B is a slave of A
	mount something on B/a
	and than mount something on A/a

Histroically umount A/a would fail in case 1 and succeed in case 2.
Now umount A/a succeeds in both configurations.

This very small change in semantics appears if anything to be a bug
fix to me and my survey of userspace leads me to believe that no programs
will notice or care of this subtle semantic change.

v2: Updated to mnt_change_mountpoint to not call dput or mntput
and instead to decrement the counts directly.  It is guaranteed
that there will be other references when mnt_change_mountpoint is
called so this is safe.

v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
    As the locking in fs/namespace.c changed between v2 and v3.

v4: Reworked the logic in propagate_mount_busy and __propagate_umount
    that detects when a mount completely covers another mount.

v5: Removed unnecessary tests whose result is alwasy true in
    find_topper and attach_recursive_mnt.

v6: Document the user space visible semantic difference.

Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind")
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Gavin Li
c9b3f3173f brcmfmac: fix incorrect event channel deduction
commit 8e290cecdd0178f3d4cf7d463c51dc7e462843b4 upstream.

brcmf_sdio_fromevntchan() was being called on the the data frame
rather than the software header, causing some frames to be
mischaracterized as on the event channel rather than the data channel.

This fixes a major performance regression (due to dropped packets). With
this patch the download speed jumped from 1Mbit/s back up to 40MBit/s due
to the sheer amount of packets being incorrectly processed.

Fixes: c56caa9db8ab ("brcmfmac: screening firmware event packet")
Signed-off-by: Gavin Li <git@thegavinli.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
[kvalo@codeaurora.org: improve commit logs based on email discussion]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Andrew Donnellan
53d43706f2 cxl: fix nested locking hang during EEH hotplug
commit 171ed0fcd8966d82c45376f1434678e7b9d4d9b1 upstream.

Commit 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU
not configured") introduced a rwsem to fix an invalid memory access that
occurred when someone attempts to access the config space of an AFU on a
vPHB whilst the AFU is deconfigured, such as during EEH recovery.

It turns out that it's possible to run into a nested locking issue when EEH
recovery fails and a full device hotplug is required.
cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
configured_rwsem. When EEH recovery fails, the EEH code calls
pci_hp_remove_devices() to remove the device, which in turn calls
cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
to grab the writer lock that's already held.

Standard rwsem semantics don't express what we really want to do here and
don't allow for nested locking. Fix this by replacing the rwsem with an
atomic_t which we can control more finely. Allow the AFU to be locked
multiple times so long as there are no readers.

Fixes: 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU not configured")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Andrew Donnellan
411d0b0ced cxl: Prevent read/write to AFU config space while AFU not configured
commit 14a3ae34bfd0bcb1cc12d55b06a8584c11fac6fc upstream.

During EEH recovery, we deconfigure all AFUs whilst leaving the
corresponding vPHB and virtual PCI device in place.

If something attempts to interact with the AFU's PCI config space (e.g.
running lspci) after the AFU has been deconfigured and before it's
reconfigured, cxl_pcie_{read,write}_config() will read invalid values from
the deconfigured struct cxl_afu and proceed to Oops when they try to
dereference pointers that have been set to NULL during deconfiguration.

Add a rwsem to struct cxl_afu so we can prevent interaction with config
space while the AFU is deconfigured.

Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Thomas Petazzoni
60037aa689 net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
commit 239a3b663647869330955ec59caac0100ef9b60a upstream.

When TX descriptors are filled in, the buffer DMA address is split
between the tx_desc->buf_phys_addr field (high-order bits) and
tx_desc->packet_offset field (5 low-order bits).

However, when we re-calculate the DMA address from the TX descriptor in
mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into
account. This means that when the DMA address is not aligned on a 32
bytes boundary, we end up calling dma_unmap_single() with a DMA address
that was not the one returned by dma_map_single().

This inconsistency is detected by the kernel when DMA_API_DEBUG is
enabled. We fix this problem by properly calculating the DMA address in
mvpp2_txq_inc_put().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Heiko Carstens
e067f68db2 s390: use correct input data address for setup_randomness
commit 4920e3cf77347d7d7373552d4839e8d832321313 upstream.

The current implementation of setup_randomness uses the stack address
and therefore the pointer to the SYSIB 3.2.2 block as input data
address. Furthermore the length of the input data is the number of
virtual-machine description blocks which is typically one.

This means that typically a single zero byte is fed to
add_device_randomness.

Fix both of these and use the address of the first virtual machine
description block as input data address and also use the correct
length.

Fixes: bcfcbb6bae64 ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Heiko Carstens
321081d522 s390: make setup_randomness work
commit da8fd820f389a0e29080b14c61bf5cf1d8ef5ca1 upstream.

Commit bcfcbb6bae64 ("s390: add system information as device
randomness") intended to add some virtual machine specific information
to the randomness pool.

Unfortunately it uses the page allocator before it is ready to use. In
result the page allocator always returns NULL and the setup_randomness
function never adds anything to the randomness pool.

To fix this use memblock_alloc and memblock_free instead.

Fixes: bcfcbb6bae64 ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Martin Schwidefsky
9d38fd6a4f s390: TASK_SIZE for kernel threads
commit fb94a687d96c570d46332a4a890f1dcb7310e643 upstream.

Return a sensible value if TASK_SIZE if called from a kernel thread.

This gets us around an issue with copy_mount_options that does a magic
size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
thread and data pointing to kernel space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Peter Oberparleiter
dc31841fcd s390/chsc: Add exception handler for CHSC instruction
commit 77759137248f34864a8f7a58bbcebfcf1047504a upstream.

Prevent kernel crashes due to unhandled exceptions raised by the CHSC
instruction which may for example be triggered by invalid ioctl data.

Fixes: 64150adf89df ("s390/cio: Introduce generic synchronous CHSC IOCTL")
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Michael Holzheu
91cfcaa6ed s390/kdump: Use "LINUX" ELF note name instead of "CORE"
commit a4a81d8eebdc1d209d034f62a082a5131e4242b5 upstream.

In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF
notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified
as note name. Otherwise the notes are ignored.

For /proc/vmcore we currently use "CORE" for these notes.

Up to now this has not been a real problem because the dump analysis tool
"crash" does not check the note name. But it will break all programs that
use libbfd for processing ELF notes.

So fix this and use "LINUX" for all s390 specific notes to comply with
libbfd.

Reported-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Gerald Schaefer
b848102542 s390/dcssblk: fix device size calculation in dcssblk_direct_access()
commit a63f53e34db8b49675448d03ae324f6c5bc04fe6 upstream.

Since commit dd22f551 "block: Change direct_access calling convention",
the device size calculation in dcssblk_direct_access() is off-by-one.
This results in bdev_direct_access() always returning -ENXIO because the
returned value is not page aligned.

Fix this by adding 1 to the dev_sz calculation.

Fixes: dd22f551 ("block: Change direct_access calling convention")
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Julian Wiedmann
5cec5e32ba s390/qdio: clear DSCI prior to scanning multiple input queues
commit 1e4a382fdc0ba8d1a85b758c0811de3a3631085e upstream.

For devices with multiple input queues, tiqdio_call_inq_handlers()
iterates over all input queues and clears the device's DSCI
during each iteration. If the DSCI is re-armed during one
of the later iterations, we therefore do not scan the previous
queues again.
The re-arming also raises a new adapter interrupt. But its
handler does not trigger a rescan for the device, as the DSCI
has already been erroneously cleared.
This can result in queue stalls on devices with multiple
input queues.

Fix it by clearing the DSCI just once, prior to scanning the queues.

As the code is moved in front of the loop, we also need to access
the DSCI directly (ie irq->dsci) instead of going via each queue's
parent pointer to the same irq. This is not a functional change,
and a follow-up patch will clean up the other users.

In practice, this bug only affects CQ-enabled HiperSockets devices,
ie. devices with sysfs-attribute "hsuid" set. Setting a hsuid is
needed for AF_IUCV socket applications that use HiperSockets
communication.

Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Dmitry Tunin
519b6cead2 Bluetooth: Add another AR3012 04ca:3018 device
commit 441ad62d6c3f131f1dbd7dcdd9cbe3f74dbd8501 upstream.

T:  Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#=  5 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=3018 Rev=00.01
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Chao Peng
7c3bab189c KVM: VMX: use correct vmcs_read/write for guest segment selector/base
commit 96794e4ed4d758272c486e1529e431efb7045265 upstream.

Guest segment selector is 16 bit field and guest segment base is natural
width field. Fix two incorrect invocations accordingly.

Without this patch, build fails when aggressive inlining is used with ICC.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Janosch Frank
035dcc8e87 KVM: s390: Disable dirty log retrieval for UCONTROL guests
commit e1e8a9624f7ba8ead4f056ff558ed070e86fa747 upstream.

User controlled KVM guests do not support the dirty log, as they have
no single gmap that we can check for changes.

As they have no single gmap, kvm->arch.gmap is NULL and all further
referencing to it for dirty checking will result in a NULL
dereference.

Let's return -EINVAL if a caller tries to sync dirty logs for a
UCONTROL guest.

Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Ian Abbott
c4c590be49 serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
commit 1c9c858e2ff8ae8024a3d75d2ed080063af43754 upstream.

The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta
Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports,
respectively.  The first 4 ports are implemented by an OX16PCI954 chip,
and the second 4 ports are implemented by an OX16C954 chip on a local
bus, bridged by the second PCI function of the OX16PCI954.  The ports
are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a
non-standard oscillator frequency of 20 MHz (base_baud = 1250000).

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Alexander Popov
e5b9778761 tty: n_hdlc: get rid of racy n_hdlc.tbuf
commit 82f2341c94d270421f383641b7cd670e474db56b upstream.

Currently N_HDLC line discipline uses a self-made singly linked list for
data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
an error.

The commit be10eb7589337e5defbe214dae038a53dd21add8
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put
one data buffer to tx_free_buf_list twice. That causes double free in
n_hdlc_release().

Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
in case of tx error put current data buffer after the head of tx_buf_list.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Greg Kroah-Hartman
d379ab2707 Linux 4.9.14 2017-03-12 06:42:15 +01:00
Florian Westphal
371d0342a3 netfilter: conntrack: refine gc worker heuristics, redux
commit e5072053b09642b8ff417d47da05b84720aea3ee upstream.

This further refines the changes made to conntrack gc_worker in
commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").

The main idea of that change was to reduce the scan interval when evictions
take place.

However, on the reporters' setup, there are 1-2 million conntrack entries
in total and roughly 8k new (and closing) connections per second.

In this case we'll always evict at least one entry per gc cycle and scan
interval is always at 1 jiffy because of this test:

 } else if (expired_count) {
     gc_work->next_gc_run /= 2U;
     next_run = msecs_to_jiffies(1);

being true almost all the time.

Given we scan ~10k entries per run its clearly wrong to reduce interval
based on nonzero eviction count, it will only waste cpu cycles since a vast
majorities of conntracks are not timed out.

Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.

Because evictor is supposed to only kick in when system turns idle after
a busy period, pick a high ratio -- this makes it 50%.  We thus keep
the idea of increasing scan rate when its likely that table contains many
expired entries.

In order to not let timed-out entries hang around for too long
(important when using event logging, in which case we want to timely
destroy events), we now scan the full table within at most
GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
timed-out entries sit in same slot.

I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).

While flood is ongoing, interval now stays at its max rate
(GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).

With feedback from Nicolas Dichtel.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Florian Westphal
5f7ff59d06 netfilter: conntrack: remove GC_MAX_EVICTS break
commit 524b698db06b9b6da7192e749f637904e2f62d7b upstream.

Instead of breaking loop and instant resched, don't bother checking
this in first place (the loop calls cond_resched for every bucket anyway).

Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Yan, Zheng
dc8470f3c8 ceph: update readpages osd request according to size of pages
commit d641df819db8b80198fd85d9de91137e8a823b07 upstream.

add_to_page_cache_lru() can fails, so the actual pages to read
can be smaller than the initial size of osd request. We need to
update osd request size in that case.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
James Smart
27ab5414b9 scsi: lpfc: Correct WQ creation for pagesize
commit 8ea73db486cda442f0671f4bc9c03a76be398a28 upstream.

Correct WQ creation for pagesize

The driver was calculating the adapter command pagesize indicator from
the system pagesize. However, the buffers the driver allocates are only
one size (SLI4_PAGE_SIZE), so no calculation was necessary.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ralf Baechle
aae02d1aaf MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
commit ae2f5e5ed04a17c1aa1f0a3714c725e12c21d2a9 upstream.

Fix the following build error with binutils 2.25.

  CC      arch/mips/mm/sc-ip22.o
{standard input}: Assembler messages:
{standard input}:132: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:159: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:200: Error: number (0x9000000080000000) larger than 32 bits
scripts/Makefile.build:293: recipe for target 'arch/mips/mm/sc-ip22.o' failed
make[1]: *** [arch/mips/mm/sc-ip22.o] Error 1

MIPS has used .set mips3 to temporarily switch the assembler to 64 bit
mode in 64 bit kernels virtually forever.  Binutils 2.25 broke this
behavious partially by happily accepting 64 bit instructions in .set mips3
mode but puking on 64 bit constants when generating 32 bit ELF.  Binutils
2.26 restored the old behaviour again.

Fix build with binutils 2.25 by open coding the offending

	dli $1, 0x9000000080000000

as

	li	$1, 0x9000
	dsll	$1, $1, 48

which is ugly be the only thing that will build on all binutils vintages.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ralf Baechle
8a2307c7c0 MIPS: IP22: Reformat inline assembler code to modern standards.
commit f9f1c8db1c37253805eaa32265e1e1af3ae7d0a4 upstream.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Aneesh Kumar K.V
075be78c83 powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU
commit fda2d27db6eae5c2468f9e4657539b72bbc238bb upstream.

We will set LPCR with correct value for radix during int. This make sure we
start with a sanitized value of LPCR. In case of kexec, cpus can have LPCR
value based on the previous translation mode we were running.

Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Aneesh Kumar K.V
3552f91715 powerpc/mm: Add MMU_FTR_KERNEL_RO to possible feature mask
commit a5ecdad4847897007399d7a14c9109b65ce4c9b7 upstream.

Without this we will always find the feature disabled.

Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ravi Bangoria
fccb22e7d5 powerpc/xmon: Fix data-breakpoint
commit c21a493a2b44650707d06741601894329486f2ad upstream.

Currently xmon data-breakpoint feature is broken.

Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.

Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Chuck Lever
86840a6305 xprtrdma: Reduce required number of send SGEs
commit 16f906d66cd76fb9895cbc628f447532a7ac1faa upstream.

The MAX_SEND_SGES check introduced in commit 655fec6987be
("xprtrdma: Use gathered Send for large inline messages") fails
for devices that have a small max_sge.

Instead of checking for a large fixed maximum number of SGEs,
check for a minimum small number. RPC-over-RDMA will switch to
using a Read chunk if an xdr_buf has more pages than can fit in
the device's max_sge limit. This is considerably better than
failing all together to mount the server.

This fix supports devices that have as few as three send SGEs
available.

Reported-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reported-by: Honggang Li <honli@redhat.com>
Reported-by: Ram Amrani <Ram.Amrani@cavium.com>
Fixes: 655fec6987be ("xprtrdma: Use gathered Send for large ...")
Tested-by: Honggang Li <honli@redhat.com>
Tested-by: Ram Amrani <Ram.Amrani@cavium.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
73eea1c400 xprtrdma: Disable pad optimization by default
commit c95a3c6b88658bcb8f77f85f31a0b9d9036e8016 upstream.

Commit d5440e27d3e5 ("xprtrdma: Enable pad optimization") made the
Linux client omit XDR round-up padding in normal Read and Write
chunks so that the client doesn't have to register and invalidate
3-byte memory regions that contain no real data.

Unfortunately, my cheery 2014 assessment that this optimization "is
supported now by both Linux and Solaris servers" was premature.
We've found bugs in Solaris in this area since commit d5440e27d3e5
("xprtrdma: Enable pad optimization") was merged (SYMLINK is the
main offender).

So for maximum interoperability, I'm disabling this optimization
again. If a CM private message is exchanged when connecting, the
client recognizes that the server is Linux, and enables the
optimization for that connection.

Until now the Solaris server bugs did not impact common operations,
and were thus largely benign. Soon, less capable devices on Linux
NFS/RDMA clients will make use of Read chunks more often, and these
Solaris bugs will prevent interoperation in more cases.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
fab6c2caa4 xprtrdma: Per-connection pad optimization
commit b5f0afbea4f2ea52c613ac2b06cb6de2ea18cb6d upstream.

Pad optimization is changed by echoing into
/proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
affecting all RPC-over-RDMA connections to all servers.

The marshaling code picks up that value and uses it for decisions
about how to construct each RPC-over-RDMA frame. Having it change
suddenly in mid-operation can result in unexpected failures. And
some servers a client mounts might need chunk round-up, while
others don't.

So instead, copy the pad_optimize setting into each connection's
rpcrdma_ia when the transport is created, and use the copy, which
can't change during the life of the connection, instead.

This also removes a hack: rpcrdma_convert_iovs was using
the remote-invalidation-expected flag to predict when it could leave
out Write chunk padding. This is because the Linux server handles
implicit XDR padding on Write chunks correctly, and only Linux
servers can set the connection's remote-invalidation-expected flag.

It's more sensible to use the pad optimization setting instead.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
ec3bc2c5ed xprtrdma: Fix Read chunk padding
commit 24abdf1be15c478e2821d6fc903a4a4440beff02 upstream.

When pad optimization is disabled, rpcrdma_convert_iovs still
does not add explicit XDR round-up padding to a Read chunk.

Commit 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
incorrectly short-circuited the test for whether round-up padding
is needed that appears later in rpcrdma_convert_iovs.

However, if this is indeed a regular Read chunk (and not a
Position-Zero Read chunk), the tail iovec _always_ contains the
chunk's padding, and never anything else.

So, it's easy to just skip the tail when padding optimization is
enabled, and add the tail in a subsequent Read chunk segment, if
disabled.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Magnus Lilja
788d81d4e5 dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
commit adee40b265d7568296e218f079f478197ffa15bf upstream.

Commit 3d8cc00073d6 ("dmaengine: ipu: Consolidate duplicated irq handlers")
consolidated the two interrupts routines into one, but the remaining
interrupt routine only checks the status of the error interrupts, not the
normal interrupts.

This patch fixes that problem (tested on i.MX31 PDK board).

Fixes: 3d8cc00073d6 ("dmaengine: ipu: Consolidate duplicated irq handlers")
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Magnus Lilja <lilja.magnus@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Mark Marshall
9d82393e65 mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0
commit 656441478ed55d960df5f3ccdf5a0f8c61dfd0b3 upstream.

The commit 7a654172161c ("mtd/ifc: Add support for IFC controller
version 2.0") added support for version 2.0 of the IFC controller.
The version 2.0 controller has the ECC status registers at a different
location to the previous versions.

Correct the fsl_ifc_nand structure so that the ECC status can be read
from the correct location for both version 1.0 and 2.0 of the controller.

Fixes: 7a654172161c ("mtd/ifc: Add support for IFC controller version 2.0")
Signed-off-by: Mark Marshall <mark.marshall@omicronenergy.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Rafał Miłecki
178d07a0b8 bcma: use (get|put)_device when probing/removing device driver
commit a971df0b9d04674e325346c17de9a895425ca5e1 upstream.

This allows tracking device state and e.g. makes devm work as expected.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00