linux

iv/linux

Author	SHA1	Message	Date
Don Zickus	b94f51183b	kernel/watchdog: prevent false hardlockup on overloaded system On an overloaded system, it is possible that a change in the watchdog threshold can be delayed long enough to trigger a false positive. This can easily be achieved by having a cpu spinning indefinitely on a task, while another cpu updates watchdog threshold. What happens is while trying to park the watchdog threads, the hrtimers on the other cpus trigger and reprogram themselves with the new slower watchdog threshold. Meanwhile, the nmi watchdog is still programmed with the old faster threshold. Because the one cpu is blocked, it prevents the thread parking on the other cpus from completing, which is needed to shutdown the nmi watchdog and reprogram it correctly. As a result, a false positive from the nmi watchdog is reported. Fix this by setting a park_in_progress flag to block all lockups until the parking is complete. Fix provided by Ulrich Obergfell. [akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/] Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-01-24 16:26:14 -08:00
Ross Zwisler	6affb9d7b1	dax: fix build warnings with FS_DAX and !FS_IOMAP As reported by Arnd: https://lkml.org/lkml/2017/1/10/756 Compiling with the following configuration: # CONFIG_EXT2_FS is not set # CONFIG_EXT4_FS is not set # CONFIG_XFS_FS is not set # CONFIG_FS_IOMAP depends on the above filesystems, as is not set CONFIG_FS_DAX=y generates build warnings about unused functions in fs/dax.c: fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function] static int dax_insert_mapping(struct address_space mapping, ^~~~~~~~~~~~~~~~~~ fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function] static int copy_user_dax(struct block_device bdev, sector_t sector, size_t size, ^~~~~~~~~~~~~ fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function] static int dax_load_hole(struct address_space mapping, void entry, ^~~~~~~~~~~~~ fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function] static void grab_mapping_entry(struct address_space *mapping, pgoff_t index, ^~~~~~~~~~~~~~~~~~ Now that the struct buffer_head based DAX fault paths and I/O path have been removed we really depend on iomap support being present for DAX. Make this explicit by selecting FS_IOMAP if we compile in DAX support. This allows us to remove conditional selections of FS_IOMAP when FS_DAX was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c. Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-01-24 16:26:14 -08:00
Keno Fischer	8310d48b12	mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp In commit `19be0eaffa` ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately because `(flags & FOLL_WRITE) && !pmd_write(pmd)` is true. While in this state the process is stil SIGKILLable, but little else works (e.g. no ptrace attach, no other signals). This is easily reproduced with the following code (assuming thp are set to always): #include <assert.h> #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #define TEST_SIZE 5 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS \| MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void addr2 = mmap(NULL, TEST_SIZE, PROT_READ \| PROT_WRITE, MAP_ANONYMOUS \| MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com Signed-off-by: Keno Fischer <keno@juliacomputing.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-01-24 16:26:14 -08:00
Yasuaki Ishimatsu	8a1f780e7f	memory_hotplug: make zone_can_shift() return a boolean value online_{kernel\|movable} is used to change the memory zone to ZONE_{NORMAL\|MOVABLE} and online the memory. To check that memory zone can be changed, zone_can_shift() is used. Currently the function returns minus integer value, plus integer value and 0. When the function returns minus or plus integer value, it means that the memory zone can be changed to ZONE_{NORNAL\|MOVABLE}. But when the function returns 0, there are two meanings. One of the meanings is that the memory zone does not need to be changed. For example, when memory is in ZONE_NORMAL and onlined by online_kernel the memory zone does not need to be changed. Another meaning is that the memory zone cannot be changed. When memory is in ZONE_NORMAL and onlined by online_movable, the memory zone may not be changed to ZONE_MOVALBE due to memory online limitation(see Documentation/memory-hotplug.txt). In this case, memory must not be onlined. The patch changes the return type of zone_can_shift() so that memory online operation fails when memory zone cannot be changed as follows: Before applying patch: # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 # echo online_movable > memory4097/state # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 8388608 managed 8388608 online_movable operation succeeded. But memory is onlined as ZONE_NORMAL, not ZONE_MOVABLE. After applying patch: # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 # echo online_movable > memory4097/state bash: echo: write error: Invalid argument # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 online_movable operation failed because of failure of changing the memory zone from ZONE_NORMAL to ZONE_MOVABLE Fixes: `df429ac039` ("memory-hotplug: more general validation of zone during online") Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-01-24 16:26:14 -08:00
Will Deacon	c7070619f3	vring: Force use of DMA API for ARM-based systems with legacy devices Booting Linux on an ARM fastmodel containing an SMMU emulation results in an unexpected I/O page fault from the legacy virtio-blk PCI device: [ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002 [ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 [ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000 [ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 <system hangs failing to read partition table> This is because the legacy virtio-blk device is behind an SMMU, so we have consequently swizzled its DMA ops and configured the SMMU to translate accesses. This then requires the vring code to use the DMA API to establish translations, otherwise all transactions will result in fatal faults and termination. Given that ARM-based systems only see an SMMU if one is really present (the topology is all described by firmware tables such as device-tree or IORT), then we can safely use the DMA API for all legacy virtio devices. Modern devices can advertise the prescense of an IOMMU using the VIRTIO_F_IOMMU_PLATFORM feature flag. Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: <stable@vger.kernel.org> Fixes: `876945dbf6` ("arm64: Hook up IOMMU dma_ops") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-25 00:33:11 +02:00
Robin Murphy	f7f6634d23	virtio_mmio: Set DMA masks appropriately Once DMA API usage is enabled, it becomes apparent that virtio-mmio is inadvertently relying on the default 32-bit DMA mask, which leads to problems like rapidly exhausting SWIOTLB bounce buffers. Ensure that we set the appropriate 64-bit DMA mask whenever possible, with the coherent mask suitably limited for the legacy vring as per `a0be1db430` ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio devices"). Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Fixes: `b42111382f` ("virtio_mmio: Use the DMA API if enabled") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-01-25 00:33:10 +02:00
Stefan Hajnoczi	0516ffd88f	vhost/vsock: handle vhost_vq_init_access() error Propagate the error when vhost_vq_init_access() fails and set vq->private_data to NULL. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-01-25 00:33:10 +02:00
Vineet Gupta	78f824d431	ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncached This is needed on HS38 cores, for setting up IO-Coherency aperture properly The polling could perturb the caches and coherecy fabric which could be wrong in the small window when Master is setting up IOC aperture etc in arc_cache_init() We do it only for ARCv2 based builds to not affect EZChip ARCompact based platform. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2017-01-24 14:25:19 -08:00
David S. Miller	ec221a17a6	Merge branch 'lwt-module-unload' Robert Shearman says: ==================== net: Fix oops on state free after lwt module unload An oops is seen in lwtstate_free after an lwt ops module has been unloaded. This patchset fixes this by preventing modules implementing lwtunnel ops from being unloaded whilst there's state alive using those ops. The first patch adds fills in a new owner field in all lwt ops and the second patch makes use of this to reference count the modules as state is built and destroyed using them. Changes in v3: - don't put module reference if try_module_get fails on building state Changes in v2: - specify module owner for all modules as suggested by DaveM - reference count all modules building lwt state, not just those ops implementing destroy_state, as also suggested by DaveM. - rebased on top of David Ahern's lwtunnel changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:21:37 -05:00
Robert Shearman	85c814016c	lwtunnel: Fix oops on state free after encap module unload When attempting to free lwtunnel state after the module for the encap has been unloaded an oops occurs: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: lwtstate_free+0x18/0x40 [..] task: ffff88003e372380 task.stack: ffffc900001fc000 RIP: 0010:lwtstate_free+0x18/0x40 RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300 [..] Call Trace: <IRQ> free_fib_info_rcu+0x195/0x1a0 ? rt_fibinfo_free+0x50/0x50 rcu_process_callbacks+0x2d3/0x850 ? rcu_process_callbacks+0x296/0x850 __do_softirq+0xe4/0x4cb irq_exit+0xb0/0xc0 smp_apic_timer_interrupt+0x3d/0x50 apic_timer_interrupt+0x93/0xa0 [..] Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8 The problem is after the module for the encap can be unloaded the corresponding ops is removed and is thus NULL here. Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so grab the module reference for the ops on creating lwtunnel state and of course release the reference when freeing the state. Fixes: `1104d9ba44` ("lwtunnel: Add destroy state operation") Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:21:36 -05:00
Robert Shearman	88ff7334f2	net: Specify the owning module for lwtunnel ops Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so specify the owning module for all lwtunnel ops. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:21:36 -05:00
Yonatan Cohen	2d4b21e0a2	IB/rxe: Prevent from completer to operate on non valid QP On UD QP completer tasklet is scheduled for each packet sent. If it is followed by a destroy_qp(), the kernel panic will happen as the completer tries to operate on a destroyed QP. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 16:17:32 -05:00
Maor Gottlieb	f39f775218	IB/rxe: Fix rxe dev insertion to rxe_dev_list The first argument of list_add_tail is the new item and the second is the head of the list. Fix the code to pass arguments in the right order, otherwise not all the rxe devices will be removed during teardown. Fixes: `8700e3e7c4` ('Soft RoCE driver') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 16:17:25 -05:00
David S. Miller	04d7f1fb7d	Merge branch 'tipc-topology-fixes' Parthasarathy Bhuvaragan says: ==================== tipc: topology server fixes for nametable soft lockup In this series, we revert the commit `333f796235` ("tipc: fix a race condition leading to subscriber refcnt bug") and provide an alternate solution to fix the race conditions in commits 2-4. We have to do this as the above commit introduced a nametbl soft lockup at module exit as described by patch#4. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:59 -05:00
Parthasarathy Bhuvaragan	35e22e49a5	tipc: fix cleanup at module unload In tipc_server_stop(), we iterate over the connections with limiting factor as server's idr_in_use. We ignore the fact that this variable is decremented in tipc_close_conn(), leading to premature exit. In this commit, we iterate until the we have no connections left. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:58 -05:00
Parthasarathy Bhuvaragan	4c887aa65d	tipc: ignore requests when the connection state is not CONNECTED In tipc_conn_sendmsg(), we first queue the request to the outqueue followed by the connection state check. If the connection is not connected, we should not queue this message. In this commit, we reject the messages if the connection state is not CF_CONNECTED. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:58 -05:00
Parthasarathy Bhuvaragan	9dc3abdd1f	tipc: fix nametbl_lock soft lockup at module exit Commit `333f796235` ("tipc: fix a race condition leading to subscriber refcnt bug") reveals a soft lockup while acquiring nametbl_lock. Before commit `333f796235`, we call tipc_conn_shutdown() from tipc_close_conn() in the context of tipc_topsrv_stop(). In that context, we are allowed to grab the nametbl_lock. Commit `333f796235`, moved tipc_conn_release (renamed from tipc_conn_shutdown) to the connection refcount cleanup. This allows either tipc_nametbl_withdraw() or tipc_topsrv_stop() to the cleanup. Since tipc_exit_net() first calls tipc_topsrv_stop() and then tipc_nametble_withdraw() increases the chances for the later to perform the connection cleanup. The soft lockup occurs in the call chain of tipc_nametbl_withdraw(), when it performs the tipc_conn_kref_release() as it tries to grab nametbl_lock again while holding it already. tipc_nametbl_withdraw() grabs nametbl_lock tipc_nametbl_remove_publ() tipc_subscrp_report_overlap() tipc_subscrp_send_event() tipc_conn_sendmsg() << if (con->flags != CF_CONNECTED) we do conn_put(), triggering the cleanup as refcount=0. >> tipc_conn_kref_release tipc_sock_release tipc_conn_release tipc_subscrb_delete tipc_subscrp_delete tipc_nametbl_unsubscribe << Soft Lockup >> The previous changes in this series fixes the race conditions fixed by commit `333f796235`. Hence we can now revert the commit. Fixes: `333f796235` ("tipc: fix a race condition leading to subscriber refcnt bug") Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:58 -05:00
Parthasarathy Bhuvaragan	fc0adfc8fd	tipc: fix connection refcount error Until now, the generic server framework maintains the connection id's per subscriber in server's conn_idr. At tipc_close_conn, we remove the connection id from the server list, but the connection is valid until we call the refcount cleanup. Hence we have a window where the server allocates the same connection to an new subscriber leading to inconsistent reference count. We have another refcount warning we grab the refcount in tipc_conn_lookup() for connections with flag with CF_CONNECTED not set. This usually occurs at shutdown when the we stop the topology server and withdraw TIPC_CFG_SRV publication thereby triggering a withdraw message to subscribers. In this commit, we: 1. remove the connection from the server list at recount cleanup. 2. grab the refcount for a connection only if CF_CONNECTED is set. Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:57 -05:00
Parthasarathy Bhuvaragan	d094c4d5f5	tipc: add subscription refcount to avoid invalid delete Until now, the subscribers keep track of the subscriptions using reference count at subscriber level. At subscription cancel or subscriber delete, we delete the subscription only if the timer was pending for the subscription. This approach is incorrect as: 1. del_timer() is not SMP safe, if on CPU0 the check for pending timer returns true but CPU1 might schedule the timer callback thereby deleting the subscription. Thus when CPU0 is scheduled, it deletes an invalid subscription. 2. We export tipc_subscrp_report_overlap(), which accesses the subscription pointer multiple times. Meanwhile the subscription timer can expire thereby freeing the subscription and we might continue to access the subscription pointer leading to memory violations. In this commit, we introduce subscription refcount to avoid deleting an invalid subscription. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:57 -05:00
Parthasarathy Bhuvaragan	93f955aad4	tipc: fix nametbl_lock soft lockup at node/link events We trigger a soft lockup as we grab nametbl_lock twice if the node has a pending node up/down or link up/down event while: - we process an incoming named message in tipc_named_rcv() and perform an tipc_update_nametbl(). - we have pending backlog items in the name distributor queue during a nametable update using tipc_nametbl_publish() or tipc_nametbl_withdraw(). The following are the call chain associated: tipc_named_rcv() Grabs nametbl_lock tipc_update_nametbl() (publish/withdraw) tipc_node_subscribe()/unsubscribe() tipc_node_write_unlock() << lockup occurs if an outstanding node/link event exits, as we grabs nametbl_lock again >> tipc_nametbl_withdraw() Grab nametbl_lock tipc_named_process_backlog() tipc_update_nametbl() << rest as above >> The function tipc_node_write_unlock(), in addition to releasing the lock processes the outstanding node/link up/down events. To do this, we need to grab the nametbl_lock again leading to the lockup. In this commit we fix the soft lockup by introducing a fast variant of node_unlock(), where we just release the lock. We adapt the node_subscribe()/node_unsubscribe() to use the fast variants. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 16:14:57 -05:00
Pablo Neira Ayuso	b2c11e4b95	netfilter: nf_tables: bump set->ndeact on set flush Add missing set->ndeact update on each deactivated element from the set flush path. Otherwise, sets with fixed size break after flush since accounting breaks. # nft add set x y { type ipv4_addr\; size 2\; } # nft add element x y { 1.1.1.1 } # nft add element x y { 1.1.1.2 } # nft flush set x y # nft add element x y { 1.1.1.1 } <cmdline>:1:1-28: Error: Could not process rule: Too many open files in system Fixes: `8411b6442e` ("netfilter: nf_tables: support for set flushing") Reported-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-01-24 21:46:59 +01:00
Pablo Neira Ayuso	de70185de0	netfilter: nf_tables: deconstify walk callback function The flush operation needs to modify set and element objects, so let's deconstify this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-01-24 21:46:58 +01:00
Pablo Neira Ayuso	35d0ac9070	netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL If the element exists and no NLM_F_EXCL is specified, do not bump set->nelems, otherwise we leak one set element slot. This problem amplifies if the set is full since the abort path always decrements the counter for the -ENFILE case too, giving one spare extra slot. Fix this by moving set->nelems update to nft_add_set_elem() after successful element insertion. Moreover, remove the element if the set is full so there is no need to rely on the abort path to undo things anymore. Fixes: `c016c7e45d` ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-01-24 21:46:57 +01:00
Liping Zhang	5ce6b04ce9	netfilter: nft_log: restrict the log prefix length to 127 First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127, at nf_log_packet(), so the extra part is useless. Second, after adding a log rule with a very very long prefix, we will fail to dump the nft rules after this _special_ one, but acctually, they do exist. For example: # name_65000=$(printf "%0.sQ" {1..65000}) # nft add rule filter output log prefix "$name_65000" # nft add rule filter output counter # nft add rule filter output counter # nft list chain filter output table ip filter { chain output { type filter hook output priority 0; policy accept; } } So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-01-24 21:46:29 +01:00
Kenneth Lee	828f6fa65c	IB/umem: Release pid in error and ODP flow 1. Release pid before enter odp flow 2. Release pid when fail to allocate memory Fixes: `87773dd56d` ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Fixes: `8ada2c1c0c` ("IB/core: Add support for on demand paging regions") Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:44:31 -05:00
Linus Torvalds	0263d4ebd9	platform-drivers-x86 for v4.10-4 SHORT SUMMARY OF CHANGES FOR LINUS MAINTAINERS: - Add myself to X86 PLATFORM DRIVERS as a co-maintainer ideapad-laptop: - handle ACPI event 1 intel_mid_powerbtn: - Set IRQ_ONESHOT surface3-wmi: - fix uninitialized symbol - Shut up unused-function warning mlx-platform: - free first dev on error -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEhiZOUlnC9oKN3n3AmT3/83c5Sy0FAliHm4gACgkQmT3/83c5 Sy0g1A/+PkP7Eqgd6wR1EO589ZVuJh9eMy1Oa65MBxOZMI0XH9CCAgvZyOKMwcZV Pyjufm6VFWNroxvgLIgpo2j6+fwHd04+yzDlGIv9qKkwsMwjbOi0UyGV+NuI8mZC 7YnZA8r2zQ7Mhyzjw0khvL00h1vYkfXWFtxD4p3x2d1Qb7TUT1Yo58vlaHpPygfY 6ghlYSzyD0gXC10Fa5QT5eUVzE8L4y1RpKPklX9ihwuntIvcqsV+Caz/81iK8lHP qMesQDSoxH61AZRYdLQ2QHP6k7Y+EwY/40YGs6mjY2HEn5w0zEbRN8jPK4u09Gcj qay5DKSNlSLXvnIHbXtmlozsz/gowwMAUGFH19Q72tDLOHMlIGbqBQA4Rfz4ADqX b61zOhTI8Xho2vz6KO2aQsQaIEXpDhw+mWlwFq+qyCCl4fs3QRXIz/sVQae1uM6C BwrPJgAPLuBKTkLI/gb5XLR/u4nDTC4rix9r1IrABxKQNQgMm5KtWnSmuGfM0gvs SJQ75JijkA6e3+NxVqcWJgSAUWkkIwDNXGe78RWFet0CTcjMAByIlwVFQy9CTj1T UUWyq7Gh34KndQJ1/SpzTd5aqxK+bxoMKJ4AIy88pM73IrsnLWIB7Y8FUgbmAbqi c9BSEfN6LVnBXOW2IWXkdh25l0MaJlvkjlvvvuwXYDEmGz4HDpM= =XDl+ -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform-driver fixes from Andy Shevchenko: "This is my first pull request since I become a co-maintainer of Platform Drivers x86 subsystem. It's a bit bigger than usual due to material collected for almost two weeks in a row. MAINTAINERS: - Add myself to X86 PLATFORM DRIVERS as a co-maintainer ideapad-laptop: - handle ACPI event 1 intel_mid_powerbtn: - Set IRQ_ONESHOT surface3-wmi: - fix uninitialized symbol - Shut up unused-function warning mlx-platform: - free first dev on error" * tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86: MAINTAINERS: Add myself to X86 PLATFORM DRIVERS as a co-maintainer platform/x86: ideapad-laptop: handle ACPI event 1 platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT platform/x86: surface3-wmi: fix uninitialized symbol platform/x86: surface3-wmi: Shut up unused-function warning platform/x86: mlx-platform: free first dev on error	2017-01-24 12:38:43 -08:00
Ram Amrani	f449c7a2d8	RDMA/qedr: Dispatch port active event from qedr_add Relying on qede to trigger qedr on startup is problematic. When probing both if qedr loads slowly then qede can assume qedr is missing and not trigger it. This patch adds a triggering from qedr and protects against a race via an atomic bit. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:35:08 -05:00
Ram Amrani	9c1e0228ab	RDMA/qedr: Fix and simplify memory leak in PD alloc Free the PD if no internal resources were available. Move userspace code under the relevant 'if'. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:35:07 -05:00
Ram Amrani	af2b14b8b8	RDMA/qedr: Fix RDMA CM loopback The loopback logic in RDMA CM packets compares Ethernet addresses and was accidently inverse. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:35:02 -05:00
Ram Amrani	1a59075197	RDMA/qedr: Fix formatting Remove standalone ';'. List function's parameters in a single line. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:35:01 -05:00
Ram Amrani	27a4b1a6d6	RDMA/qedr: Mark three functions as static mark qedr_get_state_from_ibqp(), __qedr_alloc_mr() and __qedr_post_send() as static since they are only used in the same file. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:56 -05:00
Ram Amrani	933e6dcaa0	RDMA/qedr: Don't reset QP when queues aren't flushed Fail QP state transition from error to reset if SQ/RQ are not empty and still in the process of flushing out the queued work entries. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:55 -05:00
Ram Amrani	c78c314961	RDMA/qedr: Don't spam dmesg if QP is in error state It is normal to flush CQEs if the QP is in error state. Hence there's no use in printing a message per CQE to dmesg. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:54 -05:00
Ram Amrani	91bff997db	RDMA/qedr: Remove CQ spinlock from CM completion handlers There is only a single event queue that triggers the completion events for the RDMA CM and it is being processed serially. This means that inherently there can no parallelism of CQ completion handler callbacks, hence the lock is redundant. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:43 -05:00
Ram Amrani	59e8970b37	RDMA/qedr: Return max inline data in QP query result Return the maximum supported amount of inline data, not the qp's current configured inline data size, when filling out the results of a query qp call. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:37 -05:00
Ram Amrani	865cea40b6	RDMA/qedr: Return success when not changing QP state If the user is requesting us to change the QP state to the same state that it is already in, return success instead of failure. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:36 -05:00
Amrani, Ram	20f5e10ef8	RDMA/qedr: Add uapi header qedr-abi.h Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:36 -05:00
Amrani, Ram	097b615965	RDMA/qedr: Fix MTU returned from QP query MTU value returned from QP query should include overhead. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:30 -05:00
Amrani, Ram	d3f4aadd61	RDMA/core: Add the function ib_mtu_int_to_enum As the functionality to convert the MTU from a number to enum_ib_mtu is ubiquitous, define a dedicated function and remove the duplicated code. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 15:34:22 -05:00
Kinglong Mee	c929ea0b91	SUNRPC: cleanup ida information when removing sunrpc module After removing sunrpc module, I get many kmemleak information as, unreferenced object 0xffff88003316b1e0 (size 544): comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0 [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0 [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150 [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180 [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd] [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd] [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd] [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc] [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc] [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc] [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0 [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0 [<ffffffffb0390f1f>] vfs_write+0xef/0x240 [<ffffffffb0392fbd>] SyS_write+0xad/0x130 [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9 [<ffffffffffffffff>] 0xffffffffffffffff I found, the ida information (dynamic memory) isn't cleanup. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `2f048db468` ("SUNRPC: Add an identifier for struct rpc_clnt") Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-01-24 15:29:24 -05:00
David S. Miller	294628c1fe	Merge branch 'alx-mq-fixes' Tobias Regnery says: ==================== alx: fix fallout from multi queue conversion Here are 3 fixes for the multi queue conversion in v4.10. The first patch fixes a wrong condition in an if statement. Patches 2 and 3 fixes regressions in the corner case when requesting msi-x interrupts fails and we fall back to msi or legacy interrupts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:27:59 -05:00
Tobias Regnery	185aceefd8	alx: work around hardware bug in interrupt fallback path If requesting msi-x interrupts fails in alx_request_irq we fall back to a single tx queue and msi or legacy interrupts. Currently the adapter stops working in this case and we get tx watchdog timeouts. For reasons unknown the adapter gets confused when we load the dma adresses to the chip in alx_init_ring_ptrs twice: the first time with multiple queues and the second time in the fallback case with a single queue. To fix this move the the call to alx_reinit_rings (which calls alx_init_ring_ptrs) after alx_request_irq. At this time it is clear how much tx queues we have and which dma addresses we use. Fixes: `d768319cd4` ("alx: enable multiple tx queues") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:27:58 -05:00
Tobias Regnery	37187a016c	alx: fix fallback to msi or legacy interrupts If requesting msi-x interrupts fails we should fall back to msi or legacy interrupts. However alx_realloc_ressources don't call alx_init_intr, so we fail to set the right number of tx queues. This results in watchdog timeouts and a nonfunctional adapter. Fixes: `d768319cd4` ("alx: enable multiple tx queues") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:27:58 -05:00
Tobias Regnery	f1db5c101c	alx: fix wrong condition to free descriptor memory The condition to free the descriptor memory is wrong, we want to free the memory if it is set and not if it is unset. Invert the test to fix this issue. Fixes: b0999223f224b ("alx: add ability to allocate and free alx_napi structures") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:27:58 -05:00
Bjørn Mork	5b9f575163	qmi_wwan/cdc_ether: add device ID for HP lt2523 (Novatel E371) WWAN card Another rebranded Novatel E371. qmi_wwan should drive this device, while cdc_ether should ignore it. Even though the USB descriptors are plain CDC-ETHER that USB interface is a QMI interface. Ref commit `7fdb7846c9` ("qmi_wwan/cdc_ether: add device IDs for Dell 5804 (Novatel E371) WWAN card") Cc: Dan Williams <dcbw@redhat.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:25:00 -05:00
Darrick J. Wong	83d230eb5c	xfs: verify dirblocklog correctly sb_dirblklog is added to sb_blocklog to compute the directory block size in bytes. Therefore, we must compare the sum of both those values against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-01-24 12:23:33 -08:00
Linus Torvalds	19ca2c8fec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fix from Eric Biederman: "This has a single brown bag fix. The possible deadlock with dec_pid_namespaces that I had thought was fixed earlier turned out only to have been moved. So instead of being cleaver this change takes ucounts_lock with irqs disabled. So dec_ucount can be used from any context without fear of deadlock. The items accounted for dec_ucount and inc_ucount are all comparatively heavy weight objects so I don't exepct this will have any measurable performance impact" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Make ucounts lock irq-safe	2017-01-24 12:21:51 -08:00
Thomas Huth	23d28a859f	ibmveth: Add a proper check for the availability of the checksum features When using the ibmveth driver in a KVM/QEMU based VM, it currently always prints out a scary error message like this when it is started: ibmveth 71000003 (unregistered net_device): unable to change checksum offload settings. 1 rc=-2 ret_attr=71000003 This happens because the driver always tries to enable the checksum offloading without checking for the availability of this feature first. QEMU does not support checksum offloading for the spapr-vlan device, thus we always get the error message here. According to the LoPAPR specification, the "ibm,illan-options" property of the corresponding device tree node should be checked first to see whether the H_ILLAN_ATTRIUBTES hypercall and thus the checksum offloading feature is available. Thus let's do this in the ibmveth driver, too, so that the error message is really only limited to cases where something goes wrong, and does not occur if the feature is just missing. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:15:21 -05:00
David S. Miller	7d6556ac66	Merge branch 'vxlan-fdb-fixes' Roopa Prabhu says: ==================== vxlan: misc fdb fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:01:58 -05:00
Balakrishnan Raman	efb5f68f32	vxlan: do not age static remote mac entries Mac aging is applicable only for dynamically learnt remote mac entries. Check for user configured static remote mac entries and skip aging. Signed-off-by: Balakrishnan Raman <ramanb@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:01:58 -05:00

... 3 4 5 6 7 ...

649372 Commits