636308 Commits

Author SHA1 Message Date
Max Kellermann
fe6531075e pctv452e: move buffer to heap, no mutex
commit 48775cb73c2e26b7ca9d679875a6e570c8b8e124 upstream.

commit 73d5c5c864f4 ("[media] pctv452e: don't do DMA on stack") caused
a NULL pointer dereference which occurs when dvb_usb_init()
calls dvb_usb_device_power_ctrl() for the first time, before the
frontend has been attached. It also caused a recursive deadlock because
tt3650_ci_msg_locked() has already locked the mutex.

So, partially revert it, but move the buffer to the heap
(DMA capable), not to the stack (may not be DMA capable).
Instead of sharing one buffer which needs mutex protection,
do a new heap allocation for each call.

Fixes: commit 73d5c5c864f4 ("[media] pctv452e: don't do DMA on stack")

Signed-off-by: Max Kellermann <max.kellermann@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:10 +01:00
Steve Wise
0bd3cb8d47 iw_cxgb4: free EQ queue memory on last deref
commit c12a67fec8d99bb554e8d4e99120d418f1a39c87 upstream.

Commit ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting
the last deref") introduced a bug where the RDMA QP EQ queue memory
(and QIDs) are possibly freed before the underlying connection has been
fully shutdown.  The result being a possible DMA read issued by HW after
the queue memory has been unmapped and freed.  This results in possible
WR corruption in the worst case, system bus errors if an IOMMU is in use,
and SGE "bad WR" errors reported in the very least.  The fix is to defer
unmap/free of queue memory and QID resources until the QP struct has
been fully dereferenced.  To do this, the c4iw_ucontext must also be kept
around until the last QP that references it is fully freed.  In addition,
since the last QP deref can happen in an IRQ disabled context, we need
a new workqueue thread to do the final unmap/free of the EQ queue memory.

Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:09 +01:00
Kinglong Mee
cb1d48f55a SUNRPC: cleanup ida information when removing sunrpc module
commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream.

After removing sunrpc module, I get many kmemleak information as,
unreferenced object 0xffff88003316b1e0 (size 544):
  comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
    [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
    [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
    [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
    [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
    [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
    [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
    [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
    [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
    [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
    [<ffffffffb0390f1f>] vfs_write+0xef/0x240
    [<ffffffffb0392fbd>] SyS_write+0xad/0x130
    [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
    [<ffffffffffffffff>] 0xffffffffffffffff

I found, the ida information (dynamic memory) isn't cleanup.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:09 +01:00
Benjamin Coddington
5637949edb NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
commit a430607b2ef7c3be090f88c71cfcb1b3988aa7c0 upstream.

Some nfsv4.0 servers may return a mode for the verifier following an open
with EXCLUSIVE4 createmode, but this does not mean the client should skip
setting the mode in the following SETATTR.  It should only do that for
EXCLUSIVE4_1 or UNGAURDED createmode.

Fixes: 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:09 +01:00
Trond Myklebust
0a70235061 NFSv4.1: Fix a deadlock in layoutget
commit 8ac092519ad91931c96d306c4bfae2c6587c325f upstream.

We cannot call nfs4_handle_exception() without first ensuring that the
slot has been freed. If not, we end up deadlocking with the process
waiting for recovery to complete, and recovery waiting for the slot
table to drain.

Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:09 +01:00
Chuck Lever
73fdda3b01 nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
commit 059aa734824165507c65fd30a55ff000afd14983 upstream.

Xuan Qi reports that the Linux NFSv4 client failed to lock a file
that was migrated. The steps he observed on the wire:

1. The client sent a LOCK request to the source server
2. The source server replied NFS4ERR_MOVED
3. The client switched to the destination server
4. The client sent the same LOCK request to the destination
   server with a bumped lock sequence ID
5. The destination server rejected the LOCK request with
   NFS4ERR_BAD_SEQID

RFC 3530 section 8.1.5 provides a list of NFS errors which do not
bump a lock sequence ID.

However, RFC 3530 is now obsoleted by RFC 7530. In RFC 7530 section
9.1.7, this list has been updated by the addition of NFS4ERR_MOVED.

Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Helge Deller
2b95f1210e parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header
commit 2ad5d52d42810bed95100a3d912679d8864421ec upstream.

In swab.h the "#if BITS_PER_LONG > 32" breaks compiling userspace programs if
BITS_PER_LONG is #defined by userspace with the sizeof() compiler builtin.

Solve this problem by using __BITS_PER_LONG instead.  Since we now
#include asm/bitsperlong.h avoid further potential userspace pollution
by moving the #define of SHIFT_PER_LONG to bitops.h which is not
exported to userspace.

This patch unbreaks compiling qemu on hppa/parisc.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Vineet Gupta
ca332b96ba ARC: [arcompact] handle unaligned access delay slot corner case
commit 9aed02feae57bf7a40cb04ea0e3017cb7a998db4 upstream.

After emulating an unaligned access in delay slot of a branch, we
pretend as the delay slot never happened - so return back to actual
branch target (or next PC if branch was not taken).

Curently we did this by handling STATUS32.DE, we also need to clear the
BTA.T bit, which is disregarded when returning from original misaligned
exception, but could cause weirdness if it took the interrupt return
path (in case interrupt was acive too)

One ARC700 customer ran into this when enabling unaligned access fixup
for kernel mode accesses as well

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Vineet Gupta
9d5f2c151e ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list
commit 36425cd67052e3becf325fd4d3ba5691791ef7e4 upstream.

commit 3c7c7a2fc8811bc ("ARC: Don't use "+l" inline asm constraint")
modified the inline assembly to setup LP_COUNT register manually and NOT
rely on gcc to do it (with the +l inline assembler contraint hint, now
being retired in the compiler)

However the fix was flawed as we didn't add LP_COUNT to asm clobber list,
meaning gcc doesn't know that LP_COUNT or zero-delay-loops are in action
in the inline asm.

This resulted in some fun - as nested ZOL loops were being generared

| mov lp_count,250000 ;16 # tmp235,
| lp .L__GCC__LP14 #		<======= OUTER LOOP (gcc generated)
|   .L14:
|   ld r2, [r5] # MEM[(volatile u32 *)prephitmp_43], w
|   dmb 1
|   breq r2, -1, @.L21 #, w,,
|   bbit0 r2,1,@.L13 # w,,
|   ld r4,[r7] ;25 # loops_per_jiffy, loops_per_jiffy
|   mpymu r3,r4,r6 #, loops_per_jiffy, tmp234
|
|   mov lp_count, r3 #		 <====== INNER LOOP (from inline asm)
|   lp 1f
| 	 nop
|   1:
|   nop_s
| .L__GCC__LP14: ; loop end, start is @.L14 #,

This caused issues with drivers relying on sane behaviour of udelay
friends.

With LP_COUNT added to clobber list, gcc doesn't generate the outer
loop in say above case.

Addresses STAR 9001146134

Reported-by: Joao Pinto <jpinto@synopsys.com>
Fixes: 3c7c7a2fc8811bc ("ARC: Don't use "+l" inline asm constraint")
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Yegor Yefremov
50f5972cc2 can: ti_hecc: add missing prepare and unprepare of the clock
commit befa60113ce7ea270cb51eada28443ca2756f480 upstream.

In order to make the driver work with the common clock framework, this
patch converts the clk_enable()/clk_disable() to
clk_prepare_enable()/clk_disable_unprepare().

Also add error checking for clk_prepare_enable().

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Einar Jón
9f56548b00 can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointer
commit c97c52be78b8463ac5407f1cf1f22f8f6cf93a37 upstream.

The priv->device pointer for c_can_pci is never set, but it is used
without a NULL check in c_can_start(). Setting it in c_can_pci_probe()
like c_can_plat_probe() prevents c_can_pci.ko from crashing, with and
without CONFIG_PM.

This might also cause the pm_runtime_*() functions in c_can.c to
actually be executed for c_can_pci devices - they are the only other
place where priv->device is used, but they all contain a null check.

Signed-off-by: Einar Jón <tolvupostur@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:08 +01:00
Israel Rukshin
a1af471b41 IB/srp: fix invalid indirect_sg_entries parameter value
commit 0a475ef4226e305bdcffe12b401ca1eab06c4913 upstream.

After setting indirect_sg_entries module_param to huge value (e.g 500,000),
srp_alloc_req_data() fails to allocate indirect descriptors for the request
ring (kmalloc fails). This commit enforces the maximum value of
indirect_sg_entries to be SG_MAX_SEGMENTS as signified in module param
description.

Fixes: 65e8617fba17 (scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS)
Fixes: c07d424d6118 (IB/srp: add support for indirect tables that don't fit in SRP_CMD)
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Israel Rukshin
c2293e76ba IB/srp: fix mr allocation when the device supports sg gaps
commit ad8e66b4a80182174f73487ed25fd2140cf43361 upstream.

If the device support arbitrary sg list mapping (device cap
IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
IB_MR_TYPE_SG_GAPS.

Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Max Gurtovoy
24be606cd3 IB/iser: Fix sg_tablesize calculation
commit 1e5db6c31ade4150c2e2b1a21e39f776c38fea39 upstream.

For devices that can register page list that is bigger than
USHRT_MAX, we actually take the wrong value for sg_tablesize.
E.g: for CX4 max_fast_reg_page_list_len is 65536 (bigger than USHRT_MAX)
so we set sg_tablesize to 0 by mistake. Therefore, each IO that is
bigger than 4k splitted to "< 4k" chunks that cause performance degredation.
Remove wrong sg_tablesize assignment, and use the value that was set during
address resolution handler with the needed casting.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Nicolas Iooss
95600605ff IB/cxgb3: fix misspelling in header guard
commit b1a27eac7fefff33ccf6acc919fc0725bf9815fb upstream.

Use CXGB3_... instead of CXBG3_...

Fixes: a85fb3383340 ("IB/cxgb3: Move user vendor structures")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Steve Wise <swise@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Martin Schwidefsky
d7f56ee119 s390/ptrace: Preserve previous registers for short regset write
commit 9dce990d2cf57b5ed4e71a9cdbd7eae4335111ff upstream.

Ensure that if userspace supplies insufficient data to
PTRACE_SETREGSET to fill all the registers, the thread's old
registers are preserved.

convert_vx_to_fp() is adapted to handle only a specified number of
registers rather than unconditionally handling all of them: other
callers of this function are adapted appropriately.

Based on an initial patch by Dave Martin.

Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Christian Borntraeger
62d7f2123f s390/mm: Fix cmma unused transfer from pgste into pte
commit 0d6da872d3e4a60f43c295386d7ff9a4cdcd57e9 upstream.

The last pgtable rework silently disabled the CMMA unused state by
setting a local pte variable (a parameter) instead of propagating it
back into the caller. Fix it.

Fixes: ebde765c0e85 ("s390/mm: uninline ptep_xxx functions from pgtable.h")
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:07 +01:00
Jack Morgenstein
97a2e39b7a RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
commit b4cfe3971f6eab542dd7ecc398bfa1aeec889934 upstream.

If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.

This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.

In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().

Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution")
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:06 +01:00
Omar Sandoval
ffb97c11d0 Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
commit 57b59ed2e5b91e958843609c7884794e29e6c4cb upstream.

Subvolume directory inodes can't have ACLs.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:06 +01:00
Omar Sandoval
ad80fada9d Btrfs: disable xattr operations on subvolume directories
commit 1fdf41941b8010691679638f8d0c8d08cfee7726 upstream.

When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.

To fix this, clear IOP_XATTR in ->i_opflags on these inodes.

Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Tested-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:06 +01:00
Omar Sandoval
79babd4a6c Btrfs: remove old tree_root case in btrfs_read_locked_inode()
commit 67ade058ef2c65a3e56878af9c293ec76722a2e5 upstream.

As Jeff explained in c2951f32d36c ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:06 +01:00
Arnd Bergmann
959f9709c0 ISDN: eicon: silence misleading array-bounds warning
commit 950eabbd6ddedc1b08350b9169a6a51b130ebaaf upstream.

With some gcc versions, we get a warning about the eicon driver,
and that currently shows up as the only remaining warning in one
of the build bots:

In file included from ../drivers/isdn/hardware/eicon/message.c:30:0:
eicon/message.c: In function 'mixer_notify_update':
eicon/platform.h:333:18: warning: array subscript is above array bounds [-Warray-bounds]

The code is easily changed to open-code the unusual PUT_WORD() line
causing this to avoid the warning.

Link: http://arm-soc.lixom.net/buildlogs/stable-rc/v4.4.45/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:06 +01:00
Brian Foster
4859524143 xfs: prevent quotacheck from overloading inode lru
commit e0d76fa4475ef2cf4b52d18588b8ce95153d021b upstream.

Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release.  The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:05 +01:00
Eric Dumazet
03707d6c36 sysctl: fix proc_doulongvec_ms_jiffies_minmax()
commit ff9f8a7cf935468a94d9927c68b00daae701667e upstream.

We perform the conversion between kernel jiffies and ms only when
exporting kernel value to user space.

We need to do the opposite operation when value is written by user.

Only matters when HZ != 1000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:05 +01:00
Nikolay Borisov
c755686778 userns: Make ucounts lock irq-safe
commit 880a38547ff08715ce4f1daf9a4bb30c87676e68 upstream.

The ucounts_lock is being used to protect various ucounts lifecycle
management functionalities. However, those services can also be invoked
when a pidns is being freed in an RCU callback (e.g. softirq context).
This can lead to deadlocks. There were already efforts trying to
prevent similar deadlocks in add7c65ca426 ("pid: fix lockdep deadlock
warning due to ucount_lock"), however they just moved the context
from hardirq to softrq. Fix this issue once and for all by explictly
making the lock disable irqs altogether.

Dmitry Vyukov <dvyukov@google.com> reported:

> I've got the following deadlock report while running syzkaller fuzzer
> on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid
> device if it matters):
>
> =================================
> [ INFO: inconsistent lock state ]
> 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted
> ---------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
>  (ucounts_lock){+.?...}, at: [<     inline     >] spin_lock
> ./include/linux/spinlock.h:302
>  (ucounts_lock){+.?...}, at: [<ffff2000081678c8>]
> put_ucounts+0x60/0x138 kernel/ucount.c:162
> {SOFTIRQ-ON-W} state was registered at:
> [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054
> [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2941
> [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295
> [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
> [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
> [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
> [<     inline     >] spin_lock ./include/linux/spinlock.h:302
> [<     inline     >] get_ucounts kernel/ucount.c:131
> [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189
> [<     inline     >] inc_mnt_namespaces fs/namespace.c:2818
> [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849
> [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959
> [<     inline     >] init_mount_tree fs/namespace.c:3199
> [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251
> [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626
> [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648
> [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456
> irq event stamp: 2316924
> hardirqs last  enabled at (2316924): [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2911
> hardirqs last  enabled at (2316924): [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
> hardirqs last  enabled at (2316924): [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
> hardirqs last  enabled at (2316924): [<ffff200008210414>]
> rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166
> hardirqs last disabled at (2316923): [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2900
> hardirqs last disabled at (2316923): [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
> hardirqs last disabled at (2316923): [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
> hardirqs last disabled at (2316923): [<ffff20000820fe80>]
> rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166
> softirqs last  enabled at (2316912): [<ffff20000811b4c4>]
> _local_bh_enable+0x4c/0x80 kernel/softirq.c:155
> softirqs last disabled at (2316913): [<     inline     >]
> do_softirq_own_stack ./include/linux/interrupt.h:488
> softirqs last disabled at (2316913): [<     inline     >]
> invoke_softirq kernel/softirq.c:371
> softirqs last disabled at (2316913): [<ffff20000811c994>]
> irq_exit+0x264/0x308 kernel/softirq.c:405
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(ucounts_lock);
>   <Interrupt>
>     lock(ucounts_lock);
>
>  *** DEADLOCK ***
>
> 1 lock held by swapper/2/0:
>  #0:  (rcu_callback){......}, at: [<     inline     >] __rcu_reclaim
> kernel/rcu/rcu.h:108
>  #0:  (rcu_callback){......}, at: [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2919
>  #0:  (rcu_callback){......}, at: [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
>  #0:  (rcu_callback){......}, at: [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
>  #0:  (rcu_callback){......}, at: [<ffff200008210390>]
> rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166
>
> stack backtrace:
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6
> Hardware name: Hardkernel ODROID-C2 (DT)
> Call trace:
> [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500
> [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225
> [<ffff2000088a99e0>] dump_stack+0x110/0x168
> [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc
> kernel/locking/lockdep.c:2387
> [<     inline     >] print_usage_bug kernel/locking/lockdep.c:2357
> [<     inline     >] valid_state kernel/locking/lockdep.c:2400
> [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2617
> [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065
> [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2923
> [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295
> [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
> [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
> [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
> [<     inline     >] spin_lock ./include/linux/spinlock.h:302
> [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162
> [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214
> [<     inline     >] dec_pid_namespaces kernel/pid_namespace.c:89
> [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156
> [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
> [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2919
> [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3182
> [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3149
> [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166
> [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284
> [<     inline     >] do_softirq_own_stack ./include/linux/interrupt.h:488
> [<     inline     >] invoke_softirq kernel/softirq.c:371
> [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405
> [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636
> [<ffff200008081c80>] gic_handle_irq+0x68/0xd8
> Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00)
> 7dc0:                                   ffff8000648d4b3c 0000000000000007
> 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967
> 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001
> 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002
> 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000
> 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140
> 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d
> 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00
> 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145
> 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478
> [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486
> [<     inline     >] arch_local_irq_restore
> ./arch/arm64/include/asm/irqflags.h:81
> [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030
> [<     inline     >] cpuidle_idle_call kernel/sched/idle.c:200
> [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243
> [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345
> [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358
> arch/arm64/kernel/smp.c:276
> [<000000000279f1a4>] 0x279f1a4

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock")
Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Link: https://www.spinics.net/lists/kernel/msg2426637.html
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:05 +01:00
Will Deacon
13e39d5930 vring: Force use of DMA API for ARM-based systems with legacy devices
commit c7070619f3408d9a0dffbed9149e6f00479cf43b upstream.

Booting Linux on an ARM fastmodel containing an SMMU emulation results
in an unexpected I/O page fault from the legacy virtio-blk PCI device:

[    1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.211800] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
[    1.211880] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
[    1.211959] arm-smmu-v3 2b400000.smmu:	0x00000008fa081002
[    1.212075] arm-smmu-v3 2b400000.smmu:	0x0000000000000000
[    1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.212234] arm-smmu-v3 2b400000.smmu:	0x00000000fffff010
[    1.212314] arm-smmu-v3 2b400000.smmu:	0x0000020800000000
[    1.212394] arm-smmu-v3 2b400000.smmu:	0x00000008fa081000
[    1.212471] arm-smmu-v3 2b400000.smmu:	0x0000000000000000

<system hangs failing to read partition table>

This is because the legacy virtio-blk device is behind an SMMU, so we
have consequently swizzled its DMA ops and configured the SMMU to
translate accesses. This then requires the vring code to use the DMA API
to establish translations, otherwise all transactions will result in
fatal faults and termination.

Given that ARM-based systems only see an SMMU if one is really present
(the topology is all described by firmware tables such as device-tree or
IORT), then we can safely use the DMA API for all legacy virtio devices.
Modern devices can advertise the prescense of an IOMMU using the
VIRTIO_F_IOMMU_PLATFORM feature flag.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:05 +01:00
Vlastimil Babka
96e5cec10e mm, page_alloc: fix premature OOM when racing with cpuset mems update
commit e47483bca2cc59a4593b37a270b16ee42b1d9f08 upstream.

Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

The problem comes from insufficient protection against cpuset changes,
which can cause get_page_from_freelist() to consider all zones as
non-eligible due to nodemask and/or current->mems_allowed.  This was
masked in the past by sufficient retries, but since commit 682a3385e773
("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
the preferred_zoneref once, and don't iterate over the whole zonelist in
further attempts, thus the only eligible zones might be placed in the
zonelist before our starting point and we always miss them.

A previous patch fixed this problem for current->mems_allowed.  However,
cpuset changes also update the task's mempolicy nodemask.  The fix has
two parts.  We have to repeat the preferred_zoneref search when we
detect cpuset update by way of seqcount, and we have to check the
seqcount before considering OOM.

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:05 +01:00
Vlastimil Babka
b678e4ff7c mm, page_alloc: move cpuset seqcount checking to slowpath
commit 5ce9bfef1d27944c119a397a9d827bef795487ce upstream.

This is a preparation for the following patch to make review simpler.
While the primary motivation is a bug fix, this also simplifies the fast
path, although the moved code is only enabled when cpusets are in use.

Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:04 +01:00
Vlastimil Babka
d1656c5aef mm, page_alloc: fix fast-path race with cpuset update or removal
commit 16096c25bf0ca5d87e4fa6ec6108ba53feead212 upstream.

Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

One possible cause is that in the fast path we find the preferred
zoneref according to current mems_allowed, so that it points to the
middle of the zonelist, skipping e.g.  zones of node 1 completely.  If
the mems_allowed is updated to contain only node 1, we never reach it in
the zonelist, and trigger OOM before checking the cpuset_mems_cookie.

This patch fixes the particular case by redoing the preferred zoneref
search if we switch back to the original nodemask.  The condition is
also slightly changed so that when the last non-root cpuset is removed,
we don't miss it.

Note that this is not a full fix, and more patches will follow.

Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz
Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:04 +01:00
Vlastimil Babka
ade7afe9dc mm, page_alloc: fix check for NULL preferred_zone
commit ea57485af8f4221312a5a95d63c382b45e7840dc upstream.

Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1].  The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice.  That's why the patches try to
be not too intrusive.

Unfortunately why investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOM's
back, but that seems to be much older and harder to fix problem.  I have
posted a RFC [2] but I believe that fixing the recent regressions has a
higher priority.

Longer-term we might try to think how to fix the cpuset mess in a better
and less error prone way.  I was for example very surprised to learn,
that cpuset updates change not only task->mems_allowed, but also
nodemask of mempolicies.  Until now I expected the parameter to
alloc_pages_nodemask() to be stable.  I wonder why do we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset.  I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form.  So we
have both crazy complexity and overhead, AFAICS.

[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz

This patch (of 4):

Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification.  We
check the zoneref pointer which is never NULL and we should check the zone
pointer.  Also document this in first_zones_zonelist() comment per Michal
Hocko.

Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:04 +01:00
Vlastimil Babka
9b1a1ae9b5 mm/mempolicy.c: do not put mempolicy before using its nodemask
commit d51e9894d27492783fc6d1b489070b4ba66ce969 upstream.

Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to
alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
mpol_cond_put() before accessing the embedded nodemask by
__alloc_pages_nodemask().  The commit log says it's so "we can use a
single exit path within the function" but that's clearly wrong.  We can
still do that when doing mpol_cond_put() after the allocation attempt.

Make sure the mempolicy is not freed prematurely, otherwise
__alloc_pages_nodemask() can end up using a bogus nodemask, which could
lead e.g.  to premature OOM.

Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:04 +01:00
Keno Fischer
6676aa6546 mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream.

In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead.  Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.

However, a similar check in huge_memory.c was forgotten.  As a result,
remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets
FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
true.

While in this state the process is stil SIGKILLable, but little else
works (e.g.  no ptrace attach, no other signals).  This is easily
reproduced with the following code (assuming thp are set to always):

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TEST_SIZE 5 * 1024 * 1024

    int main(void) {
      int status;
      pid_t child;
      int fd = open("/proc/self/mem", O_RDWR);
      void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                        MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
      assert(addr != MAP_FAILED);
      pid_t parent_pid = getpid();
      if ((child = fork()) == 0) {
        void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        assert(addr2 != MAP_FAILED);
        memset(addr2, 'a', TEST_SIZE);
        pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
        return 0;
      }
      assert(child == waitpid(child, &status, 0));
      assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
      return 0;
    }

Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit.  The same pattern exists
in follow_devmap_pmd.  However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:04 +01:00
Lucas Stach
a2104c7cd3 drm/atomic: clear out fence when duplicating state
[Fixed differently in 4.10]

The fence needs to be cleared out, otherwise the following commit
might wait on a stale fence from the previous commit. This was fixed
as a side effect of 9626014258a5 (drm/fence: add in-fences support)
in kernel 4.10.

As this commit introduces new functionality and as such can not be
applied to stable, this patch is the minimal fix for the kernel 4.9
stable series.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Alex Deucher
bbae3c4525 Revert "drm/radeon: always apply pci shutdown callbacks"
commit b9b487e494712c8e5905b724e12f5ef17e9ae6f9 upstream.

This seems to break reboot on some evergreen systems.

bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=99524
https://bugzilla.kernel.org/show_bug.cgi?id=192271

This reverts commit a481daa88fd4d6b54f25348972bba10b5f6a84d0.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Dan Carpenter
5270c017f1 drm/vc4: fix a bounds check
commit 21ccc32496b2f63228f5232b3ac0e426e8fb3c31 upstream.

We accidentally return success even if vc4_full_res_bounds_check() fails.

Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Eric Anholt
cfba2a001d drm/vc4: Return -EINVAL on the overflow checks failing.
commit 6b8ac63847bc2f958dd93c09edc941a0118992d9 upstream.

By failing to set the errno, we'd continue on to trying to set up the
RCL, and then oops on trying to dereference the tile_bo that binning
validation should have set up.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Eric Anholt
b9edac54cb drm/vc4: Fix an integer overflow in temporary allocation layout.
commit 0f2ff82e11c86c05d051cae32b58226392d33bbf upstream.

We copy the unvalidated ioctl arguments from the user into kernel
temporary memory to run the validation from, to avoid a race where the
user updates the unvalidate contents in between validating them and
copying them into the validated BO.

However, in setting up the layout of the kernel side, we failed to
check one of the additions (the roundup() for shader_rec_offset)
against integer overflow, allowing a nearly MAX_UINT value of
bin_cl_size to cause us to under-allocate the temporary space that we
then copy_from_user into.

Reported-by: Murray McAllister <murray.mcallister@insomniasec.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Eric Anholt
32600835eb drm/vc4: Fix memory leak of the CRTC state.
commit 7622b25543665567d8830a63210385b7d705924b upstream.

The underscores variant frees the pointers inside, while the
no-underscores variant calls underscores and then frees the struct.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: d8dbf44f13b9 ("drm/vc4: Make the CRTCs cooperate on allocating display lists.")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:03 +01:00
Ville Syrjälä
4c741e2adb drm/i915: Ignore bogus plane coordinates on SKL when the plane is not visible
commit 3bfdfdcbce2796ce75bf2d85fd8471858d702e5d upstream.

When the plane is invisible we may have all sorts of bogus stuff
in the coordinates, which we must ignore or else we might fail the
plane update. This started to happen on SKL when I moved the plane
offset computation to happen in the check phase. Previously we
happily ignored it all since we never called the update_plane hook
with an invisible plane.

Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: b63a16f6cd89 ("drm/i915: Compute display surface offset in the plane check hook for SKL+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98258
Testcase: igt/pm_rpm/legacy-planes
Testcase: igt/pm_rpm/universal-planes
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478550057-24864-3-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit a5e4c7d0aa6784d8abe95c3ceef0da9656d17468)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:02 +01:00
Takashi Iwai
f1dc9aaee0 drm: Fix broken VT switch with video=1366x768 option
commit fdf35a6b22247746a7053fc764d04218a9306f82 upstream.

I noticed that the VT switch doesn't work any longer with a Dell
laptop with 1366x768 eDP when the machine is connected with a DP
monitor.  It behaves as if VT were switched, but the graphics remain
frozen.  Actually the keyboard works, so I could switch back to VT7
again.

I tried to track down the problem, and encountered a long story until
we reach to this error:

- The machine is booted with video=1366x768 option (the distro
  installer seems to add it as default).
- Recently, drm_helper_probe_single_connector_modes() deals with
  cmdline modes, and it tries to create a new mode when no
  matching mode is found.
- The drm_mode_create_from_cmdline_mode() creates a mode based on
  either CVT of GFT according to the given cmdline mode; in our case,
  it's 1366x768.
- Since both CVT and GFT can't express the width 1366 due to
  alignment, the resultant mode becomes 1368x768, slightly larger than
  the given size.
- Later on, the atomic commit is performed, and in
  drm_atomic_check_only(), the size of each plane is checked.
- The size check of 1366x768 fails due to the above, and eventually
  the whole VT switch fails.

Back in the history, we've had a manual fix-up of 1368x768 in various
places via c09dedb7a50e ("drm/edid: Add a workaround for 1366x768 HD
panel"), but they have been all in drm_edid.c at probing the modes
from EDID.  For addressing the problem above, we need a similar hack
to the mode newly created from cmdline, manually adjusting the width
when the expected size is 1366 while we get 1368 instead.

Fixes: eaf99c749d43 ("drm: Perform cmdline mode parsing during...")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109145614.29454-1-tiwai@suse.de
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:02 +01:00
Peter Ujfalusi
2abb7f408f drm: Schedule the output_poll_work with 1s delay if we have delayed event
commit 68f458eec7069d618a6c884ca007426e0cea411b upstream.

Instead of scheduling the work to handle the initial delayed event, use 1s
delay.

This delay should not be needed, but Optimus/nouveau will fail in a
mysterious way if the delayed event is handled as soon as possible like it
is done in drm_helper_probe_single_connector_modes() in case the poll
was enabled before.

Reverting 339fd36238dd would give back the 10 sec (!) delay to handle the
delayed event. Adding 1sec delay to the poll_work is enough to work around
the issue in Optimus setups and gives shorter response on handling the
initial delayed event.

Fixes: 339fd36238dd ("drm: drm_probe_helper: Fix output_poll_work scheduling")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
[danvet: Add FIXME to the comment to make it stick out more.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109143158.21917-1-peter.ujfalusi@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:02 +01:00
Dave Martin
e4be4d4942 tile/ptrace: Preserve previous registers for short regset write
commit fd7c99142d77dc4a851879a66715abf12a3193fb upstream.

Ensure that if userspace supplies insufficient data to
PTRACE_SETREGSET to fill all the registers, the thread's old
registers are preserved.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:02 +01:00
Kees Cook
544160b6ea fbdev: color map copying bounds checking
commit 2dc705a9930b4806250fbf5a76e55266e59389f2 upstream.

Copying color maps to userspace doesn't check the value of to->start,
which will cause kernel heap buffer OOB read due to signedness wraps.

CVE-2016-8405

Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Peter Pi (@heisecode) of Trend Micro
Cc: Min Chong <mchong@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:33:02 +01:00
Greg Kroah-Hartman
09f886dc5a Linux 4.9.6 2017-01-26 08:25:24 +01:00
Ilya Dryomov
f77ef5348d libceph: stop allocating a new cipher on every crypto request
commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream.

This is useless and more importantly not allowed on the writeback path,
because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
can recurse back into the filesystem:

    kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
     ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
     ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
     00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
     [<ffffffff951eb4e1>] ? schedule+0x31/0x80
     [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
     [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
     [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
     [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
     [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
     [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
     [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
     [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
     [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
     [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
     [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
     [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
     [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
     [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
     [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
     [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
     [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
     [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
     [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
     [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
     [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
     [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
     [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
     [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
     [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
     [<ffffffff950d40a0>] ? release_sock+0x40/0x90
     [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
     [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
     [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
     [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
     [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
     [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
     [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
     [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
     [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
     [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
     [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
     [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
     [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
     [<ffffffff94c95170>] ? process_one_work+0x410/0x410
     [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
     [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
     [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190

Allocating the cipher along with the key fixes the issue - as long the
key doesn't change, a single cipher context can be used concurrently in
multiple requests.

We still can't take that GFP_KERNEL allocation though.  Both
ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

Reported-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:46 +01:00
Ilya Dryomov
5b482bf588 libceph: uninline ceph_crypto_key_destroy()
commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:46 +01:00
Halil Pasic
12274f2c17 tools/virtio/ringtest: fix run-on-all.sh for offline cpus
commit 21f5eda9b8671744539c8295b9df62991fffb2ce upstream.

Since ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work
without /dev/cpu") run-on-all.sh uses seq 0 $HOST_AFFINITY as the list
of ids of the CPUs to run the command on (assuming ids of online CPUs
are consecutive and start from 0), where $HOST_AFFINITY is the highest
CPU id in the system previously determined using lscpu.  This can fail
on systems with offline CPUs.

Instead let's use lscpu to determine the list of online CPUs.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Fixes: ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work without
/dev/cpu")
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:46 +01:00
Madhavan Srinivasan
fa555d021d selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
commit df21d2fa733035e4d414379960f94b2516b41296 upstream.

Test uses PMC2 to count the event. But PMC1 is being initialized.
Patch to fix it.

Fixes: 3752e453f6ba ('selftests/powerpc: Add tests of PMU EBBs')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:45 +01:00
Wei Yongjun
f37b7a3004 soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()
commit 36b29eb30ee0f6c99f06bea406c23a3fd4cbb80b upstream.

Fix to return a negative error code from the kthread_run() error
handling case instead of 0, as done elsewhere in this function.

Fixes: cdd5de500b2c ("soc: ti: Add wkup_m3_ipc driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:45 +01:00
Andy Shevchenko
97d5e20575 spi: pxa2xx: add missed break
commit a2dd8af00ca7fff4972425a4a6b19dd1840dc807 upstream.

The commit 7c7289a40425 ("spi: pxa2xx: Default thresholds to PXA
configuration") while splitting up CE4100 code obviously missed a break
condition in one chunk. Add it here.

Looks like we have no active user of CE4100, though better to fix this later
than never.

Fixes: commit 7c7289a40425 ("spi: pxa2xx: Default thresholds to PXA configuration")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:24:45 +01:00