636774 Commits

Author SHA1 Message Date
Feras Daoud
1626076b8e IB/ipoib: Fix deadlock between rmmod and set_mode
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream.

When calling set_mode from sysfs, the call flow takes the sysfs lock
first and then tries to take rtnl_lock (when calling ipoib_set_mode).
The rmmod call flow, on the other hand, takes rtnl_lock first (when
calling unregister_netdev) and then tries to take the sysfs lock:
a classic a->b, b->a deadlock.

The problem starts when ipoib_set_mode releases its rtnl_lock and then
tries to take it again.

    set_mode:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]
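
A minimal illustration of the inverted lock ordering (not the driver code
itself; the sysfs/kernfs side is represented here by an ordinary mutex):

	#include <linux/mutex.h>
	#include <linux/rtnetlink.h>

	static DEFINE_MUTEX(sysfs_side_lock);	/* stands in for the kernfs/sysfs lock */

	/* set_mode path: sysfs lock already held, then rtnl_lock is taken */
	static void set_mode_path(void)
	{
		mutex_lock(&sysfs_side_lock);	/* a */
		rtnl_lock();			/* b -- blocks if rmmod holds it */
		rtnl_unlock();
		mutex_unlock(&sysfs_side_lock);
	}

	/* rmmod path: rtnl_lock held, then the sysfs lock is needed to
	 * remove the attribute files */
	static void rmmod_path(void)
	{
		rtnl_lock();			/* b */
		mutex_lock(&sysfs_side_lock);	/* a -- blocks if set_mode holds it */
		mutex_unlock(&sysfs_side_lock);
		rtnl_unlock();
	}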

Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Eric W. Biederman
808e83e5ad mnt: Tuck mounts under others instead of creating shadow/side mounts.
commit 1064f874abc0d05eeed8993815f584d847b72486 upstream.

Ever since mount propagation was introduced, in cases where a mount is
propagated to a parent mount and mountpoint pair that is already in use,
the code has placed the new mount behind the old mount in the mount hash
table.

This implementation detail is problematic as it allows creating
arbitrary length mount hash chains.

Furthermore it invalidates the constraint maintained elsewhere in the
mount code that a parent mount and mountpoint pair will have exactly
one mount upon them, making it hard to deal with and to talk about
this special case in the mount code.

Modify mount propagation to notice when there is already a mount at
the parent mount and mountpoint where a new mount is propagating to
and place that preexisting mount on top of the new mount.

Modify unmount propagation to notice when a mount that is being
unmounted has another mount on top of it (and no other children), and
to replace the unmounted mount with the mount on top of it.

Move the MNT_UMOUNT test from __lookup_mnt_last into
__propagate_umount as that is the only call of __lookup_mnt_last where
MNT_UMOUNT may be set on any mount visible in the mount hash table.

These modifications allow:
 - __lookup_mnt_last to be removed.
 - attach_shadows to be renamed __attach_mnt and its shadow
   handling to be removed.
 - commit_tree to be simplified
 - copy_tree to be simplified

The result is an easier to understand tree of mounts that does not
allow creation of arbitrary length hash chains in the mount hash table.

The result is also a very slight userspace visible difference in semantics.
The following two cases now behave identically, where before order
mattered:

case 1: (explicit user action)
	B is a slave of A
	mount something on A/a , it will propagate to B/a
	and then mount something on B/a

case 2: (tucked mount)
	B is a slave of A
	mount something on B/a
	and then mount something on A/a

Historically umount A/a would fail in case 1 and succeed in case 2.
Now umount A/a succeeds in both configurations.

This very small change in semantics appears, if anything, to be a bug
fix to me, and my survey of userspace leads me to believe that no programs
will notice or care about this subtle semantic change.

v2: Updated mnt_change_mountpoint to not call dput or mntput
and instead to decrement the counts directly.  It is guaranteed
that there will be other references when mnt_change_mountpoint is
called so this is safe.

v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
    As the locking in fs/namespace.c changed between v2 and v3.

v4: Reworked the logic in propagate_mount_busy and __propagate_umount
    that detects when a mount completely covers another mount.

v5: Removed unnecessary tests whose result is always true in
    find_topper and attach_recursive_mnt.

v6: Document the user space visible semantic difference.

Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind")
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Gavin Li
c9b3f3173f brcmfmac: fix incorrect event channel deduction
commit 8e290cecdd0178f3d4cf7d463c51dc7e462843b4 upstream.

brcmf_sdio_fromevntchan() was being called on the data frame
rather than the software header, causing some frames to be
mischaracterized as being on the event channel rather than the data channel.

This fixes a major performance regression (due to dropped packets). With
this patch the download speed jumped from 1Mbit/s back up to 40MBit/s due
to the sheer amount of packets being incorrectly processed.

Fixes: c56caa9db8ab ("brcmfmac: screening firmware event packet")
Signed-off-by: Gavin Li <git@thegavinli.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
[kvalo@codeaurora.org: improve commit logs based on email discussion]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:43 +08:00
Andrew Donnellan
53d43706f2 cxl: fix nested locking hang during EEH hotplug
commit 171ed0fcd8966d82c45376f1434678e7b9d4d9b1 upstream.

Commit 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU
not configured") introduced a rwsem to fix an invalid memory access that
occurred when someone attempts to access the config space of an AFU on a
vPHB whilst the AFU is deconfigured, such as during EEH recovery.

It turns out that it's possible to run into a nested locking issue when EEH
recovery fails and a full device hotplug is required.
cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
configured_rwsem. When EEH recovery fails, the EEH code calls
pci_hp_remove_devices() to remove the device, which in turn calls
cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
to grab the writer lock that's already held.

Standard rwsem semantics don't express what we really want to do here and
don't allow for nested locking. Fix this by replacing the rwsem with an
atomic_t which we can control more finely. Allow the AFU to be locked
multiple times so long as there are no readers.
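
A sketch of the kind of atomic_t scheme described — multiple exclusive
holders allowed as long as there are no readers — built from helpers that
exist in the kernel; the counter name and call sites are assumptions, not
the actual cxl code:

	#include <linux/atomic.h>

	/* > 0: readers, 0: idle, < 0: one or more exclusive (deconfigure) holders */
	static atomic_t configured = ATOMIC_INIT(0);

	/* config-space accessors: only proceed while no exclusive holder */
	static bool config_read_trylock(void)
	{
		return atomic_inc_unless_negative(&configured);
	}

	static void config_read_unlock(void)
	{
		atomic_dec(&configured);
	}

	/* deconfigure paths: may nest, but must fail while readers exist */
	static bool deconfigure_trylock(void)
	{
		return atomic_dec_unless_positive(&configured);
	}

	static void deconfigure_unlock(void)
	{
		atomic_inc(&configured);
	}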

Fixes: 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU not configured")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Andrew Donnellan
411d0b0ced cxl: Prevent read/write to AFU config space while AFU not configured
commit 14a3ae34bfd0bcb1cc12d55b06a8584c11fac6fc upstream.

During EEH recovery, we deconfigure all AFUs whilst leaving the
corresponding vPHB and virtual PCI device in place.

If something attempts to interact with the AFU's PCI config space (e.g.
running lspci) after the AFU has been deconfigured and before it's
reconfigured, cxl_pcie_{read,write}_config() will read invalid values from
the deconfigured struct cxl_afu and proceed to Oops when they try to
dereference pointers that have been set to NULL during deconfiguration.

Add a rwsem to struct cxl_afu so we can prevent interaction with config
space while the AFU is deconfigured.

Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Thomas Petazzoni
60037aa689 net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
commit 239a3b663647869330955ec59caac0100ef9b60a upstream.

When TX descriptors are filled in, the buffer DMA address is split
between the tx_desc->buf_phys_addr field (high-order bits) and
tx_desc->packet_offset field (5 low-order bits).

However, when we re-calculate the DMA address from the TX descriptor in
mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into
account. This means that when the DMA address is not aligned on a 32
bytes boundary, we end up calling dma_unmap_single() with a DMA address
that was not the one returned by dma_map_single().

This inconsistency is detected by the kernel when DMA_API_DEBUG is
enabled. We fix this problem by properly calculating the DMA address in
mvpp2_txq_inc_put().
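
A sketch of the corrected reconstruction before unmapping (buf_phys_addr and
packet_offset are the descriptor fields named above; the other names are
illustrative):

	/* the 5 low-order bits of the DMA address live in packet_offset, so
	 * they must be added back before handing the address to the unmap */
	dma_addr_t buf_dma = tx_desc->buf_phys_addr + tx_desc->packet_offset;

	dma_unmap_single(dev, buf_dma, tx_desc->data_size, DMA_TO_DEVICE);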

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Heiko Carstens
e067f68db2 s390: use correct input data address for setup_randomness
commit 4920e3cf77347d7d7373552d4839e8d832321313 upstream.

The current implementation of setup_randomness uses the stack address
and therefore the pointer to the SYSIB 3.2.2 block as input data
address. Furthermore the length of the input data is the number of
virtual-machine description blocks which is typically one.

This means that typically a single zero byte is fed to
add_device_randomness.

Fix both of these and use the address of the first virtual machine
description block as input data address and also use the correct
length.

Fixes: bcfcbb6bae64 ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Heiko Carstens
321081d522 s390: make setup_randomness work
commit da8fd820f389a0e29080b14c61bf5cf1d8ef5ca1 upstream.

Commit bcfcbb6bae64 ("s390: add system information as device
randomness") intended to add some virtual machine specific information
to the randomness pool.

Unfortunately it uses the page allocator before the allocator is ready for
use. As a result the page allocator always returns NULL and the
setup_randomness function never adds anything to the randomness pool.

To fix this use memblock_alloc and memblock_free instead.
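
A sketch of what this fix and the input-data-address fix above amount to
together; the struct name and fields are assumptions based on the
description:

	static void __init setup_randomness(void)
	{
		struct sysinfo_3_2_2 *vmms;

		/* memblock is usable this early, the page allocator is not */
		vmms = (struct sysinfo_3_2_2 *)memblock_alloc(PAGE_SIZE, PAGE_SIZE);
		if (stsi(vmms, 3, 2, 2) == 0 && vmms->count)
			/* feed the VM description blocks themselves, not the
			 * pointer to the SYSIB, and use their real length */
			add_device_randomness(&vmms->vm,
					      sizeof(vmms->vm[0]) * vmms->count);
		memblock_free((unsigned long)vmms, PAGE_SIZE);
	}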

Fixes: bcfcbb6bae64 ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:42 +08:00
Martin Schwidefsky
9d38fd6a4f s390: TASK_SIZE for kernel threads
commit fb94a687d96c570d46332a4a890f1dcb7310e643 upstream.

Return a sensible value for TASK_SIZE when called from a kernel thread.

This gets us around an issue with copy_mount_options that does a magic
size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
thread with data pointing to kernel space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Peter Oberparleiter
dc31841fcd s390/chsc: Add exception handler for CHSC instruction
commit 77759137248f34864a8f7a58bbcebfcf1047504a upstream.

Prevent kernel crashes due to unhandled exceptions raised by the CHSC
instruction which may for example be triggered by invalid ioctl data.

Fixes: 64150adf89df ("s390/cio: Introduce generic synchronous CHSC IOCTL")
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Michael Holzheu
91cfcaa6ed s390/kdump: Use "LINUX" ELF note name instead of "CORE"
commit a4a81d8eebdc1d209d034f62a082a5131e4242b5 upstream.

In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF
notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified
as note name. Otherwise the notes are ignored.

For /proc/vmcore we currently use "CORE" for these notes.

Up to now this has not been a real problem because the dump analysis tool
"crash" does not check the note name. But it will break all programs that
use libbfd for processing ELF notes.

So fix this and use "LINUX" for all s390 specific notes to comply with
libbfd.

Reported-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Gerald Schaefer
b848102542 s390/dcssblk: fix device size calculation in dcssblk_direct_access()
commit a63f53e34db8b49675448d03ae324f6c5bc04fe6 upstream.

Since commit dd22f551 "block: Change direct_access calling convention",
the device size calculation in dcssblk_direct_access() is off-by-one.
This results in bdev_direct_access() always returning -ENXIO because the
returned value is not page aligned.

Fix this by adding 1 to the dev_sz calculation.

Fixes: dd22f551 ("block: Change direct_access calling convention")
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Julian Wiedmann
5cec5e32ba s390/qdio: clear DSCI prior to scanning multiple input queues
commit 1e4a382fdc0ba8d1a85b758c0811de3a3631085e upstream.

For devices with multiple input queues, tiqdio_call_inq_handlers()
iterates over all input queues and clears the device's DSCI
during each iteration. If the DSCI is re-armed during one
of the later iterations, we therefore do not scan the previous
queues again.
The re-arming also raises a new adapter interrupt. But its
handler does not trigger a rescan for the device, as the DSCI
has already been erroneously cleared.
This can result in queue stalls on devices with multiple
input queues.

Fix it by clearing the DSCI just once, prior to scanning the queues.

As the code is moved in front of the loop, we also need to access
the DSCI directly (ie irq->dsci) instead of going via each queue's
parent pointer to the same irq. This is not a functional change,
and a follow-up patch will clean up the other users.
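
Roughly, the fix hoists the clear out of the per-queue loop; a sketch (the
iteration helper and dsci access are approximations, not the exact qdio
code):

	/* clear the DSCI once, up front: if it gets re-armed while we are
	 * still scanning, the fresh adapter interrupt triggers another pass */
	xchg(irq_ptr->dsci, 0);

	for_each_input_queue(irq_ptr, q, i) {
		/* previously the DSCI was cleared here, once per queue, which
		 * could swallow a re-arm between iterations */
		/* ... scan and process this input queue ... */
	}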

In practice, this bug only affects CQ-enabled HiperSockets devices,
i.e. devices with the sysfs attribute "hsuid" set. Setting an hsuid is
needed for AF_IUCV socket applications that use HiperSockets
communication.

Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:41 +08:00
Dmitry Tunin
519b6cead2 Bluetooth: Add another AR3012 04ca:3018 device
commit 441ad62d6c3f131f1dbd7dcdd9cbe3f74dbd8501 upstream.

T:  Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#=  5 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=3018 Rev=00.01
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
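
In the driver this usually amounts to a one-line device-table entry; a
sketch of what such an entry looks like in btusb.c (placement and flag
follow the existing AR3012 entries and are an assumption about the exact
patch):

	/* Atheros AR3012 with sflash firmware */
	{ USB_DEVICE(0x04ca, 0x3018), .driver_info = BTUSB_ATH3012 },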

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Chao Peng
7c3bab189c KVM: VMX: use correct vmcs_read/write for guest segment selector/base
commit 96794e4ed4d758272c486e1529e431efb7045265 upstream.

Guest segment selector is 16 bit field and guest segment base is natural
width field. Fix two incorrect invocations accordingly.

Without this patch, build fails when aggressive inlining is used with ICC.
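
For reference, the VMCS accessors are width-specific; a small illustration
(the field encodings are real VMCS constants, the surrounding code is not
the actual kvm change):

	/* segment selectors are 16-bit VMCS fields ... */
	u16 sel = vmcs_read16(GUEST_CS_SELECTOR);
	vmcs_write16(GUEST_CS_SELECTOR, sel);

	/* ... while segment bases are natural-width fields */
	unsigned long base = vmcs_readl(GUEST_CS_BASE);
	vmcs_writel(GUEST_CS_BASE, base);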

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Janosch Frank
035dcc8e87 KVM: s390: Disable dirty log retrieval for UCONTROL guests
commit e1e8a9624f7ba8ead4f056ff558ed070e86fa747 upstream.

User controlled KVM guests do not support the dirty log, as they have
no single gmap that we can check for changes.

As they have no single gmap, kvm->arch.gmap is NULL and all further
references to it for dirty checking will result in a NULL
dereference.

Let's return -EINVAL if a caller tries to sync dirty logs for a
UCONTROL guest.
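
A sketch of the check (kvm_is_ucontrol() is the existing s390 helper; the
exact placement in the dirty-log ioctl path is an assumption):

	/* UCONTROL guests have no single gmap; kvm->arch.gmap is NULL,
	 * so refuse dirty-log syncing up front */
	if (kvm_is_ucontrol(kvm))
		return -EINVAL;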

Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Ian Abbott
c4c590be49 serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
commit 1c9c858e2ff8ae8024a3d75d2ed080063af43754 upstream.

The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta
Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports,
respectively.  The first 4 ports are implemented by an OX16PCI954 chip,
and the second 4 ports are implemented by an OX16C954 chip on a local
bus, bridged by the second PCI function of the OX16PCI954.  The ports
are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a
non-standard oscillator frequency of 20 MHz (base_baud = 1250000).

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Alexander Popov
e5b9778761 tty: n_hdlc: get rid of racy n_hdlc.tbuf
commit 82f2341c94d270421f383641b7cd670e474db56b upstream.

Currently the N_HDLC line discipline uses a self-made singly linked list for
data buffers and has an n_hdlc.tbuf pointer for buffer retransmission after
an error.

Commit be10eb7589337e5defbe214dae038a53dd21add8
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After a tx error, concurrent flush_tx_queue() and n_hdlc_send_frames() can put
the same data buffer onto tx_free_buf_list twice. That causes a double free in
n_hdlc_release().

Let's use the standard kernel linked list and get rid of n_hdlc.tbuf:
in case of a tx error, put the current data buffer back after the head of tx_buf_list.
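
Conceptually, the error path now re-queues the buffer on the ordinary list
instead of stashing it in a separate tbuf pointer; a sketch using the
standard list API (the list-head and field names here are assumptions):

	/* tx error: put the buffer back at the head of tx_buf_list so it is
	 * the next one sent, instead of parking it in n_hdlc.tbuf */
	spin_lock_irqsave(&buf_list->spinlock, flags);
	list_add(&buf->list_item, &buf_list->list);
	buf_list->count++;
	spin_unlock_irqrestore(&buf_list->spinlock, flags);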

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 10:02:40 +08:00
Greg Kroah-Hartman
d379ab2707 Linux 4.9.14 2017-03-12 06:42:15 +01:00
Florian Westphal
371d0342a3 netfilter: conntrack: refine gc worker heuristics, redux
commit e5072053b09642b8ff417d47da05b84720aea3ee upstream.

This further refines the changes made to conntrack gc_worker in
commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").

The main idea of that change was to reduce the scan interval when evictions
take place.

However, on the reporters' setup, there are 1-2 million conntrack entries
in total and roughly 8k new (and closing) connections per second.

In this case we'll always evict at least one entry per gc cycle and scan
interval is always at 1 jiffy because of this test:

 } else if (expired_count) {
     gc_work->next_gc_run /= 2U;
     next_run = msecs_to_jiffies(1);

being true almost all the time.

Given we scan ~10k entries per run it's clearly wrong to reduce the interval
based on a nonzero eviction count; it only wastes cpu cycles since the vast
majority of conntracks are not timed out.

Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.

Because the evictor is supposed to kick in only when the system turns idle
after a busy period, pick a high ratio -- this makes it 50%.  We thus keep
the idea of increasing the scan rate when it's likely that the table contains
many expired entries.
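
A sketch of that ratio test (names follow the snippet quoted above; the
actual expression in gc_worker may differ slightly):

	#define GC_EVICT_RATIO	50	/* percent of scanned entries */

	ratio = scanned ? expired_count * 100 / scanned : 0;
	if (ratio > GC_EVICT_RATIO)
		next_run = msecs_to_jiffies(1);	/* table is busy expiring: rescan soon */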

In order to not let timed-out entries hang around for too long
(important when using event logging, in which case we want to timely
destroy events), we now scan the full table within at most
GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
timed-out entries sit in same slot.

I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).

While flood is ongoing, interval now stays at its max rate
(GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).

With feedback from Nicolas Dichtel.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Florian Westphal
5f7ff59d06 netfilter: conntrack: remove GC_MAX_EVICTS break
commit 524b698db06b9b6da7192e749f637904e2f62d7b upstream.

Instead of breaking the loop and instantly rescheduling, don't bother checking
this in the first place (the loop calls cond_resched for every bucket anyway).

Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Yan, Zheng
dc8470f3c8 ceph: update readpages osd request according to size of pages
commit d641df819db8b80198fd85d9de91137e8a823b07 upstream.

add_to_page_cache_lru() can fail, so the actual number of pages to read
can be smaller than the initial size of the osd request. We need to
update the osd request size in that case.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
James Smart
27ab5414b9 scsi: lpfc: Correct WQ creation for pagesize
commit 8ea73db486cda442f0671f4bc9c03a76be398a28 upstream.

Correct WQ creation for pagesize

The driver was calculating the adapter command pagesize indicator from
the system pagesize. However, the buffers the driver allocates are only
one size (SLI4_PAGE_SIZE), so no calculation was necessary.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ralf Baechle
aae02d1aaf MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
commit ae2f5e5ed04a17c1aa1f0a3714c725e12c21d2a9 upstream.

Fix the following build error with binutils 2.25.

  CC      arch/mips/mm/sc-ip22.o
{standard input}: Assembler messages:
{standard input}:132: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:159: Error: number (0x9000000080000000) larger than 32 bits
{standard input}:200: Error: number (0x9000000080000000) larger than 32 bits
scripts/Makefile.build:293: recipe for target 'arch/mips/mm/sc-ip22.o' failed
make[1]: *** [arch/mips/mm/sc-ip22.o] Error 1

MIPS has used .set mips3 to temporarily switch the assembler to 64 bit
mode in 64 bit kernels virtually forever.  Binutils 2.25 broke this
behaviour partially by happily accepting 64 bit instructions in .set mips3
mode but puking on 64 bit constants when generating 32 bit ELF.  Binutils
2.26 restored the old behaviour again.

Fix build with binutils 2.25 by open coding the offending

	dli $1, 0x9000000080000000

as

	li	$1, 0x9000
	dsll	$1, $1, 48

which is ugly but the only thing that will build on all binutils vintages.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ralf Baechle
8a2307c7c0 MIPS: IP22: Reformat inline assembler code to modern standards.
commit f9f1c8db1c37253805eaa32265e1e1af3ae7d0a4 upstream.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Aneesh Kumar K.V
075be78c83 powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU
commit fda2d27db6eae5c2468f9e4657539b72bbc238bb upstream.

We will set LPCR with the correct value for radix during init. This makes sure
we start with a sanitized value of LPCR. In the case of kexec, cpus can have an
LPCR value based on the previous translation mode we were running in.

Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Aneesh Kumar K.V
3552f91715 powerpc/mm: Add MMU_FTR_KERNEL_RO to possible feature mask
commit a5ecdad4847897007399d7a14c9109b65ce4c9b7 upstream.

Without this we will always find the feature disabled.

Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Ravi Bangoria
fccb22e7d5 powerpc/xmon: Fix data-breakpoint
commit c21a493a2b44650707d06741601894329486f2ad upstream.

Currently xmon data-breakpoint feature is broken.

Whenever a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via the notifier chain mechanism. If the watchpoint was
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying xmon.

Solve this by returning NOTIFY_DONE, rather than NOTIFY_STOP, when
hw_breakpoint_handler does not find any perf_event associated with the
matched watchpoint; this tells the core code to continue calling the other
breakpoint handlers, including the xmon one.
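
A sketch of the change in hw_breakpoint_handler() (the per-cpu lookup shown
is an assumption about the surrounding structure):

	bp = __this_cpu_read(bp_per_reg);
	if (!bp) {
		/* no perf_event owns this watchpoint (e.g. xmon registered it):
		 * return NOTIFY_DONE instead of NOTIFY_STOP so the remaining
		 * handlers, including xmon's, still get to see the hit */
		rc = NOTIFY_DONE;
		goto out;
	}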

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:53 +01:00
Chuck Lever
86840a6305 xprtrdma: Reduce required number of send SGEs
commit 16f906d66cd76fb9895cbc628f447532a7ac1faa upstream.

The MAX_SEND_SGES check introduced in commit 655fec6987be
("xprtrdma: Use gathered Send for large inline messages") fails
for devices that have a small max_sge.

Instead of checking for a large fixed maximum number of SGEs,
check for a minimum small number. RPC-over-RDMA will switch to
using a Read chunk if an xdr_buf has more pages than can fit in
the device's max_sge limit. This is considerably better than
failing altogether to mount the server.

This fix supports devices that have as few as three send SGEs
available.
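
A sketch of the minimum-SGE check this describes ("three" per the text
above; the device-attribute access path is an assumption):

	#define RPCRDMA_MIN_SEND_SGES	3

	/* refuse only truly incapable devices; larger xdr_bufs will be sent
	 * as Read chunks instead of gathered inline sends */
	if (ia->ri_device->attrs.max_sge < RPCRDMA_MIN_SEND_SGES) {
		pr_warn("rpcrdma: HCA provides only %u send SGEs\n",
			ia->ri_device->attrs.max_sge);
		return -ENOMEM;
	}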

Reported-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reported-by: Honggang Li <honli@redhat.com>
Reported-by: Ram Amrani <Ram.Amrani@cavium.com>
Fixes: 655fec6987be ("xprtrdma: Use gathered Send for large ...")
Tested-by: Honggang Li <honli@redhat.com>
Tested-by: Ram Amrani <Ram.Amrani@cavium.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
73eea1c400 xprtrdma: Disable pad optimization by default
commit c95a3c6b88658bcb8f77f85f31a0b9d9036e8016 upstream.

Commit d5440e27d3e5 ("xprtrdma: Enable pad optimization") made the
Linux client omit XDR round-up padding in normal Read and Write
chunks so that the client doesn't have to register and invalidate
3-byte memory regions that contain no real data.

Unfortunately, my cheery 2014 assessment that this optimization "is
supported now by both Linux and Solaris servers" was premature.
We've found bugs in Solaris in this area since commit d5440e27d3e5
("xprtrdma: Enable pad optimization") was merged (SYMLINK is the
main offender).

So for maximum interoperability, I'm disabling this optimization
again. If a CM private message is exchanged when connecting, the
client recognizes that the server is Linux, and enables the
optimization for that connection.

Until now the Solaris server bugs did not impact common operations,
and were thus largely benign. Soon, less capable devices on Linux
NFS/RDMA clients will make use of Read chunks more often, and these
Solaris bugs will prevent interoperation in more cases.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
fab6c2caa4 xprtrdma: Per-connection pad optimization
commit b5f0afbea4f2ea52c613ac2b06cb6de2ea18cb6d upstream.

Pad optimization is changed by echoing into
/proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
affecting all RPC-over-RDMA connections to all servers.

The marshaling code picks up that value and uses it for decisions
about how to construct each RPC-over-RDMA frame. Having it change
suddenly in mid-operation can result in unexpected failures. And
some servers a client mounts might need chunk round-up, while
others don't.

So instead, copy the pad_optimize setting into each connection's
rpcrdma_ia when the transport is created, and use the copy, which
can't change during the life of the connection, instead.

This also removes a hack: rpcrdma_convert_iovs was using
the remote-invalidation-expected flag to predict when it could leave
out Write chunk padding. This is because the Linux server handles
implicit XDR padding on Write chunks correctly, and only Linux
servers can set the connection's remote-invalidation-expected flag.

It's more sensible to use the pad optimization setting instead.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Chuck Lever
ec3bc2c5ed xprtrdma: Fix Read chunk padding
commit 24abdf1be15c478e2821d6fc903a4a4440beff02 upstream.

When pad optimization is disabled, rpcrdma_convert_iovs still
does not add explicit XDR round-up padding to a Read chunk.

Commit 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
incorrectly short-circuited the test for whether round-up padding
is needed that appears later in rpcrdma_convert_iovs.

However, if this is indeed a regular Read chunk (and not a
Position-Zero Read chunk), the tail iovec _always_ contains the
chunk's padding, and never anything else.

So, it's easy to just skip the tail when padding optimization is
enabled, and add the tail in a subsequent Read chunk segment, if
disabled.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Magnus Lilja
788d81d4e5 dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
commit adee40b265d7568296e218f079f478197ffa15bf upstream.

Commit 3d8cc00073d6 ("dmaengine: ipu: Consolidate duplicated irq handlers")
consolidated the two interrupts routines into one, but the remaining
interrupt routine only checks the status of the error interrupts, not the
normal interrupts.

This patch fixes that problem (tested on i.MX31 PDK board).

Fixes: 3d8cc00073d6 ("dmaengine: ipu: Consolidate duplicated irq handlers")
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Magnus Lilja <lilja.magnus@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Mark Marshall
9d82393e65 mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0
commit 656441478ed55d960df5f3ccdf5a0f8c61dfd0b3 upstream.

The commit 7a654172161c ("mtd/ifc: Add support for IFC controller
version 2.0") added support for version 2.0 of the IFC controller.
The version 2.0 controller has the ECC status registers at a different
location to the previous versions.

Correct the fsl_ifc_nand structure so that the ECC status can be read
from the correct location for both version 1.0 and 2.0 of the controller.

Fixes: 7a654172161c ("mtd/ifc: Add support for IFC controller version 2.0")
Signed-off-by: Mark Marshall <mark.marshall@omicronenergy.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Rafał Miłecki
178d07a0b8 bcma: use (get|put)_device when probing/removing device driver
commit a971df0b9d04674e325346c17de9a895425ca5e1 upstream.

This allows tracking device state and e.g. makes devm work as expected.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
colyli@suse.de
6f9c02ab9d md linear: fix a race between linear_add() and linear_congested()
commit 03a9e24ef2aaa5f1f9837356aed79c860521407a upstream.

Recently I received a bug report that on a Linux v3.0 based kernel, hot-adding
a disk to an md linear device causes a kernel crash at linear_congested(). From
the crash image analysis, I find that in linear_congested(), mddev->raid_disks
contains the value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer dereference crashes the kernel.

There is a race between linear_add() and linear_congested(); the RCU protection
used in these two functions cannot avoid the race. Since Linux v4.0 the
RCU code has been replaced by introducing mddev_suspend().  After checking the
upstream code, it seems linear_congested() is not called in the
generic_make_request() code path, so mddev_suspend() cannot prevent it
from being called. The possible race still exists.

Here I explain how the race still exists in the current code.  On a machine
with many CPUs, linear_add() is called on one CPU to add a hard disk to an
md linear device; at the same time, on another CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.

Now I use a possible code execution time sequence to demo how the possible
race happens,

seq    linear_add()                linear_congested()
 0                                 conf=mddev->private
 1   oldconf=mddev->private
 2   mddev->raid_disks++
 3                              for (i=0; i<mddev->raid_disks;i++)
 4                                bdev_get_queue(conf->disks[i].rdev->bdev)
 5   mddev->private=newconf

In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf, which would need
one more element (a pointer to struct dev_info) in conf->disks[], is not
updated yet; accessing that element's structure members in time seq 4 causes
a NULL pointer dereference fault.

To fix this race, there are two parts to the modification in this patch:
 1) Add 'int raid_disks' to struct linear_conf, as a copy of
    mddev->raid_disks. It is initialized in linear_conf() and is always
    consistent with the number of pointers in 'struct dev_info disks[]'. When
    iterating conf->disks[] in linear_congested(), use conf->raid_disks instead
    of mddev->raid_disks in the for-loop, so the NULL pointer dereference
    will not happen again.
 2) Bring the RCU protection back, and use kfree_rcu() in linear_add() to
    free the oldconf memory. Because oldconf may be referenced as mddev->private
    in linear_congested(), kfree_rcu() makes sure that its memory will not
    be released until no one uses it any more.
Some code comments are also added in this patch, to make this modification
easier to understand.
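
A condensed sketch of the two parts above (struct layout and call sites are
illustrative, not the exact md code):

	struct linear_conf {
		struct rcu_head		rcu;
		sector_t		array_sectors;
		int			raid_disks;	/* always matches disks[] */
		struct dev_info		disks[0];
	};

	/* linear_congested(): iterate with the conf's own count, under RCU */
	rcu_read_lock();
	conf = rcu_dereference(mddev->private);
	for (i = 0; i < conf->raid_disks && !ret; i++) {
		struct request_queue *q =
			bdev_get_queue(conf->disks[i].rdev->bdev);

		ret |= bdi_congested(&q->backing_dev_info, bits);
	}
	rcu_read_unlock();

	/* linear_add(): publish newconf first, then free oldconf only after a
	 * grace period, since linear_congested() may still be using it */
	mddev->private = newconf;
	kfree_rcu(oldconf, rcu);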

This patch can be applied to kernels since v4.0, after commit
3be260cc18f8 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on a Linux v3.0 based kernel, so
people who maintain kernels before Linux v4.0 will need to do some
backporting of this patch.

Changelog:
 - v3: add 'int raid_disks' to struct linear_conf, and use kfree_rcu() to
       replace call_rcu() in linear_add().
 - v2: add RCU stuffs by suggestion from Shaohua and Neil.
 - v1: initial effort.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Maxime Ripard
de2aa5b3ee rtc: sun6i: Switch to the external oscillator
commit fb61bb82cb46a932ef2fc62e1c731c8e7e6640d5 upstream.

The RTC is clocked from either an internal, imprecise, oscillator or an
external one, which is usually much more accurate.

The difference between the time elapsed and the time reported by the RTC is
on the order of 10%, which prevents the RTC from being useful at all.

Fortunately, the external oscillator is reported to be mandatory in the
Allwinner datasheet, so we can just switch to it.

Fixes: 9765d2d94309 ("rtc: sun6i: Add sun6i RTC driver")
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Maxime Ripard
6aae7ffa33 rtc: sun6i: Add some locking
commit a9422a19ce270a22fc520f2278fb7e80c58be508 upstream.

Some registers have a read-modify-write access pattern that are not atomic.

Add some locking to prevent from concurrent accesses.

Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Maxime Ripard
616f5ef613 rtc: sun6i: Disable the build as a module
commit 3753941475ae6501dcd1e41832bd0e6c35247d6a upstream.

Since we have to provide the clock very early on, the RTC driver cannot be
built as a module. Make sure that won't happen.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:52 +01:00
Jaegeuk Kim
8c53efc399 f2fs: avoid to issue redundant discard commands
commit 8b107f5b97772c7c0c218302e9a4d15b4edf50b4 upstream.

If segs_per_sec is over 1, as under SMR, f2fs previously issued discard
commands redundantly on the same section, since we didn't move the end
position for the previous discard command.

E.g.,

                       start  end
                         |    |
      prefree_bitmap = [01111100111100]

And, after issue discard for this section,
                             end      start
                              |        |
      prefree_bitmap = [01111100111100]

Select this section again by searching from (end + 1),
                             start  end
                                |   |
      prefree_bitmap = [01111100111100]

Fixes: 36abef4e796d38 ("f2fs: introduce mode=lfs mount option")
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Hou Pengyang
4992ba2840 f2fs: add ovp valid_blocks check for bg gc victim to fg_gc
commit e93b9865251a0503d83fd570e7d5a7c8bc351715 upstream.

For foreground gc, the greedy algorithm should be adopted, which makes
this formula work well:

	(2 * (100 / config.overprovision + 1) + 6)

But currently, fg_gc gives priority to selecting bg_gc victim segments to gc
first. These victims are selected by the cost-benefit algorithm, so we can't
guarantee such segments have few valid blocks, which may break the f2fs rule
and, in the worst case, consume all the free segments.

This patch fixes this by adding a filter in check_bg_victims: if a segment's
number of valid blocks is over the overprovision ratio, skip it.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Jaegeuk Kim
d00d1b71d9 f2fs: fix multiple f2fs_add_link() calls having same name
commit 88c5c13a5027b36d914536fdba23f069d7067204 upstream.

It turns out a stackable filesystem like sdcardfs in AOSP can trigger multiple
vfs_create() calls to the lower filesystem. In that case, f2fs will add
multiple dentries having the same name, which breaks filesystem consistency.

Until the upper layer is fixed, let's work around this in f2fs; it actually
shows not much performance regression.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Yunlei He
ec160ad2ac f2fs: fix a problem of using memory after free
commit 7855eba4d6102f811b6dd142d6c749f53b591fa3 upstream.

This patch fixes a use-after-free problem in the function
__try_merge_extent_node.

Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree")
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Weston Andros Adamson
d78f93384d NFSv4: fix getacl ERANGE for some ACL buffer sizes
commit ed92d8c137b7794c2c2aa14479298b9885967607 upstream.

We're not taking into account the space needed for the (variable
length) attr bitmap, with the result that we'd sometimes get a spurious
ERANGE when the ACL data got close to the end of a page.

Just add in an extra page to make sure.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
J. Bruce Fields
3f22cc6f5c NFSv4: fix getacl head length estimation
commit 6682c14bbe505a8b912c57faf544f866777ee48d upstream.

Bitmap and attrlen follow immediately after the op reply header.  This
was an oversight from commit bf118a342f.

Consequences of this are just minor efficiency (extra calls to
xdr_shrink_bufhead).

Fixes: bf118a342f10 "NFSv4: include bitmap in nfsv4 get acl data"
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Trond Myklebust
c65db336d6 pNFS/flexfiles: If the layout is invalid, it must be updated before retrying
commit df3ab232e462bce20710596d697ade6b72497694 upstream.

If we see that our pNFS READ/WRITE/COMMIT operation failed, but we
also see that our layout segment is no longer valid, then we need to
get a new layout segment before retrying.

Fixes: 90816d1ddacf ("NFSv4.1/flexfiles: Don't mark the entire deviceid...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Trond Myklebust
77bbc0c771 NFSv4: Fix reboot recovery in copy offload
commit 9d8cacbf5636657d2cd0dda17438a56d806d3224 upstream.

Copy offload code needs to be hooked into the code for handling
NFS4ERR_BAD_STATEID by ensuring that we set the "stateid" field
in struct nfs4_exception.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Trond Myklebust
0465339eb5 NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
commit a974deee477af89411e0f80456bfb344ac433c98 upstream.

If we exit because the file access check failed, we currently
leak the struct nfs4_state. We need to attach it to the
open context before returning.

Fixes: 3efb9722475e ("NFSv4: Refactor _nfs4_open_and_get_state..")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:51 +01:00
Christoph Hellwig
a3c6cbc4ea nfsd: special case truncates some more
commit 783112f7401ff449d979530209b3f6c2594fdb4e upstream.

Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set various file attributes, including the
file size and the uid/gid.

The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates don't update random other attributes, and many other file
systems handle the case but do not update the other attributes in the
same transaction.  NFSD on the other hand passes the attributes it gets
on the wire more or less directly through to the VFS, leading to updates
the file systems don't expect.  XFS at least has an assert on the
allowed attributes, which caught an unusual NFS client setting the size
and group at the same time.

To handle this issue properly this splits the notify_change call in
nfsd_setattr into two separate ones.
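
A sketch of the split described (notify_change() is the real VFS call; the
surrounding nfsd_setattr() structure and the error label are simplified
assumptions):

	if (size_change) {
		/* the truncate goes in its own notify_change() call, carrying
		 * only the size plus the timestamps that belong with it */
		struct iattr size_attr = {
			.ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME,
			.ia_size  = iap->ia_size,
		};

		host_err = notify_change(dentry, &size_attr, NULL);
		if (host_err)
			goto out_unlock;

		iap->ia_valid &= ~ATTR_SIZE;
	}

	/* uid/gid, mode and friends go in a second, separate call */
	if (iap->ia_valid)
		host_err = notify_change(dentry, iap, NULL);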

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:50 +01:00
Christoph Hellwig
9bdd39c146 nfsd: minor nfsd_setattr cleanup
commit 758e99fefe1d9230111296956335cd35995c0eaf upstream.

Simplify exit paths, size_change use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:41:50 +01:00