linux

iv/linux

Author	SHA1	Message	Date
Theodore Ts'o	b891e716b8	ext4: fix interaction between i_size, fallocate, and delalloc after a crash commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream. If there are pending writes subject to delayed allocation, then i_size will show size after the writes have completed, while i_disksize contains the value of i_size on the disk (since the writes have not been persisted to disk). If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size after the fallocate(2) is between i_size and i_disksize, then after a crash, if a journal commit has resulted in the changes made by the fallocate() call to be persisted after a crash, but the delayed allocation write has not resolved itself, i_size would not be updated, and this would cause the following e2fsck complaint: Inode 12, end of extent exceeds allowed value (logical block 33, physical block 33441, len 7) This can only take place on a sparse file, where the fallocate(2) call is allocating blocks in a range which is before a pending delayed allocation write which is extending i_size. Since this situation is quite rare, and the window in which the crash must take place is typically < 30 seconds, in practice this condition will rarely happen. Nevertheless, it can be triggered in testing, and in particular by xfstests generic/456. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Rameshwar Prasad Sahu	496e65f649	ata: fixes kernel crash while tracing ata_eh_link_autopsy event commit f1601113ddc0339a745e702f4fb1ca37d4875e65 upstream. When tracing ata link error event, the kernel crashes when the disk is removed due to NULL pointer access by trace_ata_eh_link_autopsy API. This occurs as the dev is NULL when the disk disappeared. This patch fixes this crash by calling trace_ata_eh_link_autopsy only if "dev" is not NULL. v2 changes: Removed direct passing "link" pointer instead of "dev" in trace API. Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 255c03d15a29 ("libata: Add tracepoints") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Miklos Szeredi	2f5be98162	fsnotify: fix pinning group in fsnotify_prepare_user_wait() commit 9a31d7ad997f55768c687974ce36b759065b49e5 upstream. Blind increment of group's user_waits is not enough, we could be far enough in the group's destruction that it isn't taken into account (i.e. grabbing the mark ref afterwards doesn't guarantee that it was the ref coming from the _group_ that was grabbed). Instead we need to check (under lock) that the mark is still attached to the group after having obtained a ref to the mark. If not, skip it. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Miklos Szeredi	9e9569f05e	fsnotify: pin both inode and vfsmount mark commit 0d6ec079d6aaa098b978d6395973bb027c752a03 upstream. We may fail to pin one of the marks in fsnotify_prepare_user_wait() when dropping the srcu read lock, resulting in use after free at the next iteration. Solution is to store both marks in iter_info instead of just the one we'll be sending the event for. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Miklos Szeredi	47b02dcac6	fsnotify: clean up fsnotify_prepare/finish_user_wait() commit 24c20305c7fc8959836211cb8c50aab93ae0e54f upstream. This patch doesn't actually fix any bug, just paves the way for fixing mark and group pinning. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Shaohua Li	a6ff2fb417	md/bitmap: revert a patch commit 938b533d479e7428b7fa1b8179283646d2e2c53d upstream. This reverts commit 8031c3ddc70a. That patches doesn't work well if PAGE_SIZE > 4k. We will fix the original problem with a different approach. Fix: 8031c3ddc70a(md/bitmap: copy correct data for bitmap super) Reported-by: Joshua Kinard <kumba@gentoo.org> Suggested-by: Neil Brown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Loic Poulain	5912d9ca14	Bluetooth: btqcomsmd: Add support for BD address setup commit 6e518111060c2290427d79c43d4add9600ad852b upstream. This patch implements the hdev setup function since wcnss-bt does not have persistent memory to store an allocated BD address. The device is therefore marked as unconfigured if no BD address has been previously retrieved. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Artur Paszkiewicz	7cd7a7aa01	md: don't check MD_SB_CHANGE_CLEAN in md_allow_write commit b90f6ff080c52e2f05364210733df120e3c4e597 upstream. Only MD_SB_CHANGE_PENDING should be used to wait for transition from clean to dirty. Checking also MD_SB_CHANGE_CLEAN is unnecessary and can race with e.g. md_do_sync(). This sporadically causes a hang when changing consistency policy during resync: INFO: task mdadm:6183 blocked for more than 30 seconds. Not tainted 4.14.0-rc3+ #391 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mdadm D12752 6183 6022 0x00000000 Call Trace: __schedule+0x93f/0x990 schedule+0x6b/0x90 md_allow_write+0x100/0x130 [md_mod] ? do_wait_intr_irq+0x90/0x90 resize_stripes+0x3a/0x5b0 [raid456] ? kernfs_fop_write+0xbe/0x180 raid5_change_consistency_policy+0xa6/0x200 [raid456] consistency_policy_store+0x2e/0x70 [md_mod] md_attr_store+0x90/0xc0 [md_mod] sysfs_kf_write+0x42/0x50 kernfs_fop_write+0x119/0x180 __vfs_write+0x28/0x110 ? rcu_sync_lockdep_assert+0x12/0x60 ? __sb_start_write+0x15a/0x1c0 ? vfs_write+0xa3/0x1a0 vfs_write+0xb4/0x1a0 SyS_write+0x49/0xa0 entry_SYSCALL_64_fastpath+0x18/0xad Fixes: 2214c260c72b ("md: don't return -EAGAIN in md_allow_write for external metadata arrays") Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
NeilBrown	459aad50a5	md: fix deadlock error in recent patch. commit d47c8ad261f787af22a220ffcc2d07afba809223 upstream. A recent patch aimed to cause md_write_start() to fail (rather than block) when the mddev was suspending, so as to avoid deadlocks. Unfortunately the test in wait_event() was wrong, and it didn't change behaviour at all. We wait_event() must wait until the metadata is written OR the array is suspending. Fixes: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()") Reported-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:47 +00:00
Thomas Backlund	55e357bc31	iwlwifi: fix firmware names for 9000 and A000 series hw commit c2c48ddfc8b03b9ecb51d2832b586497b37531bc upstream. iwlwifi 9000 and a0000 series hw contains an extra dash in firmware file name as seeen in modinfo output for kernel 4.14: firmware: iwlwifi-9260-th-b0-jf-b0--34.ucode firmware: iwlwifi-9260-th-a0-jf-a0--34.ucode firmware: iwlwifi-9000-pu-a0-jf-b0--34.ucode firmware: iwlwifi-9000-pu-a0-jf-a0--34.ucode firmware: iwlwifi-QuQnj-a0-hr-a0--34.ucode firmware: iwlwifi-QuQnj-a0-jf-b0--34.ucode firmware: iwlwifi-QuQnj-f0-hr-a0--34.ucode firmware: iwlwifi-Qu-a0-jf-b0--34.ucode firmware: iwlwifi-Qu-a0-hr-a0--34.ucode Fix that by dropping the extra adding of '"-"'. Signed-off-by: Thomas Backlund <tmb@mageia.org> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Arnd Bergmann	404dcc55b0	rtlwifi: fix uninitialized rtlhal->last_suspend_sec time commit 3f2a162fab15aee243178b5308bb5d1206fc4043 upstream. We set rtlhal->last_suspend_sec to an uninitialized stack variable, but unfortunately gcc never warned about this, I only found it while working on another patch. I opened a gcc bug for this. Presumably the value of rtlhal->last_suspend_sec is not all that important, but it does get used, so we probably want the patch backported to stable kernels. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82839 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Larry Finger	9f724960c5	rtlwifi: rtl8192ee: Fix memory leak when loading firmware commit 519ce2f933fa14acf69d5c8cabcc18711943d629 upstream. In routine rtl92ee_set_fw_rsvdpagepkt(), the driver allocates an skb, but never calls rtl_cmd_send_packet(), which will free the buffer. All other rtlwifi drivers perform this operation correctly. This problem has been in the driver since it was included in the kernel. Fortunately, each firmware load only leaks 4 buffers, which likely explains why it has not previously been detected. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Andrew Elble	584f0bb568	nfsd: deal with revoked delegations appropriately commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream. If a delegation has been revoked by the server, operations using that delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1 case, and NFS4ERR_BAD_STATEID otherwise. The server needs NFSv4.1 clients to explicitly free revoked delegations. If the server returns NFS4ERR_DELEG_REVOKED, the client will do that; otherwise it may just forget about the delegation and be unable to recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a SEQUENCE reply. That can cause the Linux 4.1 client to loop in its stage manager. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
NeilBrown	5756707370	NFS: revalidate "." etc correctly on "open". commit b688741cb06695312f18b730653d6611e1bad28d upstream. For correct close-to-open semantics, NFS must validate the change attribute of a directory (or file) on open. Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op"), open() of "." or a path ending ".." is not revalidated reliably (except when that direct is a mount point). Prior to that commit, "." was revalidated using nfs_lookup_revalidate() which checks the LOOKUP_OPEN flag and forces revalidation if the flag is set. Since that commit, nfs_weak_revalidate() is used for NFSv3 (which ignores the flags) and nothing is used for NFSv4. This is fixed by using nfs_lookup_verify_inode() in nfs_weak_revalidate(). This does the revalidation exactly when needed. Also, add a definition of .d_weak_revalidate for NFSv4. The incorrect behavior is easily demonstrated by running "echo " in some non-mountpoint NFS directory while watching network traffic. Without this patch, "echo " sometimes doesn't produce any traffic. With the patch it always does. Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Anna Schumaker	2deb89453f	NFS: Avoid RCU usage in tracepoints commit 3944369db701f075092357b511fd9f5755771585 upstream. There isn't an obvious way to acquire and release the RCU lock during a tracepoint, so we can't use the rpc_peeraddr2str() function here. Instead, rely on the client's cl_hostname, which should have similar enough information without needing an rcu_dereference(). Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Chuck Lever	aed1a43399	nfs: Fix ugly referral attributes commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream. Before traversing a referral and performing a mount, the mounted-on directory looks strange: dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31 1969 dir.0 nfs4_get_referral is wiping out any cached attributes with what was returned via GETATTR(fs_locations), but the bit mask for that operation does not request any file attributes. Retrieve owner and timestamp information so that the memcpy in nfs4_get_referral fills in more attributes. Changes since v1: - Don't request attributes that the client unconditionally replaces - Request only MOUNTED_ON_FILEID or FILEID attribute, not both - encode_fs_locations() doesn't use the third bitmask word Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral") Suggested-by: Pradeep Thomas <pradeepthomas@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Benjamin Coddington	57f3c05d03	NFS: Revert "NFS: Move the flock open mode check into nfs_flock()" commit fcfa447062b2061e11f68b846d61cbfe60d0d604 upstream. Commit e12937279c8b "NFS: Move the flock open mode check into nfs_flock()" changed NFSv3 behavior for flock() such that the open mode must match the lock type, however that requirement shouldn't be enforced for flock(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:46 +00:00
Joshua Watt	afaacc000e	NFS: Fix typo in nomigration mount option commit f02fee227e5f21981152850744a6084ff3fa94ee upstream. The option was incorrectly masking off all other options. Signed-off-by: Joshua Watt <JPEWhacker@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Jaegeuk Kim	d628ac8abd	f2fs: expose some sectors to user in inline data or dentry case commit 5b4267d195dd887c4412e34b5a7365baa741b679 upstream. If there's some data written through inline data or dentry, we need to shouw st_blocks. This fixes reporting zero blocks even though there is small written data. Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid link file for quotacheck] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Josef Bacik	f111762831	btrfs: change how we decide to commit transactions during flushing commit 996478ca9c460886ac147eb0d00e99841b71d31b upstream. Nikolay reported that generic/273 was failing currently with ENOSPC. Turns out this is because we get to the point where the outstanding reservations are greater than the pinned space on the fs. This is a mistake, previously we used the current reservation amount in may_commit_transaction, not the entire outstanding reservation amount. Fix this to find the minimum byte size needed to make progress in flushing, and pass that into may_commit_transaction. From there we can make a smarter decision on whether to commit the transaction or not. This fixes the failure in generic/273. From Nikolai, IOW: when we go to the final stage of deciding whether to do trans commit, instead of passing all the reservations from all tickets we just pass the reservation for the current ticket. Otherwise, in case all reservations exceed pinned space, then we don't commit transaction and fail prematurely. Before we passed num_bytes from flush_space, where num_bytes was the sum of all pending reserverations, but now all we do is take the first ticket and commit the trans if we can satisfy that. Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure") Reported-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> [ added Nikolai's comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Arnd Bergmann	f2122d66ed	isofs: fix timestamps beyond 2027 commit 34be4dbf87fc3e474a842305394534216d428f5d upstream. isofs uses a 'char' variable to load the number of years since 1900 for an inode timestamp. On architectures that use a signed char type by default, this results in an invalid date for anything beyond 2027. This changes the function argument to a 'u8' array, which is defined the same way on all architectures, and unambiguously lets us use years until 2155. This should be backported to all kernels that might still be in use by that date. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Miklos Szeredi	1dd7dd07e8	fanotify: fix fsnotify_prepare_user_wait() failure commit f37650f1c7c71cf5180b43229d13b421d81e7170 upstream. If fsnotify_prepare_user_wait() fails, we leave the event on the notification list. Which will result in a warning in fsnotify_destroy_event() and later use-after-free. Instead of adding a new helper to remove the event from the list in this case, I opted to move the prepare/finish up into fanotify_handle_event(). This will allow these to be moved further out into the generic code later, and perhaps let us move to non-sleeping RCU. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 05f0e38724e8 ("fanotify: Release SRCU lock when waiting for userspace response") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Greg Edwards	5c21c3dde4	fs: guard_bio_eod() needs to consider partitions commit 67f2519fe2903c4041c0e94394d14d372fe51399 upstream. guard_bio_eod() needs to look at the partition capacity, not just the capacity of the whole device, when determining if truncation is necessary. [ 60.268688] attempt to access beyond end of device [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506 [ 60.268693] buffer_io_error: 2 callbacks suppressed [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Coly Li	e9c80881b3	bcache: check ca->alloc_thread initialized before wake up it commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream. In bcache code, sysfs entries are created before all resources get allocated, e.g. allocation thread of a cache set. There is posibility for NULL pointer deference if a resource is accessed but which is not initialized yet. Indeed Jorg Bornschein catches one on cache set allocation thread and gets a kernel oops. The reason for this bug is, when bch_bucket_alloc() is called during cache set registration and attaching, ca->alloc_thread is not properly allocated and initialized yet, call wake_up_process() on ca->alloc_thread triggers NULL pointer deference failure. A simple and fast fix is, before waking up ca->alloc_thread, checking whether it is allocated, and only wake up ca->alloc_thread when it is not NULL. Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Jorg Bornschein <jb@capsec.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Eric Biggers	bcae2363e2	libceph: don't WARN() if user tries to add invalid key commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream. The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a user tries to add a key of type "ceph" with an invalid payload as follows (assuming CONFIG_CEPH_LIB=y): echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \ \| keyctl padd ceph desc @s This can be hit by fuzzers. As this is merely bad input and not a kernel bug, replace the WARN_ON() with return -EINVAL. Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Dan Carpenter	bc6e896836	eCryptfs: use after free in ecryptfs_release_messaging() commit db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 upstream. We're freeing the list iterator so we should be using the _safe() version of hlist_for_each_entry(). Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:45 +00:00
Eric Biggers	ddf1264ec5	fscrypt: lock mutex before checking for bounce page pool commit a0b3bc855374c50b5ea85273553485af48caf2f7 upstream. fscrypt_initialize(), which allocates the global bounce page pool when an encrypted file is first accessed, uses "double-checked locking" to try to avoid locking fscrypt_init_mutex. However, it doesn't use any memory barriers, so it's theoretically possible for a thread to observe a bounce page pool which has not been fully initialized. This is a classic bug with "double-checked locking". While "only a theoretical issue" in the latest kernel, in pre-4.8 kernels the pointer that was checked was not even the last to be initialized, so it was easily possible for a crash (NULL pointer dereference) to happen. This was changed only incidentally by the large refactor to use fs/crypto/. Solve both problems in a trivial way that can easily be backported: just always take the mutex. It's theoretically less efficient, but it shouldn't be noticeable in practice as the mutex is only acquired very briefly once per encrypted file. Later I'd like to make this use a helper macro like DO_ONCE(). However, DO_ONCE() runs in atomic context, so we'd need to add a new macro that allows blocking. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Andreas Rohner	f94782668b	nilfs2: fix race condition that causes file system corruption commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream. There is a race condition between nilfs_dirty_inode() and nilfs_set_file_dirty(). When a file is opened, nilfs_dirty_inode() is called to update the access timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a separate transaction. __nilfs_mark_inode_dirty() caches the ifile buffer_head in the i_bh field of the inode info structure and marks it as dirty. After some data was written to the file in another transaction, the function nilfs_set_file_dirty() is called, which adds the inode to the ns_dirty_files list. Then the segment construction calls nilfs_segctor_collect_dirty_files(), which goes through the ns_dirty_files list and checks the i_bh field. If there is a cached buffer_head in i_bh it is not marked as dirty again. Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate transactions, it is possible that a segment construction that writes out the ifile occurs in-between the two. If this happens the inode is not on the ns_dirty_files list, but its ifile block is still marked as dirty and written out. In the next segment construction, the data for the file is written out and nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root is written into the i_bh block, which is not dirty, because it was written out in another segment construction. As a result the bmap update can be lost, which leads to file system corruption. Either the virtual block address points to an unallocated DAT block, or the DAT entry will be reused for something different. The error can remain undetected for a long time. A typical error message would be one of the "bad btree" errors or a warning that a DAT entry could not be found. This bug can be reproduced reliably by a simple benchmark that creates and overwrites millions of 4k files. Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
NeilBrown	7b7f543793	autofs: don't fail mount for transient error commit ecc0c469f27765ed1e2b967be0aa17cee1a60b76 upstream. Currently if the autofs kernel module gets an error when writing to the pipe which links to the daemon, then it marks the whole moutpoint as catatonic, and it will stop working. It is possible that the error is transient. This can happen if the daemon is slow and more than 16 requests queue up. If a subsequent process tries to queue a request, and is then signalled, the write to the pipe will return -ERESTARTSYS and autofs will take that as total failure. So change the code to assess -ERESTARTSYS and -ENOMEM as transient failures which only abort the current request, not the whole mountpoint. It isn't a crash or a data corruption, but having autofs mountpoints suddenly stop working is rather inconvenient. Ian said: : And given the problems with a half dozen (or so) user space applications : consuming large amounts of CPU under heavy mount and umount activity this : could happen more easily than we expect. Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Vitaly Wool	c1a14af38a	mm/z3fold.c: use kref to prevent page free/compact race commit 5d03a6613957785e94af7a4a6212ad4af66aa5c2 upstream. There is a race in the current z3fold implementation between do_compact() called in a work queue context and the page release procedure when page's kref goes to 0. do_compact() may be waiting for page lock, which is released by release_z3fold_page_locked right before putting the page onto the "stale" list, and then the page may be freed as do_compact() modifies its contents. The mechanism currently implemented to handle that (checking the PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref counter to guarantee that the page is not released if its compaction is scheduled. It then becomes compaction function's responsibility to decrease the counter and quit immediately if the page was actually freed. Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com Signed-off-by: Vitaly Wool <vitaly.wool@sonymobile.com> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Stanislaw Gruszka	769bfea594	rt2x00usb: mark device removed when get ENOENT usb error commit bfa62a52cad93686bb8d8171ea5288813248a7c6 upstream. ENOENT usb error mean "specified interface or endpoint does not exist or is not enabled". Mark device not present when we encounter this error similar like we do with ENODEV error. Otherwise we can have infinite loop in rt2x00usb_work_rxdone(), because we remove and put again RX entries to the queue infinitely. We can have similar situation when submit urb will fail all the time with other error, so we need consider to limit number of entries processed by rxdone work. But for now, since the patch fixes reproducible soft lockup issue on single processor systems and taken ENOENT error meaning, let apply this fix. Patch adds additional ENOENT check not only in rx kick routine, but also on other places where we check for ENODEV error. Reported-by: Richard Genoud <richard.genoud@gmail.com> Debugged-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Tested-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Aleksandar Markovic	085d66519c	MIPS: math-emu: Fix final emulation phase for certain instructions commit 409fcace9963c1e8d2cb0f7ac62e8b34d47ef979 upstream. Fix final phase of <CLASS\|MADDF\|MSUBF\|MAX\|MIN\|MAXA\|MINA>.<D\|S> emulation. Provide proper generation of SIGFPE signal and updating debugfs FP exception stats in cases of any exception flags set in preceding phases of emulation. CLASS.<D\|S> instruction may generate "Unimplemented Operation" FP exception. <MADDF\|MSUBF>.<D\|S> instructions may generate "Inexact", "Unimplemented Operation", "Invalid Operation", "Overflow", and "Underflow" FP exceptions. <MAX\|MIN\|MAXA\|MINA>.<D\|S> instructions can generate "Unimplemented Operation" and "Invalid Operation" FP exceptions. The proper final processing of the cases when any FP exception flag is set is achieved by replacing "break" statement with "goto copcsr" statement. With such solution, this patch brings the final phase of emulation of the above instructions consistent with the one corresponding to the previously implemented emulation of other related FPU instructions (ADD, SUB, etc.). Fixes: 38db37ba069f ("MIPS: math-emu: Add support for the MIPS R6 CLASS FPU instruction") Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Aleksandar Markovic <aleksandar.markovic@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Douglas Leung <douglas.leung@mips.com> Cc: Goran Ferenc <goran.ferenc@mips.com> Cc: "Maciej W. Rozycki" <macro@imgtec.com> Cc: Miodrag Dinic <miodrag.dinic@mips.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Petar Jovanovic <petar.jovanovic@mips.com> Cc: Raghu Gandham <raghu.gandham@mips.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/17581/ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Mirko Parthey	8d187fa8e9	MIPS: BCM47XX: Fix LED inversion for WRT54GSv1 commit 56a46acf62af5ba44fca2f3f1c7c25a2d5385b19 upstream. The WLAN LED on the Linksys WRT54GSv1 is active low, but the software treats it as active high. Fix the inverted logic. Fixes: 7bb26b169116 ("MIPS: BCM47xx: Fix LEDs on WRT54GS V1.0") Signed-off-by: Mirko Parthey <mirko.parthey@web.de> Looks-ok-by: Rafał Miłecki <zajec5@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16071/ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Maciej W. Rozycki	dc3aceed47	MIPS: Fix an n32 core file generation regset support regression commit 547da673173de51f73887377eb275304775064ad upstream. Fix a commit 7aeb753b5353 ("MIPS: Implement task_user_regset_view.") regression, then activated by commit 6a9c001b7ec3 ("MIPS: Switch ELF core dumper to use regsets.)", that caused n32 processes to dump o32 core files by failing to set the EF_MIPS_ABI2 flag in the ELF core file header's `e_flags' member: $ file tls-core tls-core: ELF 32-bit MSB executable, MIPS, N32 MIPS64 rel2 version 1 (SYSV), [...] $ ./tls-core Aborted (core dumped) $ file core core: ELF 32-bit MSB core file MIPS, MIPS-I version 1 (SYSV), SVR4-style $ Previously the flag was set as the result of a: statement placed in arch/mips/kernel/binfmt_elfn32.c, however in the regset case, i.e. when CORE_DUMP_USE_REGSET is set, ELF_CORE_EFLAGS is no longer used by `fill_note_info' in fs/binfmt_elf.c, and instead the `->e_flags' member of the regset view chosen is. We have the views defined in arch/mips/kernel/ptrace.c, however only an o32 and an n64 one, and the latter is used for n32 as well. Consequently an o32 core file is incorrectly dumped from n32 processes (the ELF32 vs ELF64 class is chosen elsewhere, and the 32-bit one is correctly selected for n32). Correct the issue then by defining an n32 regset view and using it as appropriate. Issue discovered in GDB testing. Fixes: 7aeb753b5353 ("MIPS: Implement task_user_regset_view.") Signed-off-by: Maciej W. Rozycki <macro@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Djordje Todorovic <djordje.todorovic@rt-rk.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/17617/ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:44 +00:00
Masahiro Yamada	43bce9f2eb	MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry commit 3cad14d56adbf7d621fc5a35db42f3acc0a2d6e8 upstream. arch/mips/boot/dts/brcm/bcm96358nb4ser.dts does not exist, so we cannot build bcm96358nb4ser.dtb . Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Fixes: 695835511f96 ("MIPS: BMIPS: rename bcm96358nb4ser to bcm6358-neufbox4-sercom") Acked-by: James Hogan <jhogan@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
James Hogan	d63534042a	MIPS: Fix MIPS64 FP save/restore on 32-bit kernels commit 22b8ba765a726d90e9830ff6134c32b04f12c10f upstream. 32-bit kernels can be configured to support MIPS64, in which case neither CONFIG_64BIT or CONFIG_CPU_MIPS32_R* will be set. This causes the CP0_Status.FR checks at the point of floating point register save and restore to be compiled out, which results in odd FP registers not being saved or restored to the task or signal context even when CP0_Status.FR is set. Fix the ifdefs to use CONFIG_CPU_MIPSR2 and CONFIG_CPU_MIPSR6, which are enabled for the relevant revisions of either MIPS32 or MIPS64, along with some other CPUs such as Octeon (r2), Loongson1 (r2), XLP (r2), Loongson 3A R2. The suspect code originates from commit 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries") in v3.14, however the code in __enable_fpu() was consistent and refused to set FR=1, falling back to software FPU emulation. This was suboptimal but should be functionally correct. Commit fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") in v4.2 (and stable tagged back to 4.0) later introduced the bug by updating __enable_fpu() to set FR=1 but failing to update the other similar ifdefs to enable FR=1 state handling. Fixes: fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") Signed-off-by: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16739/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
James Hogan	43292e6527	MIPS: Fix odd fp register warnings with MIPS64r2 commit c7fd89a6407ea3a44a2a2fa12d290162c42499c4 upstream. Building 32-bit MIPS64r2 kernels produces warnings like the following on certain toolchains (such as GNU assembler 2.24.90, but not GNU assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd from fpu_save_double and fpu_restore_16odd from fpu_restore_double: arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1 ... arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1 ... This appears to be because .set mips64r2 does not change the FPU ABI to 64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the command line on that toolchain, from the default FPU ABI of 32-bit due to the -mabi=32. This makes access to the odd FPU registers invalid. Fix by explicitly changing the FPU ABI with .set fp=64 directives in fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up in asmmacro.h so fp doesn't turn into $30. Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels") Signed-off-by: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/17656/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Mike Snitzer	e39516d24f	dm: discard support requires all targets in a table support discards commit 8a74d29d541cd86569139c6f3f44b2d210458071 upstream. A DM device with a mix of discard capabilities (due to some underlying devices not having discard support) _should_ just return -EOPNOTSUPP for the region of the device that doesn't support discards (even if only by way of the underlying driver formally not supporting discards). BUT, that does ask the underlying driver to handle something that it never advertised support for. In doing so we're exposing users to the potential for a underlying disk driver hanging if/when a discard is issued a the device that is incapable and never claimed to support discards. Fix this by requiring that each DM target in a DM table provide discard support as a prereq for a DM device to advertise support for discards. This may cause some configurations that were happily supporting discards (even in the face of a mix of discard support) to stop supporting discards -- but the risk of users hitting driver hangs, and forced reboots, outweighs supporting those fringe mixed discard configurations. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Hou Tao	3bfb87ecb4	dm: fix race between dm_get_from_kobject() and __dm_destroy() commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream. The following BUG_ON was hit when testing repeat creation and removal of DM devices: kernel BUG at drivers/md/dm.c:2919! CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44 Call Trace: [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44 [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf [<ffffffff811de257>] kernfs_seq_show+0x23/0x25 [<ffffffff81199118>] seq_read+0x16f/0x325 [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f [<ffffffff8117b625>] __vfs_read+0x26/0x9d [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44 [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9 [<ffffffff8117be9d>] vfs_read+0x8f/0xcf [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41 [<ffffffff8117c686>] SyS_read+0x4b/0x76 [<ffffffff817b606e>] system_call_fastpath+0x12/0x71 The bug can be easily triggered, if an extra delay (e.g. 10ms) is added between the test of DMF_FREEING & DMF_DELETING and dm_get() in dm_get_from_kobject(). To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and dm_get() are done in an atomic way, so _minor_lock is used. The other callers of dm_get() have also been checked to be OK: some callers invoke dm_get() under _minor_lock, some callers invoke it under _hash_lock, and dm_start_request() invoke it after increasing md->open_count. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
John Crispin	9be341edeb	MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver commit 8593b18ad348733b5d5ddfa0c79dcabf51dff308 upstream. Switch the printk() call to the prefered pr_warn() api. Fixes: 7e5873d3755c ("MIPS: pci: Add MT7620a PCIE driver") Signed-off-by: John Crispin <john@phrozen.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15321/ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Steven Rostedt (Red Hat)	f17c786b28	sched/rt: Simplify the IPI based RT balancing logic commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream. When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Mikulas Patocka	2bf483c9a4	dm: allocate struct mapped_device with kvzalloc commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream. The structure srcu_struct can be very big, its size is proportional to the value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field io_barrier in the struct mapped_device has 84kB in the debugging kernel and 50kB in the non-debugging kernel. The large size may result in failure of the function kzalloc_node. In order to avoid the allocation failure, we use the function kvzalloc_node, this function falls back to vmalloc if a large contiguous chunk of memory is not available. This patch also moves the field io_barrier to the last position of struct mapped_device - the reason is that on many processor architectures, short memory offsets result in smaller code than long memory offsets - on x86-64 it reduces code size by 320 bytes. Note to stable kernel maintainers - the kernels 4.11 and older don't have the function kvzalloc_node, you can use the function vzalloc_node instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Vivek Goyal	13e6560075	ovl: Put upperdentry if ovl_check_origin() fails commit 5455f92b54e516995a9ca45bbf790d3629c27a93 upstream. If ovl_check_origin() fails, we should put upperdentry. We have a reference on it by now. So goto out_put_upper instead of out. Fixes: a9d019573e88 ("ovl: lookup non-dir copy-up-origin by file handle") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:43 +00:00
Eric Biggers	08720bf98b	dm bufio: fix integer overflow when limiting maximum cache size commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream. The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Ming Lei	4d7a55f5b8	dm mpath: remove annoying message of 'blk_get_request() returned -11' commit 9dc112e2daf87b40607fd8d357e2d7de32290d45 upstream. It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit 7083abbbf. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Damien Le Moal	ac29afdb45	dm zoned: ignore last smaller runt zone commit 114e025968b5990ad0b57bf60697ea64ee206aac upstream. The SCSI layer allows ZBC drives to have a smaller last runt zone. For such a device, specifying the entire capacity for a dm-zoned target table entry fails because the specified capacity is not aligned on a device zone size indicated in the request queue structure of the device. Fix this problem by ignoring the last runt zone in the entry length when seting up the dm-zoned target (ctr method) and when iterating table entries of the target (iterate_devices method). This allows dm-zoned users to still easily setup a target using the entire device capacity (as mandated by dm-zoned) or the aligned capacity excluding the last runt zone. While at it, replace direct references to the device queue chunk_sectors limit with calls to the accessor blk_queue_zone_sectors(). Reported-by: Peter Desnoyers <pjd@ccs.neu.edu> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Mikulas Patocka	8f71f493f4	dm crypt: allow unaligned bv_offset commit 0440d5c0ca9744b92a07aeb6df0a9a75db6f4280 upstream. When slub_debug is enabled kmalloc returns unaligned memory. XFS uses this unaligned memory for its buffers (if an unaligned buffer crosses a page, XFS frees it and allocates a full page instead - see the function xfs_buf_allocate_memory). dm-crypt checks if bv_offset is aligned on page size and these checks fail with slub_debug and XFS. Fix this bug by removing the bv_offset checks. Switch to checking if bv_len is aligned instead of bv_offset (this check should be sufficient to prevent overruns if a bio with too small bv_len is received). Fixes: 8f0009a22517 ("dm crypt: optionally support larger encryption sector size") Reported-by: Bruno Prémont <bonbons@sysophe.eu> Tested-by: Bruno Prémont <bonbons@sysophe.eu> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Joe Thornber	ec544ec956	dm cache: fix race condition in the writeback mode overwrite_bio optimisation commit d1260e2a3f85f4c1010510a15f89597001318b1b upstream. When a DM cache in writeback mode moves data between the slow and fast device it can often avoid a copy if the triggering bio either: i) covers the whole block (no point copying if we're about to overwrite it) ii) the migration is a promotion and the origin block is currently discarded Prior to this fix there was a race with case (ii). The discard status was checked with a shared lock held (rather than exclusive). This meant another bio could run in parallel and write data to the origin, removing the discard state. After the promotion the parallel write would have been lost. With this fix the discard status is re-checked once the exclusive lock has been aquired. If the block is no longer discarded it falls back to the slower full copy path. Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Mikulas Patocka	a502cd2dd4	dm integrity: allow unaligned bv_offset commit 95b1369a9638cfa322ad1c0cde8efbe524059884 upstream. When slub_debug is enabled kmalloc returns unaligned memory. XFS uses this unaligned memory for its buffers (if an unaligned buffer crosses a page, XFS frees it and allocates a full page instead - see the function xfs_buf_allocate_memory). dm-integrity checks if bv_offset is aligned on page size and this check fail with slub_debug and XFS. Fix this bug by removing the bv_offset check, leaving only the check for bv_len. Fixes: 7eada909bfd7 ("dm: add integrity target") Reported-by: Bruno Prémont <bonbons@sysophe.eu> Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00
Vijendar Mukunda	ca90f34e2e	ALSA: hda: Add Raven PCI ID commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream. This commit adds PCI ID for Raven platform Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-30 08:40:42 +00:00

1 2 3 4 5 ...

708367 Commits