linux

iv/linux

Author	SHA1	Message	Date
Qu Wenruo	34ffafdba1	btrfs: ctree: Remove stray comment of setting up path lock The following comment shows up in btrfs_search_slot() with out much sense: /* * setup the path here so we can release it under lock * contention with the cow code / if (cow) { / code touching path->lock[] is far away from here / } This comment hasn't been cleaned up after the relevant code has been removed. The original code is introduced in commit 65b51a009e29 ("btrfs_search_slot: reduce lock contention by cowing in two stages"): + + / + * setup the path here so we can release it under lock + * contention with the cow code + */ + p->nodes[level] = b; + if (!p->skip_locking) + p->locks[level] = 1; + But in current code, we have different timing for modifying path lock, so just remove the comment. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:46 +01:00
Qu Wenruo	abe9339d69	btrfs: ctree: Reduce one indent level for btrfs_search_old_slot() Similar to btrfs_search_slot() done in previous patch, make a shortcut for the level 0 case and allow to reduce indentation for the remaining case. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:46 +01:00
Qu Wenruo	f624d97608	btrfs: ctree: Reduce one indent level for btrfs_search_slot() In btrfs_search_slot(), we something like: if (level != 0) { /* Do search inside tree nodes/ } else { / Do search inside tree leaves / goto done; } This caused extra indent for tree node search code. Change it to something like: if (level == 0) { / Do search inside tree leaves / goto done' } / Do search inside tree nodes */ So we have more space to maneuver our code, this is especially useful as the tree nodes search code is more complex than the leaves search code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:46 +01:00
Qu Wenruo	71bf92a9b8	btrfs: tree-checker: Add check for INODE_REF For INODE_REF we will check: - Objectid (ino) against previous key To detect missing INODE_ITEM. - No overflow/padding in the data payload Much like DIR_ITEM, but with less members to check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:46 +01:00
Qu Wenruo	c18679ebd8	btrfs: tree-checker: Try to detect missing INODE_ITEM For the following items, key->objectid is inode number: - DIR_ITEM - DIR_INDEX - XATTR_ITEM - EXTENT_DATA - INODE_REF So in the subvolume tree, such items must have its previous item share the same objectid, e.g.: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 INODE_ITEM 0) (258 INODE_REF 0) (258 XATTR_ITEM 0) (258 EXTENT_DATA 0) But if we have the following sequence, then there is definitely something wrong, normally some INODE_ITEM is missing, like: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 XATTR_ITEM 0) <<< objecitd suddenly changed to 258 (258 EXTENT_DATA 0) So just by checking the previous key for above inode based key types, we can detect a missing inode item. For INODE_REF key type, the check will be added along with INODE_REF checker. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:46 +01:00
Filipe Manana	b9fae2ebee	Btrfs: make btrfs_wait_extents() static It's not used ouside of transaction.c Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:45 +01:00
Nikolay Borisov	35b814f3c5	btrfs: Add assert to catch nested transaction commit A recent patch to btrfs showed that there was at least 1 case where a nested transaction was committed. Nested transaction in this case means a code which has a transaction handle calls some function which in turn obtains a copy of the same transaction handle. In such cases the correct thing to do is for the lower callee to call btrfs_end_transaction which contains appropriate checks so as to not commit the transaction which will result in stale trans handler for the caller. To catch such cases add an assert in btrfs_commit_transaction ensuring btrfs_trans_handle::use_count is always 1. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:45 +01:00
Goldwyn Rodrigues	9cf35f6735	btrfs: simplify inode locking for RWF_NOWAIT This is similar to 942491c9e6d6 ("xfs: fix AIM7 regression"). Apparently our current rwsem code doesn't like doing the trylock, then lock for real scheme. This causes extra contention on the lock and can be measured eg. by AIM7 benchmark. So change our read/write methods to just do the trylock for the RWF_NOWAIT case. Fixes: edf064e7c6fe ("btrfs: nowait aio support") Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-18 12:46:45 +01:00
Linus Torvalds	b226c9e1f4	for-linus-20191115 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3O/gkQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsLBD/47jITnsOf/EU1gqW8vbl+psrPYQN+p68id EA5L8fqF7wHg/Anxg9MApDO6noH8BvnfSGFnqxWoE5YcvT/mfj4pVciLMiNG2BwA hUiJCwIG8SGCn2MRbaTQpqRnMw8aoTKdJAUWwjZTl/db+X9aCv++Odn4XuAABfh2 LxIb0ZZBF5M8CfKRHtksuCcGBftEUTrlCzSZ9dXI5tD8EpRNJw/5LDGB6w7inhcZ 0+X7ENdSQrMKA9ImJunLPUDFejHu4fr4qJdAX67Qai0Wf2dR54eaXmTVO4d4SGcU UX0zpNC6bozCq+X/ICnlJkK+ECuR33xFLRIS0S7Xv2Er6n3Ul8N6cb6RRv8Q+o1h XG5NfpOH+Atqmdyp9zSRI2c2UVfIfmvmRVIUFM+ZXmdw5oSfUltGLdyNVnKuhzc+ f2Y3dti96YnT35TIihKcwfqlFuaXfLfCmLYabtVylwlOJ80Sjhgea3IyvwstpJau uIs5X8Z5AdBuqufPj4veS3x73DeE7slGmzADcNtUeFb1K5423MJqlQUOeVeJW3x3 85tS7aot/SoMnA1dtREvceerFP/lIa/02iqX0TYQ7BqsN5oZjQzaiuJkUfV2WNOs 3TlNRBKF69tpX4+NXxaSm5kC0YHtHIWF0EtNliKM7Yi8WS0tVsy74pDO7otj3j1m s10Rr/1seA== =wP5w -----END PGP SIGNATURE----- Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes that should make it into this release. This contains: - io_uring: - The timeout command assumes sequence == 0 means that we want one completion, but this kind of overloading is unfortunate as it prevents users from doing a pure time based wait. Since this operation was introduced in this cycle, let's correct it now, while we can. (me) - One-liner to fix an issue with dependent links and fixed buffer reads. The actual IO completed fine, but the link got severed since we stored the wrong expected value. (me) - Add TIMEOUT to list of opcodes that don't need a file. (Pavel) - rsxx missing workqueue destry calls. Old bug. (Chuhong) - Fix blk-iocost active list check (Jiufei) - Fix impossible-to-hit overflow merge condition, that still hit some folks very rarely (Junichi) - Fix bfq hang issue from 5.3. This didn't get marked for stable, but will go into stable post this merge (Paolo)" * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block: rsxx: add missed destroy_workqueue calls in remove iocost: check active_list of all the ancestors in iocg_activate() block, bfq: deschedule empty bfq_queues not referred by any process io_uring: ensure registered buffer import returns the IO length io_uring: Fix getting file for timeout block: check bi_size overflow before merge io_uring: make timeout sequence == 0 mean no sequence	2019-11-15 13:02:34 -08:00
Linus Torvalds	875fef493f	Two fixes for the buffered reads and O_DIRECT writes serialization patch that went into -rc1 and a fixup for a bogus warning on older gcc versions. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl3O2RMTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi+k6CACf0hUTyWcJaiH3WAmkpKOnZVG//Ghv +hDWskib0gSilW+mx8Cjsndb5rXVuE4MhZ9P1VD1MMhhfVlfTUspCPG6cIQ3B3gd jEVLHDALaMc/tpKwa6EbxvxQRAL5D/2Umh8aK1kVMX2U9R6KKfMiRVToHVPewSkS eM3HJuV0kUonnD6glHyie1iwI9iFkDgt+eTJR1hpiFx26y6TwVCH5RNNvZGr0Tcf KMLgwAHxIowx0SxblvbJTMf1iIoPJNiUuZyBoo0Hli9hI7S/b4XfARAkD9eud02d 4shv7D9Js0V1tKDsfV/c8UpnOgBTGEY4AEXIN3Mm6Gk6q9pMnobWt+l8 =kIHR -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two fixes for the buffered reads and O_DIRECT writes serialization patch that went into -rc1 and a fixup for a bogus warning on older gcc versions" * tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client: rbd: silence bogus uninitialized warning in rbd_object_map_update_finish() ceph: increment/decrement dio counter on async requests ceph: take the inode lock before acquiring cap refs	2019-11-15 10:30:24 -08:00
David Howells	a28f239e29	afs: Fix race in commit bulk status fetch When a lookup is done, the afs filesystem will perform a bulk status-fetch operation on the requested vnode (file) plus the next 49 other vnodes from the directory list (in AFS, directory contents are downloaded as blobs and parsed locally). When the results are received, it will speculatively populate the inode cache from the extra data. However, if the lookup races with another lookup on the same directory, but for a different file - one that's in the 49 extra fetches, then if the bulk status-fetch operation finishes first, it will try and update the inode from the other lookup. If this other inode is still in the throes of being created, however, this will cause an assertion failure in afs_apply_status(): BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags)); on or about fs/afs/inode.c:175 because it expects data to be there already that it can compare to. Fix this by skipping the update if the inode is being created as the creator will presumably set up the inode with the same information. Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-11-15 10:28:02 -08:00
Linus Torvalds	b4c0800e42	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs fixes from Al Viro: "Assorted fixes all over the place; some of that is -stable fodder, some regressions from the last window" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable ecryptfs: fix unlink and rmdir in face of underlying fs modifications audit_get_nd(): don't unlock parent too early exportfs_decode_fh(): negative pinned may become positive without the parent locked cgroup: don't put ERR_PTR() into fc->root autofs: fix a leak in autofs_expire_indirect() aio: Fix io_pgetevents() struct __compat_aio_sigset layout fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()	2019-11-15 08:44:08 -08:00
Jens Axboe	eac406c61c	io_uring: make POLL_ADD/POLL_REMOVE scale better One of the obvious use cases for these commands is networking, where it's not uncommon to have tons of sockets open and polled for. The current implementation uses a list for insertion and lookup, which works fine for file based use cases where the count is usually low, it breaks down somewhat for higher number of files / sockets. A test case with 30k sockets being polled for and cancelled takes: real 0m6.968s user 0m0.002s sys 0m6.936s with the patch it takes: real 0m0.233s user 0m0.010s sys 0m0.176s If you go to 50k sockets, it gets even more abysmal with the current code: real 0m40.602s user 0m0.010s sys 0m40.555s with the patch it takes: real 0m0.398s user 0m0.000s sys 0m0.341s Change is pretty straight forward, just replace the cancel_list with a red/black tree instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 12:09:58 -07:00
Jeff Layton	6a81749ebe	ceph: increment/decrement dio counter on async requests Ceph can in some cases issue an async DIO request, in which case we can end up calling ceph_end_io_direct before the I/O is actually complete. That may allow buffered operations to proceed while DIO requests are still in flight. Fix this by incrementing the i_dio_count when issuing an async DIO request, and decrement it when tearing down the aio_req. Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-11-14 18:44:51 +01:00
Jeff Layton	a81bc3102b	ceph: take the inode lock before acquiring cap refs Most of the time, we (or the vfs layer) takes the inode_lock and then acquires caps, but ceph_read_iter does the opposite, and that can lead to a deadlock. When there are multiple clients treading over the same data, we can end up in a situation where a reader takes caps and then tries to acquire the inode_lock. Another task holds the inode_lock and issues a request to the MDS which needs to revoke the caps, but that can't happen until the inode_lock is unwedged. Fix this by having ceph_read_iter take the inode_lock earlier, before attempting to acquire caps. Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes") Link: https://tracker.ceph.com/issues/36348 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-11-14 18:44:51 +01:00
Jens Axboe	021d1cdda3	io-wq: remove now redundant struct io_wq_nulls_list Since we don't iterate these lists anymore after commit: e61df66c69b1 ("io-wq: ensure free/busy list browsing see all items") we don't need to retain the nulls value we use for them. That means it's pretty pointless to wrap the hlist_nulls_head in a structure, so get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 08:02:19 -07:00
Christoph Hellwig	979c690d9a	block: move clearing bd_invalidated into check_disk_size_change Both callers of check_disk_size_change clear bd_invalidate directly after the call, so move the clearing into check_disk_size_change itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 07:44:01 -07:00
Christoph Hellwig	f0b870df80	block: remove (__)blkdev_reread_part as an exported API In general drivers should never mess with partition tables directly. Unfortunately s390 and loop do for somewhat historic reasons, but they can use bdev_disk_changed directly instead when we export it as they satisfy the sanity checks we have in __blkdev_reread_part. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd] Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 07:43:59 -07:00
Christoph Hellwig	142fe8f4bb	block: fix bdev_disk_changed for non-partitioned devices We still have to set the capacity to 0 if invalidating or call revalidate_disk if not even if the disk has no partitions. Fix that by merging rescan_partitions into bdev_disk_changed and just stubbing out blk_add_partitions and blk_drop_partitions for non-partitioned devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 07:43:53 -07:00
Christoph Hellwig	a1548b6744	block: move rescan_partitions to fs/block_dev.c Large parts of rescan_partitions aren't about partitions, and moving it to block_dev.c will allow for some further cleanups by merging it into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 07:43:21 -07:00
Christoph Hellwig	6917d06899	block: merge invalidate_partitions into rescan_partitions A lot of the logic in invalidate_partitions and rescan_partitions is shared. Merge the two functions to simplify things. There is a small behavior change in that we now send the kevent change notice also if we were not invalidating but no partitions were found, which seems like the right thing to do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-14 07:42:41 -07:00
Pavel Begunkov	a320e9fa1e	io_uring: Fix getting file for non-fd opcodes For timeout requests and bunch of others io_uring tries to grab a file with specified fd, which is usually stdin/fd=0. Update io_op_needs_file() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 19:41:01 -07:00
Bob Liu	9d858b2148	io_uring: introduce req_need_defer() Makes the code easier to read. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 19:41:01 -07:00
Bob Liu	2f6d9b9d63	io_uring: clean up io_uring_cancel_files() We don't use the return value anymore, drop it. Also drop the unecessary double cancel_req value check. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 19:41:01 -07:00
Jens Axboe	e61df66c69	io-wq: ensure free/busy list browsing see all items We have two lists for workers in io-wq, a busy and a free list. For certain operations we want to browse all workers, and we currently do that by browsing the two separate lists. But since these lists are RCU protected, we can potentially miss workers if they move between the two lists while we're browsing them. Add a third list, all_list, that simply holds all workers. A worker is added to that list when it starts, and removed when it exits. This makes the worker iteration cleaner, too. Reported-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 19:40:57 -07:00
Jens Axboe	5e559561a8	io_uring: ensure registered buffer import returns the IO length A test case was reported where two linked reads with registered buffers failed the second link always. This is because we set the expected value of a request in req->result, and if we don't get this result, then we fail the dependent links. For some reason the registered buffer import returned -ERROR/0, while the normal import returns -ERROR/length. This broke linked commands with registered buffers. Fix this by making io_import_fixed() correctly return the mapped length. Cc: stable@vger.kernel.org # v5.3 Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 16:15:14 -07:00
Pavel Begunkov	5683e5406e	io_uring: Fix getting file for timeout For timeout requests io_uring tries to grab a file with specified fd, which is usually stdin/fd=0. Update io_op_needs_file() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 15:25:57 -07:00
Jens Axboe	36c2f9223e	io-wq: ensure we have a stable view of ->cur_work for cancellations worker->cur_work is currently protected by the lock of the wqe that the worker belongs to. When we send a signal to a worker, we need a stable view of ->cur_work, so we need to hold that lock. But this doesn't work so well, since we have the opposite order potentially on queueing work. If POLL_ADD is used with a signalfd, then io_poll_wake() is called with the signal lock, and that sometimes needs to insert work items. Add a specific worker lock that protects the current work item. Then we can guarantee that the task we're sending a signal is currently processing the exact work we think it is. Reported-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 13:51:54 -07:00
Eric Biggers	924e319416	f2fs: support STATX_ATTR_VERITY Set the STATX_ATTR_VERITY bit when the statx() system call is used on a verity file on f2fs. Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-11-13 12:15:34 -08:00
Eric Biggers	1f60719552	ext4: support STATX_ATTR_VERITY Set the STATX_ATTR_VERITY bit when the statx() system call is used on a verity file on ext4. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-11-13 12:15:34 -08:00
Linus Torvalds	afd7a71872	for-5.4-rc7-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl3MUkoACgkQxWXV+ddt WDvAEQ//fEHZ51NbIMwJNltqF4mr6Oao0M5u0wejEgiXzmR9E1IuHUgVK+KDQmSu wZl/y+RTlQC0TiURyStaVFBreXEqiuB79my9u4iDeNv/4UJQB42qpmYB4EuviDgB mxb9bFpWTLkO6Oc+vMrGF3BOmVsQKlq2nOua25g8VFtApQ6uiEfbwBOslCcC8kQB ZpNBl6x74xz/VWNWZnRStBfwYjRitKNDVU6dyIyRuLj8cktqfGBxGtx7/w0wDiZT kPR1bNtdpy3Ndke6H/0G6plRWi9kENqcN43hvrz54IKh2l+Jd2/as51j4Qq2tJU9 KaAnJzRaSePxc2m0SqtgZTvc2BYSOg7dqaCyHxBB0CUBdTdJdz2TVZ2KM9MiLns4 1haHBLo4l8o8zeYZpW05ac6OXKY4f8qsjWPEGshn4FDbq0TrHQzYxAF3c0X3hPag SnuvilgYUuYal+n87qinePg/ZmVrrBXPRycpQnn7FxqezJbf/2WUEojUVQnreU05 mdp8mulxQxyFhgEvO7K1uDtlP8bqW69IO9M//6IWzGNKTDK2SRI08ULplghqgyna 8SG0+y9w26r8UIWDhuvPbdfUMSG3kEH8yLFK84AFDMVJJxOnfznE3sC8sGOiP5q9 OUkl8l7bhDkyAdWZY57gGUobebdPfnLxRV9A+LZQ2El1kSOEK18= =xzXs -----END PGP SIGNATURE----- Merge tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix for an older bug that has started to show up during testing (because of an updated test for rename exchange). It's an in-memory corruption caused by local variable leaking out of the function scope" * tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix log context list corruption after rename exchange operation	2019-11-13 12:06:10 -08:00
Jens Axboe	7d7230652e	io_wq: add get/put_work handlers to io_wq_create() For cancellation, we need to ensure that the work item stays valid for as long as ->cur_work is valid. Right now we can't safely dereference the work item even under the wqe->lock, because while the ->cur_work pointer will remain valid, the work could be completing and be freed in parallel. Only invoke ->get/put_work() on items we know that the caller queued themselves. Add IO_WQ_WORK_INTERNAL for io-wq to use, which is needed when we're queueing a flush item, for instance. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 11:37:54 -07:00
Jens Axboe	15dff286d0	io_uring: check for validity of ->rings in teardown Normally the rings are always valid, the exception is if we failed to allocate the rings at setup time. syzbot reports this: RSP: 002b:00007ffd6e8aa078 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441229 RDX: 0000000000000002 RSI: 0000000020000140 RDI: 0000000000000d0d RBP: 00007ffd6e8aa090 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8903 Comm: syz-executor410 Not tainted 5.4.0-rc7-next-20191113 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] RIP: 0010:__io_commit_cqring fs/io_uring.c:496 [inline] RIP: 0010:io_commit_cqring+0x1e1/0xdb0 fs/io_uring.c:592 Code: 03 0f 8e df 09 00 00 48 8b 45 d0 4c 8d a3 c0 00 00 00 4c 89 e2 48 c1 ea 03 44 8b b8 c0 01 00 00 48 b8 00 00 00 00 00 fc ff df <0f> b6 14 02 4c 89 e0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 61 RSP: 0018:ffff88808f51fc08 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815abe4a RDX: 0000000000000018 RSI: ffffffff81d168d5 RDI: ffff8880a9166100 RBP: ffff88808f51fc70 R08: 0000000000000004 R09: ffffed1011ea3f7d R10: ffffed1011ea3f7c R11: 0000000000000003 R12: 00000000000000c0 R13: ffff8880a91661c0 R14: 1ffff1101522cc10 R15: 0000000000000000 FS: 0000000001e7a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000009a74c000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_cqring_overflow_flush+0x6b9/0xa90 fs/io_uring.c:673 io_ring_ctx_wait_and_kill+0x24f/0x7c0 fs/io_uring.c:4260 io_uring_create fs/io_uring.c:4600 [inline] io_uring_setup+0x1256/0x1cc0 fs/io_uring.c:4626 __do_sys_io_uring_setup fs/io_uring.c:4639 [inline] __se_sys_io_uring_setup fs/io_uring.c:4636 [inline] __x64_sys_io_uring_setup+0x54/0x80 fs/io_uring.c:4636 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441229 Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd6e8aa078 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441229 RDX: 0000000000000002 RSI: 0000000020000140 RDI: 0000000000000d0d RBP: 00007ffd6e8aa090 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace b0f5b127a57f623f ]--- RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] RIP: 0010:__io_commit_cqring fs/io_uring.c:496 [inline] RIP: 0010:io_commit_cqring+0x1e1/0xdb0 fs/io_uring.c:592 Code: 03 0f 8e df 09 00 00 48 8b 45 d0 4c 8d a3 c0 00 00 00 4c 89 e2 48 c1 ea 03 44 8b b8 c0 01 00 00 48 b8 00 00 00 00 00 fc ff df <0f> b6 14 02 4c 89 e0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 61 RSP: 0018:ffff88808f51fc08 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815abe4a RDX: 0000000000000018 RSI: ffffffff81d168d5 RDI: ffff8880a9166100 RBP: ffff88808f51fc70 R08: 0000000000000004 R09: ffffed1011ea3f7d R10: ffffed1011ea3f7c R11: 0000000000000003 R12: 00000000000000c0 R13: ffff8880a91661c0 R14: 1ffff1101522cc10 R15: 0000000000000000 FS: 0000000001e7a880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000009a74c000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 which is exactly the case of failing to allocate the SQ/CQ rings, and then entering shutdown. Check if the rings are valid before trying to access them at shutdown time. Reported-by: syzbot+21147d79607d724bd6f3@syzkaller.appspotmail.com Fixes: 1d7bb1d50fb4 ("io_uring: add support for backlogged CQ ring") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-13 09:11:36 -07:00
Christoph Hellwig	d41003513e	block: rework zone reporting Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-12 19:12:07 -07:00
Jens Axboe	7c9e7f0fe0	io_uring: fix potential deadlock in io_poll_wake() We attempt to run the poll completion inline, but we're using trylock to do so. This avoids a deadlock since we're grabbing the locks in reverse order at this point, we already hold the poll wq lock and we're trying to grab the completion lock, while the normal rules are the reverse of that order. IO completion for a timeout link will need to grab the completion lock, but that's not safe from this context. Put the completion under the completion_lock in io_poll_wake(), and mark the request as entering the completion with the completion_lock already held. Fixes: 2665abfd757f ("io_uring: add support for linked SQE timeouts") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-12 12:26:34 -07:00
Jens Axboe	960e432dfa	io_uring: use correct "is IO worker" helper Since we switched to io-wq, the dependent link optimization for when to pass back work inline has been broken. Fix this by providing a suitable io-wq helper for io_uring to use to detect when to do this. Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-12 08:02:26 -07:00
Jens Axboe	93bd25bb69	io_uring: make timeout sequence == 0 mean no sequence Currently we make sequence == 0 be the same as sequence == 1, but that's not super useful if the intent is really to have a timeout that's just a pure timeout. If the user passes in sqe->off == 0, then don't apply any sequence logic to the request, let it purely be driven by the timeout specified. Reported-by: 李通洲 <carter.li@eoitek.com> Reviewed-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-12 00:18:51 -07:00
Jens Axboe	76a46e066e	io_uring: fix -ENOENT issue with linked timer with short timeout If you prep a read (for example) that needs to get punted to async context with a timer, if the timeout is sufficiently short, the timer request will get completed with -ENOENT as it could not find the read. The issue is that we prep and start the timer before we start the read. Hence the timer can trigger before the read is even started, and the end result is then that the timer completes with -ENOENT, while the read starts instead of being cancelled by the timer. Fix this by splitting the linked timer into two parts: 1) Prep and validate the linked timer 2) Start timer The read is then started between steps 1 and 2, so we know that the timer will always have a consistent view of the read request state. Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-11 16:33:22 -07:00
Jens Axboe	768134d4f4	io_uring: don't do flush cancel under inflight_lock We can't safely cancel under the inflight lock. If the work hasn't been started yet, then io_wq_cancel_work() simply marks the work as cancelled and invokes the work handler. But if the work completion needs to grab the inflight lock because it's grabbing user files, then we'll deadlock trying to finish the work as we already hold that lock. Instead grab a reference to the request, if it isn't already zero. If it's zero, then we know it's going through completion anyway, and we can safely ignore it. If it's not zero, then we can drop the lock and attempt to cancel from there. This also fixes a missing finish_wait() at the end of io_uring_cancel_files(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-11 16:33:17 -07:00
Jens Axboe	c1edbf5f08	io_uring: flag SQPOLL busy condition to userspace Now that we have backpressure, for SQPOLL, we have one more condition that warrants flagging that the application needs to enter the kernel: we failed to submit IO due to backpressure. Make sure we catch that and flag it appropriately. If we run into backpressure issues with the SQPOLL thread, flag it as such to the application by setting IORING_SQ_NEED_WAKEUP. This will cause the application to enter the kernel, and that will flush the backlog and clear the condition. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-11 16:33:11 -07:00
Jens Axboe	47f467686e	io_uring: make ASYNC_CANCEL work with poll and timeout It's a little confusing that we have multiple types of command cancellation opcodes now that we have a generic one. Make the generic one work with POLL_ADD and TIMEOUT commands as well, that makes for an easier to use API for the application. The fact that they currently don't is a bit confusing. Add a helper that takes care of it, so we can user it from both IORING_OP_ASYNC_CANCEL and from the linked timeout cancellation. Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-11 16:33:05 -07:00
Jens Axboe	0ddf92e848	io_uring: provide fallback request for OOM situations One thing that really sucks for userspace APIs is if the kernel passes back -ENOMEM/-EAGAIN for resource shortages. The application really has no idea of what to do in those cases. Should it try and reap completions? Probably a good idea. Will it solve the issue? Who knows. This patch adds a simple fallback mechanism if we fail to allocate memory for a request. If we fail allocating memory from the slab for a request, we punt to a pre-allocated request. There's just one of these per io_ring_ctx, but the important part is if we ever return -EBUSY to the application, the applications knows that it can wait for events and make forward progress when events have completed. This is the important part. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-11 16:32:55 -07:00
Filipe Manana	e6c617102c	Btrfs: fix log context list corruption after rename exchange operation During rename exchange we might have successfully log the new name in the source root's log tree, in which case we leave our log context (allocated on stack) in the root's list of log contextes. However we might fail to log the new name in the destination root, in which case we fallback to a transaction commit later and never sync the log of the source root, which causes the source root log context to remain in the list of log contextes. This later causes invalid memory accesses because the context was allocated on stack and after rename exchange finishes the stack gets reused and overwritten for other purposes. The kernel's linked list corruption detector (CONFIG_DEBUG_LIST=y) can detect this and report something like the following: [ 691.489929] ------------[ cut here ]------------ [ 691.489947] list_add corruption. prev->next should be next (ffff88819c944530), but was ffff8881c23f7be4. (prev=ffff8881c23f7a38). [ 691.489967] WARNING: CPU: 2 PID: 28933 at lib/list_debug.c:28 __list_add_valid+0x95/0xe0 (...) [ 691.489998] CPU: 2 PID: 28933 Comm: fsstress Not tainted 5.4.0-rc6-btrfs-next-62 #1 [ 691.490001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 691.490003] RIP: 0010:__list_add_valid+0x95/0xe0 (...) [ 691.490007] RSP: 0018:ffff8881f0b3faf8 EFLAGS: 00010282 [ 691.490010] RAX: 0000000000000000 RBX: ffff88819c944530 RCX: 0000000000000000 [ 691.490011] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa2c497e0 [ 691.490013] RBP: ffff8881f0b3fe68 R08: ffffed103eaa4115 R09: ffffed103eaa4114 [ 691.490015] R10: ffff88819c944000 R11: ffffed103eaa4115 R12: 7fffffffffffffff [ 691.490016] R13: ffff8881b4035610 R14: ffff8881e7b84728 R15: 1ffff1103e167f7b [ 691.490019] FS: 00007f4b25ea2e80(0000) GS:ffff8881f5500000(0000) knlGS:0000000000000000 [ 691.490021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 691.490022] CR2: 00007fffbb2d4eec CR3: 00000001f2a4a004 CR4: 00000000003606e0 [ 691.490025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 691.490027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 691.490029] Call Trace: [ 691.490058] btrfs_log_inode_parent+0x667/0x2730 [btrfs] [ 691.490083] ? join_transaction+0x24a/0xce0 [btrfs] [ 691.490107] ? btrfs_end_log_trans+0x80/0x80 [btrfs] [ 691.490111] ? dget_parent+0xb8/0x460 [ 691.490116] ? lock_downgrade+0x6b0/0x6b0 [ 691.490121] ? rwlock_bug.part.0+0x90/0x90 [ 691.490127] ? do_raw_spin_unlock+0x142/0x220 [ 691.490151] btrfs_log_dentry_safe+0x65/0x90 [btrfs] [ 691.490172] btrfs_sync_file+0x9f1/0xc00 [btrfs] [ 691.490195] ? btrfs_file_write_iter+0x1800/0x1800 [btrfs] [ 691.490198] ? rcu_read_lock_any_held.part.11+0x20/0x20 [ 691.490204] ? __do_sys_newstat+0x88/0xd0 [ 691.490207] ? cp_new_stat+0x5d0/0x5d0 [ 691.490218] ? do_fsync+0x38/0x60 [ 691.490220] do_fsync+0x38/0x60 [ 691.490224] __x64_sys_fdatasync+0x32/0x40 [ 691.490228] do_syscall_64+0x9f/0x540 [ 691.490233] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 691.490235] RIP: 0033:0x7f4b253ad5f0 (...) [ 691.490239] RSP: 002b:00007fffbb2d6078 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 691.490242] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4b253ad5f0 [ 691.490244] RDX: 00007fffbb2d5fe0 RSI: 00007fffbb2d5fe0 RDI: 0000000000000003 [ 691.490245] RBP: 000000000000000d R08: 0000000000000001 R09: 00007fffbb2d608c [ 691.490247] R10: 00000000000002e8 R11: 0000000000000246 R12: 00000000000001f4 [ 691.490248] R13: 0000000051eb851f R14: 00007fffbb2d6120 R15: 00005635a498bda0 This started happening recently when running some test cases from fstests like btrfs/004 for example, because support for rename exchange was added last week to fsstress from fstests. So fix this by deleting the log context for the source root from the list if we have logged the new name in the source root. Reported-by: Su Yue <Damenly_Su@gmx.com> Fixes: d4682ba03ef618 ("Btrfs: sync log after logging new name") CC: stable@vger.kernel.org # 4.19+ Tested-by: Su Yue <Damenly_Su@gmx.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-11-11 19:46:02 +01:00
Jens Axboe	8e3cca1270	io_uring: convert accept4() -ERESTARTSYS into -EINTR If we cancel a pending accept operating with a signal, we get -ERESTARTSYS returned. Turn that into -EINTR for userspace, we should not be return -ERESTARTSYS. Fixes: 17f2fe35d080 ("io_uring: add support for IORING_OP_ACCEPT") Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Jens Axboe	46568e9be7	io_uring: fix error clear of ->file_table in io_sqe_files_register() syzbot reports that when using failslab and friends, we can get a double free in io_sqe_files_unregister(): BUG: KASAN: double-free or invalid-free in io_sqe_files_unregister+0x20b/0x300 fs/io_uring.c:3185 CPU: 1 PID: 8819 Comm: syz-executor452 Not tainted 5.4.0-rc6-next-20191108 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 kasan_report_invalid_free+0x65/0xa0 mm/kasan/report.c:468 __kasan_slab_free+0x13a/0x150 mm/kasan/common.c:450 kasan_slab_free+0xe/0x10 mm/kasan/common.c:480 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 io_sqe_files_unregister+0x20b/0x300 fs/io_uring.c:3185 io_ring_ctx_free fs/io_uring.c:3998 [inline] io_ring_ctx_wait_and_kill+0x348/0x700 fs/io_uring.c:4060 io_uring_release+0x42/0x50 fs/io_uring.c:4068 __fput+0x2ff/0x890 fs/file_table.c:280 ____fput+0x16/0x20 fs/file_table.c:313 task_work_run+0x145/0x1c0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x904/0x2e60 kernel/exit.c:817 do_group_exit+0x135/0x360 kernel/exit.c:921 __do_sys_exit_group kernel/exit.c:932 [inline] __se_sys_exit_group kernel/exit.c:930 [inline] __x64_sys_exit_group+0x44/0x50 kernel/exit.c:930 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x43f2c8 Code: 31 b8 c5 f7 ff ff 48 8b 5c 24 28 48 8b 6c 24 30 4c 8b 64 24 38 4c 8b 6c 24 40 4c 8b 74 24 48 4c 8b 7c 24 50 48 83 c4 58 c3 66 <0f> 1f 84 00 00 00 00 00 48 8d 35 59 ca 00 00 0f b6 d2 48 89 fb 48 RSP: 002b:00007ffd5b976008 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f2c8 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 RBP: 00000000004bf0a8 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000 This happens if we fail allocating the file tables. For that case we do free the file table correctly, but we forget to set it to NULL. This means that ring teardown will see it as being non-NULL, and attempt to free it again. Fix this by clearing the file_table pointer if we free the table. Reported-by: syzbot+3254bc44113ae1e331ee@syzkaller.appspotmail.com Fixes: 65e19f54d29c ("io_uring: support for larger fixed file sets") Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Jackie Liu	c69f8dbe24	io_uring: separate the io_free_req and io_free_req_find_next interface Similar to the distinction between io_put_req and io_put_req_find_next, io_free_req has been modified similarly, with no functional changes. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Jackie Liu	ec9c02ad4c	io_uring: keep io_put_req only responsible for release and put req We already have io_put_req_find_next to find the next req of the link. we should not use the io_put_req function to find them. They should be functions of the same level. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Jackie Liu	a197f664a0	io_uring: remove passed in 'ctx' function parameter ctx if possible Many times, the core of the function is req, and req has already set req->ctx at initialization time, so there is no need to pass in the ctx from the caller. Cleanup, no functional change. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Jens Axboe	206aefde4f	io_uring: reduce/pack size of io_ring_ctx With the recent flurry of additions and changes to io_uring, the layout of io_ring_ctx has become a bit stale. We're right now at 704 bytes in size on my x86-64 build, or 11 cachelines. This patch does two things: - We have to completion structs embedded, that we only use for quiesce of the ctx (or shutdown) and for sqthread init cases. That 2x32 bytes right there, let's dynamically allocate them. - Reorder the struct a bit with an eye on cachelines, use cases, and holes. With this patch, we're down to 512 bytes, or 8 cachelines. Reviewed-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-10 20:29:49 -07:00
Linus Torvalds	a5871fcba4	configfs regression fix for 5.4-rc - fix a regression from this merge window in the configfs symlink handling (Honggang Li) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3IG00LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYPJbBAAw57uaM2mKcUe1syqwsaliqYVGqZ7X+4sMdtwNPAI W01aolnmmukcWInlZBqiHFu3+iAVYJND9NvP7ZA+p6/meJfWiIrgWMEVS4X6HFpg qsBCP1qh8BvLxCzHC+i9tkgPMpicYjImE/pza99ZtrMHTRC7ao5I3GKbWBIdEpDv 5NZDbt2g79Y1YfqGRo0xcY0etwpau+iN4rXivjj4qMO1o+Vt4rWlGZehhD2/r2M0 NGeR3JpYe1MdxvyECMoDI+aWuOiDFrJ/ZfWfTuCskqwTyZ1BtKElBnqS6VFn7yxL XqPNwe6Sr9fz7RRZXQ1iH8O/SYct25xVyc6JQlAvcpp0emAUsj2bNAQ3Hj4W3Krw h1D5HKvte+CjIqZEnFlqI6GeHawdbSpJoE1VOECS1rXpYhvs5V+2e5RyT2gmj1Pp X58Q0xF5Ver2sF0NkhAMTxXL77L8cpRDVljveBXgUh/MzTdPcq1h/RsKXwwVBD23 Vsmg4qKlBAWJLK1TJ4wvvmmp0N/XzcaNWm7cKCxJDBlzY8LpcVTmUJfGty96mHqb cRZLMNcWbtPFSkHfzkYeuWWnhhtgXBmEcTawOd3y0s+jd8wjhRr2wuhL/MSCicJG t0A+4/5p4s8H94OAiq9tF8yOAQpvzmrrunW+lyyOQchK4kBBM8rD+cMIRziL+u/2 B84= =kgn+ -----END PGP SIGNATURE----- Merge tag 'configfs-for-5.4-2' of git://git.infradead.org/users/hch/configfs Pull configfs regression fix from Christoph Hellwig: "Fix a regression from this merge window in the configfs symlink handling (Honggang Li)" * tag 'configfs-for-5.4-2' of git://git.infradead.org/users/hch/configfs: configfs: calculate the depth of parent item	2019-11-10 12:59:34 -08:00

... 2 3 4 5 6 ...

61054 Commits