linux

iv/linux

Author	SHA1	Message	Date
Goldwyn Rodrigues	9fb9f0b59d	btrfs: mark compressed range uptodate only if all bio succeed [ Upstream commit 240246f6b913b0c23733cfd2def1d283f8cc9bbe ] In compression write endio sequence, the range which the compressed_bio writes is marked as uptodate if the last bio of the compressed (sub)bios is completed successfully. There could be previous bio which may have failed which is recorded in cb->errors. Set the writeback range as uptodate only if cb->errors is zero, as opposed to checking only the last bio's status. Backporting notes: in all versions up to 4.4 the last argument is always replaced by "!cb->errors". CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-08 08:38:53 +02:00
Junxiao Bi	fc3f43d556	ocfs2: issue zeroout to EOF blocks commit 9449ad33be8480f538b11a593e2dda2fb33ca06d upstream. For punch holes in EOF blocks, fallocate used buffer write to zero the EOF blocks in last cluster. But since ->writepage will ignore EOF pages, those zeros will not be flushed. This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by fallocate") will zero the EOF blocks when extend the file size, but it isn't. The problem happened on those EOF pages, before writeback, those pages had DIRTY flag set and all buffer_head in them also had DIRTY flag set, when writeback run by write_cache_pages(), DIRTY flag on the page was cleared, but DIRTY flag on the buffer_head not. When next write happened to those EOF pages, since buffer_head already had DIRTY flag set, it would not mark page DIRTY again. That made writeback ignore them forever. That will cause data corruption. Even directio write can't work because it will fail when trying to drop pages caches before direct io, as it found the buffer_head for those pages still had DIRTY flag set, then it will fall back to buffer io mode. To make a summary of the issue, as writeback ingores EOF pages, once any EOF page is generated, any write to it will only go to the page cache, it will never be flushed to disk even file size extends and that page is not EOF page any more. The fix is to avoid zero EOF blocks with buffer write. The following code snippet from qemu-img could trigger the corruption. 656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR\|O_DIRECT\|O_CLOEXEC) = 11 ... 660 fallocate(11, FALLOC_FL_KEEP_SIZE\|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...> 660 fallocate(11, 0, 2275868672, 327680) = 0 658 pwrite64(11, " Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 11:58:02 +02:00
Junxiao Bi	439e209bc2	ocfs2: fix zero out valid data commit f267aeb6dea5e468793e5b8eb6a9c72c0020d418 upstream. If append-dio feature is enabled, direct-io write and fallocate could run in parallel to extend file size, fallocate used "orig_isize" to record i_size before taking "ip_alloc_sem", when ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed out. Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com Fixes: 6bba4471f0cc ("ocfs2: fix data corruption by fallocate") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 11:58:02 +02:00
Desmond Cheong Zhi Xi	e2c06c712a	hfs: add lock nesting notation to hfs_find_init [ Upstream commit b3b2177a2d795e35dc11597b2609eb1e7e57e570 ] Syzbot reports a possible recursive lock in [1]. This happens due to missing lock nesting information. From the logs, we see that a call to hfs_fill_super is made to mount the hfs filesystem. While searching for the root inode, the lock on the catalog btree is grabbed. Then, when the parent of the root isn't found, a call to __hfs_bnode_create is made to create the parent of the root. This eventually leads to a call to hfs_ext_read_extent which grabs a lock on the extents btree. Since the order of locking is catalog btree -> extents btree, this lock hierarchy does not lead to a deadlock. To tell lockdep that this locking is safe, we add nesting notation to distinguish between catalog btrees, extents btrees, and attributes btrees (for HFS+). This has already been done in hfsplus. Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1] Link: https://lkml.kernel.org/r/20210701030756.58760-4-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reported-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com Tested-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-04 11:58:02 +02:00
Desmond Cheong Zhi Xi	81d6d87a80	hfs: fix high memory mapping in hfs_bnode_read [ Upstream commit 54a5ead6f5e2b47131a7385d0c0af18e7b89cb02 ] Pages that we read in hfs_bnode_read need to be kmapped into kernel address space. However, currently only the 0th page is kmapped. If the given offset + length exceeds this 0th page, then we have an invalid memory access. To fix this, we kmap relevant pages one by one and copy their relevant portions of data. An example of invalid memory access occurring without this fix can be seen in the following crash report: ================================================================== BUG: KASAN: use-after-free in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: use-after-free in hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26 Read of size 2 at addr ffff888125fdcffe by task syz-executor5/4634 CPU: 0 PID: 4634 Comm: syz-executor5 Not tainted 5.13.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x195/0x1f8 lib/dump_stack.c:120 print_address_description.constprop.0+0x1d/0x110 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7b/0xd4 mm/kasan/report.c:436 check_region_inline mm/kasan/generic.c:180 [inline] kasan_check_range+0x154/0x1b0 mm/kasan/generic.c:186 memcpy+0x24/0x60 mm/kasan/shadow.c:65 memcpy include/linux/fortify-string.h:191 [inline] hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26 hfs_bnode_read_u16 fs/hfs/bnode.c:34 [inline] hfs_bnode_find+0x880/0xcc0 fs/hfs/bnode.c:365 hfs_brec_find+0x2d8/0x540 fs/hfs/bfind.c:126 hfs_brec_read+0x27/0x120 fs/hfs/bfind.c:165 hfs_cat_find_brec+0x19a/0x3b0 fs/hfs/catalog.c:194 hfs_fill_super+0xc13/0x1460 fs/hfs/super.c:419 mount_bdev+0x331/0x3f0 fs/super.c:1368 hfs_mount+0x35/0x40 fs/hfs/super.c:457 legacy_get_tree+0x10c/0x220 fs/fs_context.c:592 vfs_get_tree+0x93/0x300 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x13f5/0x20e0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount fs/namespace.c:3433 [inline] __x64_sys_mount+0x2b8/0x340 fs/namespace.c:3433 do_syscall_64+0x37/0xc0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x45e63a Code: 48 c7 c2 bc ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 88 04 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9404d410d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000248 RCX: 000000000045e63a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f9404d41120 RBP: 00007f9404d41120 R08: 00000000200002c0 R09: 0000000020000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000000003 R14: 00000000004ad5d8 R15: 0000000000000000 The buggy address belongs to the page: page:00000000dadbcf3e refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x125fdc flags: 0x2fffc0000000000(node=0\|zone=2\|lastcpupid=0x3fff) raw: 02fffc0000000000 ffffea000497f748 ffffea000497f6c8 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888125fdce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888125fdcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888125fdcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888125fdd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888125fdd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Link: https://lkml.kernel.org/r/20210701030756.58760-3-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-04 11:58:02 +02:00
Desmond Cheong Zhi Xi	cb49f4a65b	hfs: add missing clean-up in hfs_fill_super [ Upstream commit 16ee572eaf0d09daa4c8a755fdb71e40dbf8562d ] Patch series "hfs: fix various errors", v2. This series ultimately aims to address a lockdep warning in hfs_find_init reported by Syzbot [1]. The work done for this led to the discovery of another bug, and the Syzkaller repro test also reveals an invalid memory access error after clearing the lockdep warning. Hence, this series is broken up into three patches: 1. Add a missing call to hfs_find_exit for an error path in hfs_fill_super 2. Fix memory mapping in hfs_bnode_read by fixing calls to kmap 3. Add lock nesting notation to tell lockdep that the observed locking hierarchy is safe This patch (of 3): Before exiting hfs_fill_super, the struct hfs_find_data used in hfs_find_init should be passed to hfs_find_exit to be cleaned up, and to release the lock held on the btree. The call to hfs_find_exit is missing from an error path. We add it back in by consolidating calls to hfs_find_exit for error paths. Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1] Link: https://lkml.kernel.org/r/20210701030756.58760-1-desmondcheongzx@gmail.com Link: https://lkml.kernel.org/r/20210701030756.58760-2-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-04 11:58:02 +02:00
David Sterba	039f1a721c	btrfs: compression: don't try to compress if we don't have enough pages commit f2165627319ffd33a6217275e5690b1ab5c45763 upstream The early check if we should attempt compression does not take into account the number of input pages. It can happen that there's only one page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED have been processed, or an isolated page that won't be converted to an inline extent. The single page would be compressed but a later check would drop it again because the result size must be at least one block shorter than the input. That can never work with just one page. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: David Sterba <dsterba@suse.com> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 09:14:29 +02:00
Marcelo Henrique Cerri	947644647b	proc: Avoid mixing integer types in mem_rw() [ Upstream commit d238692b4b9f2c36e35af4c6e6f6da36184aeb3e ] Use size_t when capping the count argument received by mem_rw(). Since count is size_t, using min_t(int, ...) can lead to a negative value that will later be passed to access_remote_vm(), which can cause unexpected behavior. Since we are capping the value to at maximum PAGE_SIZE, the conversion from size_t to int when passing it to access_remote_vm() as "len" shouldn't be a problem. Link: https://lkml.kernel.org/r/20210512125215.3348316-1-marcelo.cerri@canonical.com Reviewed-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Souza Cascardo <cascardo@canonical.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Michel Lespinasse <walken@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 09:14:27 +02:00
Eric Sandeen	c5157b3e77	seq_file: disallow extremely large seq buffer allocations commit 8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b upstream. There is no reasonable need for a buffer larger than this, and it avoids int overflow pitfalls. Fixes: 058504edd026 ("fs/seq_file: fallback to vmalloc allocation") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:21:16 +02:00
Zhihao Cheng	971bc532b2	ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode [ Upstream commit a801fcfeef96702fa3f9b22ad56c5eb1989d9221 ] xfstests-generic/476 reports a warning message as below: WARNING: CPU: 2 PID: 30347 at fs/inode.c:361 inc_nlink+0x52/0x70 Call Trace: do_rename+0x502/0xd40 [ubifs] ubifs_rename+0x8b/0x180 [ubifs] vfs_rename+0x476/0x1080 do_renameat2+0x67c/0x7b0 __x64_sys_renameat2+0x6e/0x90 do_syscall_64+0x66/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xae Following race case can cause this: rename_whiteout(Thread 1) wb_workfn(Thread 2) ubifs_rename do_rename __writeback_single_inode spin_lock(&inode->i_lock) whiteout->i_state \|= I_LINKABLE inode->i_state &= ~dirty; ---- How race happens on i_state: (tmp = whiteout->i_state \| I_LINKABLE) (tmp = inode->i_state & ~dirty) (whiteout->i_state = tmp) (inode->i_state = tmp) ---- spin_unlock(&inode->i_lock) inc_nlink(whiteout) WARN_ON(!(inode->i_state & I_LINKABLE)) !!! Fix to add i_lock to avoid i_state update race condition. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:14 +02:00
Gao Xiang	cef9d9acb7	nfs: fix acl memory leak of posix_acl_create() [ Upstream commit 1fcb6fcd74a222d9ead54d405842fc763bb86262 ] When looking into another nfs xfstests report, I found acl and default_acl in nfs3_proc_create() and nfs3_proc_mknod() error paths are possibly leaked. Fix them in advance. Fixes: 013cdf1088d7 ("nfs: use generic posix ACL infrastructure for v3 Posix ACLs") Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:14 +02:00
Jeff Layton	58f8bcffc3	ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty [ Upstream commit 22d41cdcd3cfd467a4af074165357fcbea1c37f5 ] The checks for page->mapping are odd, as set_page_dirty is an address_space operation, and I don't see where it would be called on a non-pagecache page. The warning about the page lock also seems bogus. The comment over set_page_dirty() says that it can be called without the page lock in some rare cases. I don't think we want to warn if that's the case. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:13 +02:00
Mike Marshall	2307ca3deb	orangefs: fix orangefs df output. [ Upstream commit 0fdec1b3c9fbb5e856a40db5993c9eaf91c74a83 ] Orangefs df output is whacky. Walt Ligon suggested this might fix it. It seems way more in line with reality now... Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:13 +02:00
Jiapeng Chong	1fa1489cad	fs/jfs: Fix missing error code in lmLogInit() [ Upstream commit 492109333c29e1bb16d8732e1d597b02e8e0bf2e ] The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'rc. Eliminate the follow smatch warning: fs/jfs/jfs_logmgr.c:1327 lmLogInit() warn: missing error code 'rc'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:11 +02:00
Eric Biggers	74f26d6fb5	fscrypt: don't ignore minor_hash when hash is 0 commit 77f30bfcfcf484da7208affd6a9e63406420bf91 upstream. When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the minor_hash to 0 if the (major) hash is 0. This doesn't make sense because 0 is a valid hash code, so we shouldn't ignore the filesystem-provided minor_hash in that case. Fix this by removing the special case for 'hash == 0'. This is an old bug that appears to have originated when the encryption code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and f2fs code passed the hash by pointer instead of by value. So 'if (hash)' actually made sense then, as it was checking whether a pointer was NULL. But now the hashes are passed by value, and filesystems just pass 0 for any hashes they don't have. There is no need to handle this any differently from the hashes actually being 0. It is difficult to reproduce this bug, as it only made a difference in the case where a filename's 32-bit major hash happened to be 0. However, it probably had the largest chance of causing problems on ubifs, since ubifs uses minor_hash to do lookups of no-key names, in addition to using it as a readdir cookie. ext4 only uses minor_hash as a readdir cookie, and f2fs doesn't use minor_hash at all. Fixes: 0b81d0779072 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Cc: <stable@vger.kernel.org> # v4.6+ Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:21:11 +02:00
Pavel Skripkin	745c9a5942	jfs: fix GPF in diFree commit 9d574f985fe33efd6911f4d752de6f485a1ea732 upstream. Avoid passing inode with JFS_SBI(inode->i_sb)->ipimap == NULL to diFree()[1]. GFP will appear: struct inode ipimap = JFS_SBI(ip->i_sb)->ipimap; struct inomap imap = JFS_IP(ipimap)->i_imap; JFS_IP() will return invalid pointer when ipimap == NULL Call Trace: diFree+0x13d/0x2dc0 fs/jfs/jfs_imap.c:853 [1] jfs_evict_inode+0x2c9/0x370 fs/jfs/inode.c:154 evict+0x2ed/0x750 fs/inode.c:578 iput_final fs/inode.c:1654 [inline] iput.part.0+0x3fe/0x820 fs/inode.c:1680 iput+0x58/0x70 fs/inode.c:1670 Reported-and-tested-by: syzbot+0a89a7b56db04c21a656@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:21:10 +02:00
Miklos Szeredi	cfb2abd236	fuse: reject internal errno commit 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 upstream. Don't allow userspace to report errors that could be kernel-internal. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Fixes: 334f485df85a ("[PATCH] FUSE - device functions") Cc: <stable@vger.kernel.org> # v2.6.14 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:21:08 +02:00
Arturo Giusti	371566f63c	udf: Fix NULL pointer dereference in udf_symlink function [ Upstream commit fa236c2b2d4436d9f19ee4e5d5924e90ffd7bb43 ] In function udf_symlink, epos.bh is assigned with the value returned by udf_tgetblk. The function udf_tgetblk is defined in udf/misc.c and returns the value of sb_getblk function that could be NULL. Then, epos.bh is used without any check, causing a possible NULL pointer dereference when sb_getblk fails. This fix adds a check to validate the value of epos.bh. Link: https://bugzilla.kernel.org/show_bug.cgi?id=213083 Signed-off-by: Arturo Giusti <koredump@protonmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:06 +02:00
Pavel Skripkin	03aec2c16a	reiserfs: add check for invalid 1st journal block [ Upstream commit a149127be52fa7eaf5b3681a0317a2bbb772d5a9 ] syzbot reported divide error in reiserfs. The problem was in incorrect journal 1st block. Syzbot's reproducer manualy generated wrong superblock with incorrect 1st block. In journal_init() wasn't any checks about this particular case. For example, if 1st journal block is before superblock 1st block, it can cause zeroing important superblock members in do_journal_end(). Link: https://lore.kernel.org/r/20210517121545.29645-1-paskripkin@gmail.com Reported-by: syzbot+0ba9909df31c6a36974d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:06 +02:00
Chung-Chiang Cheng	c575c5434f	configfs: fix memleak in configfs_release_bin_file [ Upstream commit 3c252b087de08d3cb32468b54a158bd7ad0ae2f7 ] When reading binary attributes in progress, buffer->bin_buffer is setup in configfs_read_bin_file() but never freed. Fixes: 03607ace807b4 ("configfs: implement binary attributes") Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com> [hch: move the vfree rather than duplicating it] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:05 +02:00
Muchun Song	91f805f97b	writeback: fix obtain a reference to a freeing memcg css [ Upstream commit 8b0ed8443ae6458786580d36b7d5f8125535c5d4 ] The caller of wb_get_create() should pin the memcg, because wb_get_create() relies on this guarantee. The rcu read lock only can guarantee that the memcg css returned by css_from_id() cannot be released, but the reference of the memcg can be zero. rcu_read_lock() memcg_css = css_from_id() wb_get_create(memcg_css) cgwb_create(memcg_css) // css_get can change the ref counter from 0 back to 1 css_get(memcg_css) rcu_read_unlock() Fix it by holding a reference to the css before calling wb_get_create(). This is not a problem I encountered in the real world. Just the result of a code review. Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Link: https://lore.kernel.org/r/20210402091145.80635-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:03 +02:00
Dan Carpenter	23044e81e8	ocfs2: fix snprintf() checking [ Upstream commit 54e948c60cc843b6e84dc44496edc91f51d2a28e ] The snprintf() function returns the number of bytes which would have been printed if the buffer was large enough. In other words it can return ">= remain" but this code assumes it returns "== remain". The run time impact of this bug is not very severe. The next iteration through the loop would trigger a WARN() when we pass a negative limit to snprintf(). We would then return success instead of -E2BIG. The kernel implementation of snprintf() will never return negatives so there is no need to check and I have deleted that dead code. Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam Fixes: a860f6eb4c6a ("ocfs2: sysfile interfaces for online file check") Fixes: 74ae4e104dfc ("ocfs2: Create stack glue sysfs files.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:01 +02:00
Alexander Aring	cfcb65adb1	fs: dlm: fix memory leak when fenced [ Upstream commit 700ab1c363c7b54c9ea3222379b33fc00ab02f7b ] I got some kmemleak report when a node was fenced. The user space tool dlm_controld will therefore run some rmdir() in dlm configfs which was triggering some memleaks. This patch stores the sps and cms attributes which stores some handling for subdirectories of the configfs cluster entry and free them if they get released as the parent directory gets freed. unreferenced object 0xffff88810d9e3e00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 73 70 61 63 65 73 00 00 ........spaces.. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000db8b640b>] make_cluster+0x5d/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae unreferenced object 0xffff88810d9e3a00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 63 6f 6d 6d 73 00 00 00 ........comms... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a7ef6ad2>] make_cluster+0x82/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:21:00 +02:00
Alexander Aring	186681458b	fs: dlm: cancel work sync othercon [ Upstream commit c6aa00e3d20c2767ba3f57b64eb862572b9744b3 ] These rx tx flags arguments are for signaling close_connection() from which worker they are called. Obviously the receive worker cannot cancel itself and vice versa for swork. For the othercon the receive worker should only be used, however to avoid deadlocks we should pass the same flags as the original close_connection() was called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:20:59 +02:00
zhangyi (F)	df13e556ee	block_dump: remove block_dump feature in mark_inode_dirty() [ Upstream commit 12e0613715e1cf305fffafaf0e89d810d9a85cc0 ] block_dump is an old debugging interface, one of it's functions is used to print the information about who write which file on disk. If we enable block_dump through /proc/sys/vm/block_dump and turn on debug log level, we can gather information about write process name, target file name and disk from kernel message. This feature is realized in block_dump___mark_inode_dirty(), it print above information into kernel message directly when marking inode dirty, so it is noisy and can easily trigger log storm. At the same time, get the dentry refcount is also not safe, we found it will lead to deadlock on ext4 file system with data=journal mode. After tracepoints has been introduced into the kernel, we got a tracepoint in __mark_inode_dirty(), which is a better replacement of block_dump___mark_inode_dirty(). The only downside is that it only trace the inode number and not a file name, but it probably doesn't matter because the original printed file name in block_dump is not accurate in some cases, and we can still find it through the inode number and device id. So this patch delete the dirting inode part of block_dump feature. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210313030146.2882027-2-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:20:59 +02:00
Christophe Leroy	481f944511	btrfs: disable build on platforms having page size 256K [ Upstream commit b05fbcc36be1f8597a1febef4892053a0b2f3f60 ] With a config having PAGE_SIZE set to 256K, BTRFS build fails with the following message include/linux/compiler_types.h:326:38: error: call to '__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0 BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with 256K pages at the time being. There are two platforms that can select 256K pages: - hexagon - powerpc Disable BTRFS when 256K page size is selected. Supporting this would require changes to the subpage mode that's currently being developed. Given that 256K is many times larger than page sizes commonly used and for what the algorithms and structures have been tuned, it's out of scope and disabling build is a reasonable option. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:20:59 +02:00
Josef Bacik	c8c6262d29	btrfs: abort transaction if we fail to update the delayed inode [ Upstream commit 04587ad9bef6ce9d510325b4ba9852b6129eebdb ] If we fail to update the delayed inode we need to abort the transaction, because we could leave an inode with the improper counts or some other such corruption behind. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:20:59 +02:00
Miklos Szeredi	78f27cb4c5	fuse: check connected before queueing on fpq->io commit 80ef08670d4c28a06a3de954bd350368780bcfef upstream. A request could end up on the fpq->io list after fuse_abort_conn() has reset fpq->connected and aborted requests on that list: Thread-1 Thread-2 ======== ======== ->fuse_simple_request() ->shutdown ->__fuse_request_send() ->queue_request() ->fuse_abort_conn() ->fuse_dev_do_read() ->acquire(fpq->lock) ->wait_for(fpq->lock) ->set err to all req's in fpq->io ->release(fpq->lock) ->acquire(fpq->lock) ->add req to fpq->io After the userspace copy is done the request will be ended, but req->out.h.error will remain uninitialized. Also the copy might block despite being already aborted. Fix both issues by not allowing the request to be queued on the fpq->io list after fuse_abort_conn() has processed this list. Reported-by: Pradeep P V K <pragalla@codeaurora.org> Fixes: fd22d62ed0c3 ("fuse: no fc->lock for iqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:58 +02:00
Pan Dong	7c6ae37aee	ext4: fix avefreec in find_group_orlov commit c89849cc0259f3d33624cc3bd127685c3c0fa25d upstream. The avefreec should be average free clusters instead of average free blocks, otherwize Orlov's allocator will not work properly when bigalloc enabled. Cc: stable@kernel.org Signed-off-by: Pan Dong <pandong.peter@bytedance.com> Link: https://lore.kernel.org/r/20210525073656.31594-1-pandong.peter@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:57 +02:00
Zhang Yi	8c5136e393	ext4: remove check for zero nr_to_scan in ext4_es_scan() commit e5e7010e5444d923e4091cafff61d05f2d19cada upstream. After converting fs shrinkers to new scan/count API, we are no longer pass zero nr_to_scan parameter to detect the number of objects to free, just remove this check. Fixes: 1ab6c4997e04 ("fs: convert fs shrinkers to new scan/count API") Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210522103045.690103-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:56 +02:00
Zhang Yi	d77d29f4b3	ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit commit 4fb7c70a889ead2e91e184895ac6e5354b759135 upstream. The cache_cnt parameter of tracepoint ext4_es_shrink_exit means the remaining cache count after shrink, but now it is the cache count before shrink, fix it by read sbi->s_extent_cache_cnt again. Fixes: 1ab6c4997e04 ("fs: convert fs shrinkers to new scan/count API") Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210522103045.690103-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:56 +02:00
Anirudh Rayabharam	25dcc64fa0	ext4: fix kernel infoleak via ext4_extent_header commit ce3aba43599f0b50adbebff133df8d08a3d5fffe upstream. Initialize eh_generation of struct ext4_extent_header to prevent leaking info to userspace. Fixes KMSAN kernel-infoleak bug reported by syzbot at: http://syzkaller.appspot.com/bug?id=78e9ad0e6952a3ca16e8234724b2fa92d041b9b8 Cc: stable@kernel.org Reported-by: syzbot+2dcfeaf8cb49b05e8f1a@syzkaller.appspotmail.com Fixes: a86c61812637 ("[PATCH] ext3: add extent map support") Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20210506185655.7118-1-mail@anirudhrb.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:56 +02:00
David Sterba	f1b6e445be	btrfs: clear defrag status of a root if starting transaction fails commit 6819703f5a365c95488b07066a8744841bf14231 upstream. The defrag loop processes leaves in batches and starting transaction for each. The whole defragmentation on a given root is protected by a bit but in case the transaction fails, the bit is not cleared In case the transaction fails the bit would prevent starting defragmentation again, so make sure it's cleared. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:56 +02:00
Desmond Cheong Zhi Xi	b0dfe924a0	ntfs: fix validity check for file name attribute commit d98e4d95411bbde2220a7afa38dcc9c14d71acbe upstream. When checking the file name attribute, we want to ensure that it fits within the bounds of ATTR_RECORD. To do this, we should check that (attr record + file name offset + file name length) < (attr record + attr record length). However, the original check did not include the file name offset in the calculation. This means that corrupted on-disk metadata might not caught by the incorrect file name check, and lead to an invalid memory access. An example can be seen in the crash report of a memory corruption error found by Syzbot: https://syzkaller.appspot.com/bug?id=a1a1e379b225812688566745c3e2f7242bffc246 Adding the file name offset to the validity check fixes this error and passes the Syzbot reproducer test. Link: https://lkml.kernel.org/r/20210614050540.289494-1-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reported-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com Tested-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com Acked-by: Anton Altaparmakov <anton@tuxera.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:20:56 +02:00
Pavel Skripkin	46c38e258b	nilfs2: fix memory leak in nilfs_sysfs_delete_device_group [ Upstream commit 8fd0c1b0647a6bda4067ee0cd61e8395954b6f28 ] My local syzbot instance hit memory leak in nilfs2. The problem was in missing kobject_put() in nilfs_sysfs_delete_device_group(). kobject_del() does not call kobject_cleanup() for passed kobject and it leads to leaking duped kobject name if kobject_put() was not called. Fail log: BUG: memory leak unreferenced object 0xffff8880596171e0 (size 8): comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s) hex dump (first 8 bytes): 6c 6f 6f 70 30 00 00 00 loop0... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x53/0x80 mm/util.c:83 kvasprintf_const+0x108/0x190 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x160 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999 init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637 Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-30 08:49:21 -04:00
Hillf Danton	92869945cc	gfs2: Fix use-after-free in gfs2_glock_shrink_scan [ Upstream commit 1ab19c5de4c537ec0d9b21020395a5b5a6c059b2 ] The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to remove the glock from the lru list in __gfs2_glock_put(). On the shrink scan path, the same flag is cleared under lru_lock but because of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the put side can be made without deleting the glock from the lru list. Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to ensure correct behavior on both sides - clear GLF_LRU after list_del under lru_lock. Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-30 08:49:11 -04:00
Linus Torvalds	b08854de43	proc: only require mm_struct for writing commit 94f0b2d4a1d0c52035aef425da5e022bd2cb1c71 upstream. Commit 591a22c14d3f ("proc: Track /proc/$pid/attr/ opener mm_struct") we started using __mem_open() to track the mm_struct at open-time, so that we could then check it for writes. But that also ended up making the permission checks at open time much stricter - and not just for writes, but for reads too. And that in turn caused a regression for at least Fedora 29, where NIC interfaces fail to start when using NetworkManager. Since only the write side wanted the mm_struct test, ignore any failures by __mem_open() at open time, leaving reads unaffected. The write() time verification of the mm_struct pointer will then catch the failure case because a NULL pointer will not match a valid 'current->mm'. Link: https://lore.kernel.org/netdev/YMjTlp2FSJYvoyFa@unreal/ Fixes: 591a22c14d3f ("proc: Track /proc/$pid/attr/ opener mm_struct") Reported-and-tested-by: Leon Romanovsky <leon@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-16 11:36:36 +02:00
Dai Ngo	f80f193f9d	NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error. commit f8849e206ef52b584cd9227255f4724f0cc900bb upstream. Currently if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before retrying again. The NFS_CAP_UIDGID_NOMAP remains cleared even if the retry fails. This causes problem for subsequent setattr requests for v4 server that does not have idmapping configured. This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skips the retry, since the kernel isn't involved in encoding the ACEs, and return -EINVAL. Steps to reproduce the problem: # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt # touch /tmp/mnt/file1 # chown 99 /tmp/mnt/file1 # nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1 Failed setxattr operation: Invalid argument # chown 99 /tmp/mnt/file1 chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument # umount /tmp/mnt # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt # chown 99 /tmp/mnt/file1 # v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry in nfs4_proc_set_acl. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-16 11:36:35 +02:00
Dan Carpenter	a979e60100	NFS: Fix a potential NULL dereference in nfs_get_client() [ Upstream commit 09226e8303beeec10f2ff844d2e46d1371dc58e0 ] None of the callers are expecting NULL returns from nfs_get_client() so this code will lead to an Oops. It's better to return an error pointer. I expect that this is dead code so hopefully no one is affected. Fixes: 31434f496abb ("nfs: check hostname in nfs_get_client") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-16 11:36:35 +02:00
Ritesh Harjani	9a27493e2a	btrfs: return value from btrfs_mark_extent_written() in case of error commit e7b2ec3d3d4ebeb4cff7ae45cf430182fa6a49fb upstream. We always return 0 even in case of an error in btrfs_mark_extent_written(). Fix it to return proper error value in case of a failure. All callers handle it. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-16 11:36:34 +02:00
Kees Cook	63ac7652ec	proc: Track /proc/$pid/attr/ opener mm_struct commit 591a22c14d3f45cc38bd1931c593c221df2f1881 upstream. Commit bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes against file opener") tried to make sure that there could not be a confusion between the opener of a /proc/$pid/attr/ file and the writer. It used struct cred to make sure the privileges didn't change. However, there were existing cases where a more privileged thread was passing the opened fd to a differently privileged thread (during container setup). Instead, use mm_struct to track whether the opener and writer are still the same process. (This is what several other proc files already do, though for different reasons.) Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Reported-by: Andrea Righi <andrea.righi@canonical.com> Tested-by: Andrea Righi <andrea.righi@canonical.com> Fixes: bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes against file opener") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-16 11:36:32 +02:00
Josef Bacik	7a7ccd02dc	btrfs: fixup error handling in fixup_inode_link_counts commit 011b28acf940eb61c000059dd9e2cfcbf52ed96b upstream. This function has the following pattern while (1) { ret = whatever(); if (ret) goto out; } ret = 0 out: return ret; However several places in this while loop we simply break; when there's a problem, thus clearing the return value, and in one case we do a return -EIO, and leak the memory for the path. Fix this by re-arranging the loop to deal with ret == 1 coming from btrfs_search_slot, and then simply delete the ret = 0; out: bit so everybody can break if there is an error, which will allow for proper error handling to occur. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 12:42:37 +02:00
Josef Bacik	247dcb0f56	btrfs: fix error handling in btrfs_del_csums commit b86652be7c83f70bf406bed18ecf55adb9bfb91b upstream. Error injection stress would sometimes fail with checksums on disk that did not have a corresponding extent. This occurred because the pattern in btrfs_del_csums was while (1) { ret = btrfs_search_slot(); if (ret < 0) break; } ret = 0; out: btrfs_free_path(path); return ret; If we got an error from btrfs_search_slot we'd clear the error because we were breaking instead of goto out. Instead of using goto out, simply handle the cases where we may leave a random value in ret, and get rid of the ret = 0; out: pattern and simply allow break to have the proper error reporting. With this fix we properly abort the transaction and do not commit thinking we successfully deleted the csum. Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 12:42:36 +02:00
Junxiao Bi	33e03adafb	ocfs2: fix data corruption by fallocate commit 6bba4471f0cc1296fe3c2089b9e52442d3074b2e upstream. When fallocate punches holes out of inode size, if original isize is in the middle of last cluster, then the part from isize to the end of the cluster will be zeroed with buffer write, at that time isize is not yet updated to match the new size, if writeback is kicked in, it will invoke ocfs2_writepage()->block_write_full_page() where the pages out of inode size will be dropped. That will cause file corruption. Fix this by zero out eof blocks when extending the inode size. Running the following command with qemu-image 4.2.1 can get a corrupted coverted image file easily. qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ -O qcow2 -o compat=1.1 $qcow_image.conv The usage of fallocate in qemu is like this, it first punches holes out of inode size, then extend the inode size. fallocate(11, FALLOC_FL_KEEP_SIZE\|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 fallocate(11, 0, 2276196352, 65536) = 0 v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/ Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 12:42:36 +02:00
Ye Bin	5b3a9a2be5	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed commit 082cd4ec240b8734a82a89ffb890216ac98fec68 upstream. We got follow bug_on when run fsstress with injecting IO fault: [130747.323114] kernel BUG at fs/ext4/extents_status.c:762! [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP ...... [130747.334329] Call trace: [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4] [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4] [130747.335368] ext4_find_extent+0x300/0x330 [ext4] [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4] [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4] [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4] [130747.336995] ext4_readpage+0x54/0x100 [ext4] [130747.337359] generic_file_buffered_read+0x410/0xae8 [130747.337767] generic_file_read_iter+0x114/0x190 [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4] [130747.338556] __vfs_read+0x11c/0x188 [130747.338851] vfs_read+0x94/0x150 [130747.339110] ksys_read+0x74/0xf0 This patch's modification is according to Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/ "I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)." Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 12:42:36 +02:00
Mike Kravetz	96f7d5fd6d	hugetlbfs: hugetlb_fault_mutex_hash() cleanup commit 552546366a30d88bd1d6f5efe848b2ab50fd57e5 upstream. A new clang diagnostic (-Wsizeof-array-div) warns about the calculation to determine the number of u32's in an array of unsigned longs. Suppress warning by adding parentheses. While looking at the above issue, noticed that the 'address' parameter to hugetlb_fault_mutex_hash is no longer used. So, remove it from the definition and all callers. No functional change. Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Ilie Halip <ilie.halip@gmail.com> Cc: David Bolvansky <david.bolvansky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-03 08:23:33 +02:00
Josef Bacik	e934c4ee17	btrfs: do not BUG_ON in link_to_fixup_dir [ Upstream commit 91df99a6eb50d5a1bc70fff4a09a0b7ae6aab96d ] While doing error injection testing I got the following panic kernel BUG at fs/btrfs/tree-log.c:1862! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:link_to_fixup_dir+0xd5/0xe0 RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216 RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0 RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000 RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001 R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800 R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065 FS: 00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0 Call Trace: replay_one_buffer+0x409/0x470 ? btree_read_extent_buffer_pages+0xd0/0x110 walk_up_log_tree+0x157/0x1e0 walk_log_tree+0xa6/0x1d0 btrfs_recover_log_trees+0x1da/0x360 ? replay_one_extent+0x7b0/0x7b0 open_ctree+0x1486/0x1720 btrfs_mount_root.cold+0x12/0xea ? __kmalloc_track_caller+0x12f/0x240 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? vfs_parse_fs_string+0x4d/0x90 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 path_mount+0x433/0xa10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae We can get -EIO or any number of legitimate errors from btrfs_search_slot(), panicing here is not the appropriate response. The error path for this code handles errors properly, simply return the error. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-03 08:23:32 +02:00
Zhang Xiaoxu	fa59cebdb7	NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config commit e67afa7ee4a59584d7253e45d7f63b9528819a13 upstream. Since commit bdcc2cd14e4e ("NFSv4.2: handle NFS-specific llseek errors"), nfs42_proc_llseek would return -EOPNOTSUPP rather than -ENOTSUPP when SEEK_DATA on NFSv4.0/v4.1. This will lead xfstests generic/285 not run on NFSv4.0/v4.1 when set the CONFIG_NFS_V4_2, rather than run failed. Fixes: bdcc2cd14e4e ("NFSv4.2: handle NFS-specific llseek errors") Cc: <stable.vger.kernel.org> # 4.2 Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-03 08:23:30 +02:00
Trond Myklebust	b291baae24	NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() commit 0d0ea309357dea0d85a82815f02157eb7fcda39f upstream. The value of mirror->pg_bytes_written should only be updated after a successful attempt to flush out the requests on the list. Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-03 08:23:30 +02:00
Dan Carpenter	b287521e9e	NFS: fix an incorrect limit in filelayout_decode_layout() commit 769b01ea68b6c49dc3cde6adf7e53927dacbd3a8 upstream. The "sizeof(struct nfs_fh)" is two bytes too large and could lead to memory corruption. It should be NFS_MAXFHSIZE because that's the size of the ->data[] buffer. I reversed the size of the arguments to put the variable on the left. Fixes: 16b374ca439f ("NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-03 08:23:30 +02:00

... 2 3 4 5 6 ...

48688 Commits