linux

iv/linux

Author	SHA1	Message	Date
David Howells	ba0803050d	smb3: missing inode locks in punch hole smb3 fallocate punch hole was not grabbing the inode or filemap_invalidate locks so could have race with pagemap reinstantiating the page. Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-23 03:15:27 -05:00
David Howells	c919c164fc	smb3: missing inode locks in zero range smb3 fallocate zero range was not grabbing the inode or filemap_invalidate locks so could have race with pagemap reinstantiating the page. Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-23 01:09:08 -05:00
Linus Torvalds	072e51356c	NFS client bugfixes for Linux 6.0 Highlights include: Stable fixes - NFS: Fix another fsync() issue after a server reboot Bugfixes - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT - NFS: Fix missing unlock in nfs_unlink() - Add sanity checking of the file type used by __nfs42_ssc_open - Fix a case where we're failing to set task->tk_rpc_status Cleanups - Remove the flag NFS_CONTEXT_RESEND_WRITES that got obsoleted by the fsync() fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmMDk3wACgkQZwvnipYK APJLpw/+ONqG16L5W31/BzGJ80DlG9CERMad7Yt8+lk+ih574k/OrCotHThMyBm9 2TfY3S8zD9QoLnsPesDKeoc6AYyL3el0Wo2vKmWlGvrirvrzNt9nMc61CDMs2IHT kN7gjO2P1LCZln8GTE87C4tI3Pg0Cwr4UUlyHHjMSKdYuJckJugj1gDvblSjn5h4 bGKGEJ9G71G1REn013sVqmQ6huvQ3iif07X5NaN7T5e+TpNFet/0AlTmrA9zsUDI WPm+efP+ieTmihvhqOSYdV31uHN/ECx4p60ITzAlWwPYPyXr1M0r9acUGX10ENna eX1B9nyxbUAzO6rxxPgXi3LXgvmgRDVEmbSs5IL985XR2zsVR+AF3dgAwoJXqV9y 7mAtoiwyqe3idvaK+mHU4OWCSqdhZbauJJ+Jc0ZHZHy2vzHPS2CWcpvXHjVTw63R txOkUFL89SwnqJv03N6CZt4OyY1av97dDOEvPqHuRx4NyfT3v/QvF5W3V/UvLnt2 hTPNGIRUPZU1lpfqEgd7NXWO6LLtkWK2MciRGVnSFf2S5uKYqvlbPsVqWc6CviXc Mu4o2RoctkIwxexSfHY0p7UQrbu3OvYgTuIzgy6cIZ2GK70L29UpJYBe5YEh9Qru J/Pgn1ZSdGgDgwqzR8S92PTbbKq1caOqnGReFdyJDnCetb6LrrA= =tpKO -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Stable fixes: - NFS: Fix another fsync() issue after a server reboot Bugfixes: - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT - NFS: Fix missing unlock in nfs_unlink() - Add sanity checking of the file type used by __nfs42_ssc_open - Fix a case where we're failing to set task->tk_rpc_status Cleanups: - Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the fsync() fix" * tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: RPC level errors should set task->tk_rpc_status NFSv4.2 fix problems with __nfs42_ssc_open NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds NFS: Fix another fsync() issue after a server reboot NFS: Fix missing unlock in nfs_unlink()	2022-08-22 11:40:01 -07:00
Linus Torvalds	d3cd67d671	fs.idmapped.fixes.v6.0-rc3 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYwNkcwAKCRCRxhvAZXjc ohHlAQC0Sb0jZxLDHeKXr6lHR+a+jOYTisM/8GkCygBhYBqlFgD+KclaIVJp9v/1 O88/iv91XfomkDSxNknv+MxoWE2i7Ao= =hvvJ -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull idmapping fixes from Christian Brauner: - Since Seth joined as co-maintainer for idmapped mounts we decided to use a shared git tree. Konstantin suggested we use vfs/idmapping.git on kernel.org under the vfs/ namespace. So this updates the tree in the maintainers file. - Ensure that POSIX ACLs checking, getting, and setting works correctly for filesystems mountable with a filesystem idmapping that want to support idmapped mounts. Since no filesystems mountable with an fs_idmapping do yet support idmapped mounts there is no problem. But this could change in the future, so add a check to refuse to create idmapped mounts when the mounter is not privileged over the mount's idmapping. - Check that caller is privileged over the idmapping that will be attached to a mount. Currently no FS_USERNS_MOUNT filesystems support idmapped mounts, thus this is not a problem as only CAP_SYS_ADMIN in init_user_ns is allowed to set up idmapped mounts. But this could change in the future, so add a check to refuse to create idmapped mounts when the mounter is not privileged over the mount's idmapping. - Fix POSIX ACLs for ntfs3. While looking at our current POSIX ACL handling in the context of some overlayfs work I went through a range of other filesystems checking how they handle them currently and encountered a few bugs in ntfs3. I've sent this some time ago and the fixes haven't been picked up even though the pull request for other ntfs3 fixes got sent after. This should really be fixed as right now POSIX ACLs are broken in certain circumstances for ntfs3. * tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: ntfs: fix acl handling fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts MAINTAINERS: update idmapping tree acl: handle idmapped mounts for idmapped filesystems	2022-08-22 11:33:02 -07:00
Linus Torvalds	b20ee4813f	File locking bugfix for v6.0 -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAmMDXnITHGpsYXl0b25A a2VybmVsLm9yZwAKCRAADmhBGVaCFZDRD/9IvIV0OVtEW79KpHzTHs1mfJ5ImQsH 2kYnIoamjgM47IUTtwgPDQp41iORqqETSUVUlHxGZdWygl6mOvXT1ImMr7EHIA/I GOJ3j6Sol2gfpn2/Oh1QASI07wJ7xeyXQEFw5eFne54EgkR6E+UgPIaGzXpdVePf 3w26eJZnk2XfjIXiaMa4M7KB+Hb4ornc1u8dWOGrMUPBTw83OwCSCND+aZqVObp4 Vj1pB9BpK1KQBoh/nquPvjyywB+C3MZWTYVq7T7WN9r3OwTVhnFT0hLbH/ww4dgs Xfm0px/9G3cDt7mnaF0X8ynpj1+Hfu+yVTd5WmANXeYBXm8z+Nc0I6mdY8edaMPk dOEllm+uKAykv9dj8JeuF+9XZkrvOg31GAbnJWq04YI1AvDuEm7Z7STNSTAbR7TI SoJox0oQd9LpqBrUaD0QyOJyHy2Bhr0AOAq3z7e13ZpEdK9ITsk8Q49vsqfA57Js xxzpCna+ClhKsLHCoRtqAy+Kosp2+52dH0T+1BNsBNQjwD2apHZwlPm/6mJzHOOd E875Uswj567qlP00AzeYCCq0zdgqdFUwedJe34G34XJRNksyeOouRqmOZbkfF5AX niID7OSuDVdvqPGDlkzQPDkewciBjzmTf1I80m5/S1uAqzRCtB/Q9ULIiYVQGcom lvq/ZmyMSJfPaQ== =kfx9 -----END PGP SIGNATURE----- Merge tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking fix from Jeff Layton: "Just a single patch for a bugfix in the flock() codepath, introduced by a patch that went in recently" * tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: Fix dropped call to ->fl_release_private()	2022-08-22 10:40:09 -07:00
Josef Bacik	79d3d1d12e	btrfs: don't allow large NOWAIT direct reads Dylan and Jens reported a problem where they had an io_uring test that was returning short reads, and bisected it to `ee5b46a353` ("btrfs: increase direct io read size limit to 256 sectors"). The root cause is their test was doing larger reads via io_uring with NOWAIT and async. This was triggering a page fault during the direct read, however the first page was able to work just fine and thus we submitted a 4k read for a larger iocb. Btrfs allows for partial IO's in this case specifically because we don't allow page faults, and thus we'll attempt to do any io that we can, submit what we could, come back and fault in the rest of the range and try to do the remaining IO. However for !is_sync_kiocb() we'll call ->ki_complete() as soon as the partial dio is done, which is incorrect. In the sync case we can exit the iomap code, submit more io's, and return with the amount of IO we were able to complete successfully. We were always doing short reads in this case, but for NOWAIT we were getting saved by the fact that we were limiting direct reads to sectorsize, and if we were larger than that we would return EAGAIN. Fix the regression by simply returning EAGAIN in the NOWAIT case with larger reads, that way io_uring can retry and get the larger IO and have the fault logic handle everything properly. This still leaves the AIO short read case, but that existed before this change. The way to properly fix this would be to handle partial iocb completions, but that's a lot of work, for now deal with the regression in the most straightforward way possible. Reported-by: Dylan Yudaken <dylany@fb.com> Fixes: `ee5b46a353` ("btrfs: increase direct io read size limit to 256 sectors") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-22 18:08:07 +02:00
Qu Wenruo	4a445b7b61	btrfs: don't merge pages into bio if their page offset is not contiguous [BUG] Zygo reported on latest development branch, he could hit ASSERT()/BUG_ON() caused crash when doing RAID5 recovery (intentionally corrupt one disk, and let btrfs to recover the data during read/scrub). And The following minimal reproducer can cause extent state leakage at rmmod time: mkfs.btrfs -f -d raid5 -m raid5 $dev1 $dev2 $dev3 -b 1G > /dev/null mount $dev1 $mnt fsstress -w -d $mnt -n 25 -s 1660807876 sync fssum -A -f -w /tmp/fssum.saved $mnt umount $mnt # Wipe the dev1 but keeps its super block xfs_io -c "pwrite -S 0x0 1m 1023m" $dev1 mount $dev1 $mnt fssum -r /tmp/fssum.saved $mnt > /dev/null umount $mnt rmmod btrfs This will lead to the following extent states leakage: BTRFS: state leak: start 499712 end 503807 state 5 in tree 1 refs 1 BTRFS: state leak: start 495616 end 499711 state 5 in tree 1 refs 1 BTRFS: state leak: start 491520 end 495615 state 5 in tree 1 refs 1 BTRFS: state leak: start 487424 end 491519 state 5 in tree 1 refs 1 BTRFS: state leak: start 483328 end 487423 state 5 in tree 1 refs 1 BTRFS: state leak: start 479232 end 483327 state 5 in tree 1 refs 1 BTRFS: state leak: start 475136 end 479231 state 5 in tree 1 refs 1 BTRFS: state leak: start 471040 end 475135 state 5 in tree 1 refs 1 [CAUSE] Since commit `7aa51232e2` ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector"), we always use btrfs_bio->file_offset to determine the file offset of a page. But that usage assume that, one bio has all its page having a continuous page offsets. Unfortunately that's not true, btrfs only requires the logical bytenr contiguous when assembling its bios. From above script, we have one bio looks like this: fssum-27671 submit_one_bio: bio logical=217739264 len=36864 fssum-27671 submit_one_bio: r/i=5/261 page_offset=466944 <<< fssum-27671 submit_one_bio: r/i=5/261 page_offset=724992 <<< fssum-27671 submit_one_bio: r/i=5/261 page_offset=729088 fssum-27671 submit_one_bio: r/i=5/261 page_offset=733184 fssum-27671 submit_one_bio: r/i=5/261 page_offset=737280 fssum-27671 submit_one_bio: r/i=5/261 page_offset=741376 fssum-27671 submit_one_bio: r/i=5/261 page_offset=745472 fssum-27671 submit_one_bio: r/i=5/261 page_offset=749568 fssum-27671 submit_one_bio: r/i=5/261 page_offset=753664 Note that the 1st and the 2nd page has non-contiguous page offsets. This means, at repair time, we will have completely wrong file offset passed in: kworker/u32:2-19927 btrfs_repair_one_sector: r/i=5/261 page_off=729088 file_off=475136 bio_offset=8192 Since the file offset is incorrect, we latter incorrectly set the extent states, and no way to really release them. Thus later it causes the leakage. In fact, this can be even worse, since the file offset is incorrect, we can hit cases like the incorrect file offset belongs to a HOLE, and later cause btrfs_num_copies() to trigger error, finally hit BUG_ON()/ASSERT() later. [FIX] Add an extra condition in btrfs_bio_add_page() for uncompressed IO. Now we will have more strict requirement for bio pages: - They should all have the same mapping (the mapping check is already implied by the call chain) - Their logical bytenr should be adjacent This is the same as the old condition. - Their page_offset() (file offset) should be adjacent This is the new check. This would result a slightly increased amount of bios from btrfs (needs holes and inside the same stripe boundary to trigger). But this would greatly reduce the confusion, as it's pretty common to assume a btrfs bio would only contain continuous page cache. Later we may need extra cleanups, as we no longer needs to handle gaps between page offsets in endio functions. Currently this should be the minimal patch to fix commit `7aa51232e2` ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector"). Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: `7aa51232e2` ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-22 18:06:58 +02:00
Filipe Manana	e6e3dec6c3	btrfs: update generation of hole file extent item when merging holes When punching a hole into a file range that is adjacent with a hole and we are not using the no-holes feature, we expand the range of the adjacent file extent item that represents a hole, to save metadata space. However we don't update the generation of hole file extent item, which means a full fsync will not log that file extent item if the fsync happens in a later transaction (since commit `7f30c07288` ("btrfs: stop copying old file extents when doing a full fsync")). For example, if we do this: $ mkfs.btrfs -f -O ^no-holes /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 2M 2M" /mnt/foobar $ sync We end up with 2 file extent items in our file: 1) One that represents the hole for the file range [0, 2M), with a generation of 7; 2) Another one that represents an extent covering the range [2M, 4M). After that if we do the following: $ xfs_io -c "fpunch 2M 2M" /mnt/foobar We end up with a single file extent item in the file, which represents a hole for the range [0, 4M) and with a generation of 7 - because we end dropping the data extent for range [2M, 4M) and then update the file extent item that represented the hole at [0, 2M), by increasing length from 2M to 4M. Then doing a full fsync and power failing: $ xfs_io -c "fsync" /mnt/foobar <power failure> will result in the full fsync not logging the file extent item that represents the hole for the range [0, 4M), because its generation is 7, which is lower than the generation of the current transaction (8). As a consequence, after mounting again the filesystem (after log replay), the region [2M, 4M) does not have a hole, it still points to the previous data extent. So fix this by always updating the generation of existing file extent items representing holes when we merge/expand them. This solves the problem and it's the same approach as when we merge prealloc extents that got written (at btrfs_mark_extent_written()). Setting the generation to the current transaction's generation is also what we do when merging the new hole extent map with the previous one or the next one. A test case for fstests, covering both cases of hole file extent item merging (to the left and to the right), will be sent soon. Fixes: `7f30c07288` ("btrfs: stop copying old file extents when doing a full fsync") CC: stable@vger.kernel.org # 5.18+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-22 18:06:42 +02:00
Zixuan Fu	9ea0106a7a	btrfs: fix possible memory leak in btrfs_get_dev_args_from_path() In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if the path is invalid. In this case, btrfs_get_dev_args_from_path() returns directly without freeing args->uuid and args->fsid allocated before, which causes memory leak. To fix these possible leaks, when btrfs_get_bdev_and_sb() fails, btrfs_put_dev_args_from_path() is called to clean up the memory. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Fixes: `faa775c41d` ("btrfs: add a btrfs_get_dev_args_from_path helper") CC: stable@vger.kernel.org # 5.16 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-22 18:06:33 +02:00
Goldwyn Rodrigues	b51111271b	btrfs: check if root is readonly while setting security xattr For a filesystem which has btrfs read-only property set to true, all write operations including xattr should be denied. However, security xattr can still be changed even if btrfs ro property is true. This happens because xattr_permission() does not have any restrictions on security., system. and in some cases trusted.* from VFS and the decision is left to the underlying filesystem. See comments in xattr_permission() for more details. This patch checks if the root is read-only before performing the set xattr operation. Testcase: DEV=/dev/vdb MNT=/mnt mkfs.btrfs -f $DEV mount $DEV $MNT echo "file one" > $MNT/f1 setfattr -n "security.one" -v 2 $MNT/f1 btrfs property set /mnt ro true setfattr -n "security.one" -v 1 $MNT/f1 umount $MNT CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-22 18:06:30 +02:00
Christian Brauner	0c3bc7899e	ntfs: fix acl handling While looking at our current POSIX ACL handling in the context of some overlayfs work I went through a range of other filesystems checking how they handle them currently and encountered ntfs3. The posic_acl_{from,to}_xattr() helpers always need to operate on the filesystem idmapping. Since ntfs3 can only be mounted in the initial user namespace the relevant idmapping is init_user_ns. The posix_acl_{from,to}_xattr() helpers are concerned with translating between the kernel internal struct posix_acl{_entry} and the uapi struct posix_acl_xattr_{header,entry} and the kernel internal data structure is cached filesystem wide. Additional idmappings such as the caller's idmapping or the mount's idmapping are handled higher up in the VFS. Individual filesystems usually do not need to concern themselves with these. The posix_acl_valid() helper is concerned with checking whether the values in the kernel internal struct posix_acl can be represented in the filesystem's idmapping. IOW, if they can be written to disk. So this helper too needs to take the filesystem's idmapping. Fixes: `be71b5cba2` ("fs/ntfs3: Add attrib operations") Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: ntfs3@lists.linux.dev Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-08-22 12:52:23 +02:00
Linus Torvalds	367bcbc5b5	5 cifs/smb3 fixes, one for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmMBX5YACgkQiiy9cAdy T1GNGgv/bH2tKQ+xOSoHgAUB8x5C1qWCba1KqmF/1e86QWkEaOy3APPH0D4whm2T KsO9bI2xLAO7TnsGjr+llPRWLE4uyL3tg4PRuoHJdLlq0UibvWcd7crtRfJtyNOm E8S0ouJ0+0Dv026c3KKovYrnHtin/I/KLZp57W1qmslyJEaHflxW4RqEOROLYTjK FpEtKyMO1/ivcK6s8l2RcNMcHqkx+HZvzNub4hAdUJOTf8DJ2lLc/PjfDtB4HQlE d+Pant1zdZaehZcymzk6x8oq1LIgfVgkFTWJyrr/FLJq2S8E/XbC54tdzTfIySsY F5So874fBm6Eal+0qxdm4cYbrE+YjeKwyTHlY836D0InOHxT2noE8i0KSNCP9sfj K/gt4SmxkxMASUycZ5rmOvhw/RghOSWtTNTQeiU1fheuXBaCe+dKgdAEQXdqQ0Kt I4gCuT6eGRdz9J6AVVIzsaYZFODRlBh50AMS3xkrRlqJ1jxn0E3Zuu3DTTleeknW IgGiZ7LC =dod2 -----END PGP SIGNATURE----- Merge tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: - memory leak fix - two small cleanups - trivial strlcpy removal - update missing entry for cifs headers in MAINTAINERS file * tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: move from strlcpy with unused retval to strscpy cifs: Fix memory leak on the deferred close cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl() cifs: remove unused server parameter from calc_smb_size() cifs: missing directory in MAINTAINERS file	2022-08-21 10:21:16 -07:00
Peter Xu	f369b07c86	mm/uffd: reset write protection when unregister with wp-mode The motivation of this patch comes from a recent report and patchfix from David Hildenbrand on hugetlb shared handling of wr-protected page [1]. With the reproducer provided in commit message of [1], one can leverage the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect not only the attacker process, but also the whole system. The lazy-reset mechanism of uffd-wp was used to make unregister faster, meanwhile it has an assumption that any leftover pgtable entries should only affect the process on its own, so not only the user should be aware of anything it does, but also it should not affect outside of the process. But it seems that this is not true, and it can also be utilized to make some exploit easier. So far there's no clue showing that the lazy-reset is important to any userfaultfd users because normally the unregister will only happen once for a specific range of memory of the lifecycle of the process. Considering all above, what this patch proposes is to do explicit pte resets when unregister an uffd region with wr-protect mode enabled. It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false) right before ioctl(UFFDIO_UNREGISTER) for the user. So potentially it'll make the unregister slower. From that pov it's a very slight abi change, but hopefully nothing should break with this change either. Regarding to the change itself - core of uffd write [un]protect operation is moved into a separate function (uffd_wp_range()) and it is reused in the unregister code path. Note that the new function will not check for anything, e.g. ranges or memory types, because they should have been checked during the previous UFFDIO_REGISTER or it should have failed already. It also doesn't check mmap_changing because we're with mmap write lock held anyway. I added a Fixes upon introducing of uffd-wp shmem+hugetlbfs because that's the only issue reported so far and that's the commit David's reproducer will start working (v5.19+). But the whole idea actually applies to not only file memories but also anonymous. It's just that we don't need to fix anonymous prior to v5.19- because there's no known way to exploit. IOW, this patch can also fix the issue reported in [1] as the patch 2 does. [1] https://lore.kernel.org/all/20220811103435.188481-3-david@redhat.com/ Link: https://lkml.kernel.org/r/20220811201340.39342-1-peterx@redhat.com Fixes: `b1f9e87686` ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-08-20 15:17:45 -07:00
Peter Xu	efd4149342	mm/smaps: don't access young/dirty bit if pte unpresent These bits should only be valid when the ptes are present. Introducing two booleans for it and set it to false when !pte_present() for both pte and pmd accountings. The bug is found during code reading and no real world issue reported, but logically such an error can cause incorrect readings for either smaps or smaps_rollup output on quite a few fields. For example, it could cause over-estimate on values like Shared_Dirty, Private_Dirty, Referenced. Or it could also cause under-estimate on values like LazyFree, Shared_Clean, Private_Clean. Link: https://lkml.kernel.org/r/20220805160003.58929-1-peterx@redhat.com Fixes: `b1d4d9e0cb` ("proc/smaps: carefully handle migration entries") Fixes: `c94b6923fa` ("/proc/PID/smaps: Add PMD migration entry parsing") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-08-20 15:17:45 -07:00
Olga Kornievskaia	fcfc8be1e9	NFSv4.2 fix problems with __nfs42_ssc_open A destination server while doing a COPY shouldn't accept using the passed in filehandle if its not a regular filehandle. If alloc_file_pseudo() has failed, we need to decrement a reference on the newly created inode, otherwise it leaks. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `ec4b092508` ("NFS: inter ssc open") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-19 20:31:57 -04:00
NeilBrown	f16857e62b	NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT nfs_unlink() calls d_delete() twice if it receives ENOENT from the server - once in nfs_dentry_handle_enoent() from nfs_safe_remove and once in nfs_dentry_remove_handle_error(). nfs_rmddir() also calls it twice - the nfs_dentry_handle_enoent() call is direct and inside a region locked with ->rmdir_sem It is safe to call d_delete() twice if the refcount > 1 as the dentry is simply unhashed. If the refcount is 1, the first call sets d_inode to NULL and the second call crashes. This patch guards the d_delete() call from nfs_dentry_handle_enoent() leaving the one under ->remdir_sem in case that is important. In mainline it would be safe to remove the d_delete() call. However in older kernels to which this might be backported, that would change the behaviour of nfs_unlink(). nfs_unlink() used to unhash the dentry which resulted in nfs_dentry_handle_enoent() not calling d_delete(). So in older kernels we need the d_delete() in nfs_dentry_remove_handle_error() when called from nfs_unlink() but not when called from nfs_rmdir(). To make the code work correctly for old and new kernels, and from both nfs_unlink() and nfs_rmdir(), we protect the d_delete() call with simple_positive(). This ensures it is never called in a circumstance where it could crash. Fixes: `3c59366c20` ("NFS: don't unhash dentry during unlink/rename") Fixes: `9019fb391d` ("NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-19 20:31:36 -04:00
Linus Torvalds	50cd95ac46	execve fix for v6.0-rc2 - Replace remaining kmap() uses with kmap_local_page() (Fabio M. De Francesco) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmL/3lEWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJu9xD/9SivxMYldHN8E+KrVUaOUZB6OF k97f+C4d438/kiZKIIGUqHv2Zbq4w9IFX3YQ9cmbN9dArEbTnvuGH/RTGI2L4yK1 uS4BG+9RZZgZfZ+I2JOxX1elqo2qloEkQQJMSvLSXFqElrb6eQWDSdizWMWto+Yb l3Fsfq3zNf2cbxcCmlSaOZ4FFS3t5Yc7r/ArIRNPXVXIMcZkBdqj0fJ7nR7PplJY YKg7ioAIYUoY2nH97hIWvMWTP+DJc9vXp9+SRUB45qph9gkOBBl34aOC39iMIkEH AhPcGPf09DCZAu9sSfx3jH/YVV2jJg2DNbRw9bQlmfu+fdQBYCw1a2mhenpc0rUI OA4R26KMCI7338aWjDJy9N+kY2fhm0J8L2NxZ4ySo0ZMQf3VxbPsXDh6ijX69ijr DllH09o18RCIxhCRLOfGa3Je93FGnd2b/CI1z1CdQqQ/mdCWRvl3jRCpOXstqcAq 5ptU0tIEeY8+hor6rd2RfT5xqd1LjIrNdE0m5UUuNEv30IP2ldKIBWLH6jvZ5LJ2 U/YWJ8/GR+SLrpMNIX6fUAS+2VZNHUbUj2cfmB87OpBmKOcp+vJwf65OzujIfT21 OeMBkjql8DQlXhHk41SMDPrzzx76TJ0IaK0JCRPeUOggZwBX7txfUW5Xyi2UxseD SRT6R+SXibr0vQyemQ== =DTae -----END PGP SIGNATURE----- Merge tag 'execve-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: - Replace remaining kmap() uses with kmap_local_page() (Fabio M. De Francesco) * tag 'execve-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Replace kmap{,_atomic}() with kmap_local_page()	2022-08-19 14:02:24 -07:00
Linus Torvalds	42c54d5491	for-6.0-rc1-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmL/dJgACgkQxWXV+ddt WDtUmg/9Fje7L+jAtQLqvJYvvGCFCCs8gkm+Rwh9MLYHcEDnBtRWzDWYTKcfO9WW mPmiNlNPAbnAw/9apeJDvFOrd4Vqr8ZD3Vr0flXl5EJ9QXSTL6JXfaiLS1o6rQ0q OXqxVTh5B5aEmfsEhyJBLzsZGdISJbr60dAE8/ZDX9wo8cDavZ6YxToIroUeizGO dx2DY5A7OQRKTEf4aJgu4zcm1Sq+U5A8M1pcfU4Rhb/YapVgcCc2wrdrw4NOkNJu B/NZFcD0BcF2bZW7uHT5vr8rGzj2xZUCkuggBQ6/0/h7OdRXIzHCanMw4lPy602v IfZASF7eYkq7FHRbANj/WAYHOFl41vS+whqZ+sEI/qrW+ZyrODIzS67sbU/Bsa7+ ZoL6RSlIbcMgz7XVqcc5d8bKNK/Hc3MCyMWlLT0XScm05BiJy5O2iEZ7UOJ4r8E+ 6J/pFD1bNdacp6UrzwEjyXuSufvp4pJdNWn5ttIWYTBygXMT8AJ403yIheiV3KA3 SkoMj4A54tF3G7NhkzfR5sC7hlgcA0njJLzloicmNgP7E12vmcDL2rTdTt/Yl0cw 3w6ztJS+1sWocFSJ43lE2muHlj7wW7QDYPvul1t6yWgO9wtWECo7c/ASeRl0zPkP atAJtsr3uV9/aA2ae9QeWzut1W3hbE5NFQOLPcF/iXN+kxMmesI= =HgV0 -----END PGP SIGNATURE----- Merge tag 'for-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few short fixes and a lockdep warning fix (needs moving some code): - tree-log replay fixes: - fix error handling when looking up extent refs - fix warning when setting inode number of links - relocation fixes: - reset block group read-only status when relocation fails - unset control structure if transaction fails when starting to process a block group - add lockdep annotations to fix a warning during relocation where blocks temporarily belong to another tree and can lead to reversed dependencies - tree-checker verifies that extent items don't overlap" * tag 'for-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: tree-checker: check for overlapping extent items btrfs: fix warning during log replay when bumping inode link count btrfs: fix lost error handling when looking up extended ref on log replay btrfs: fix lockdep splat with reloc root extent buffers btrfs: move lockdep class helpers to locking.c btrfs: unset reloc control if transaction commit fails in prepare_to_relocate() btrfs: reset RO counter on block group if we fail to relocate	2022-08-19 13:33:48 -07:00
Linus Torvalds	a3a78b6332	4 ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmL+83EACgkQiiy9cAdy T1EB9Qv/ROymhtx17C1kvaxZYzXPjMKNl3OVPpvLNbGaUbkuf3Yp/5ajrixVF/hW m+xNhRbX8TmYNPhlnnLLZusiLkrCgyKMhVttQvAjuLhtvzZLQWOuoNlG/cwyUoiv X51fVkYO4psfVw3LdWG2tZT1dTI0oDoPNWdKlMpz9Rc31GYWTteHsEdcpaGgEVaL LCBKe1+21s9qWX8+mARymFZKyyHkrPwvGkL6PepUPR3pj96PsCPm3mgiU4/BJVOA E/Bxk/dtL7WsX4TXldzjXWCvcTHUpC5uApYYWywTGOamur9uycRUHrtlIj7+/Unl NMqSKkAr5X+kCmAdT+rxhF0WQbNxbsmbI9X0zJOtQIU0UbaB7j/GIAmCiGZ258NM xoFcAB627bCjd0XgXmxKE3e2hG8VVnqQOLS+em7KmWrrdT+ieTCY4R9m08YNCzn1 bIhg0vD/HKFBeu48MrhzquTGXvu0bpbYjRoh5eerZSXTOPU6zgiZiOuYBGqJMmvy lJVvZzU3 =3FeD -----END PGP SIGNATURE----- Merge tag '5.20-rc2-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd Pull ksmbd server fixes from Steve French: - important sparse file fix - allocation size fix - fix incorrect rc on bad share - share config fix * tag '5.20-rc2-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: don't remove dos attribute xattr on O_TRUNC open ksmbd: remove unnecessary generic_fillattr in smb2_open ksmbd: request update to stale share config ksmbd: return STATUS_BAD_NETWORK_NAME error status if share is not configured	2022-08-19 13:26:52 -07:00
Wolfram Sang	13609a8b3a	cifs: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-19 11:02:26 -05:00
Zhang Xiaoxu	ca08d0eac0	cifs: Fix memory leak on the deferred close xfstests on smb21 report kmemleak as below: unreferenced object 0xffff8881767d6200 (size 64): comm "xfs_io", pid 1284, jiffies 4294777434 (age 20.789s) hex dump (first 32 bytes): 80 5a d0 11 81 88 ff ff 78 8a aa 63 81 88 ff ff .Z......x..c.... 00 71 99 76 81 88 ff ff 00 00 00 00 00 00 00 00 .q.v............ backtrace: [<00000000ad04e6ea>] cifs_close+0x92/0x2c0 [<0000000028b93c82>] __fput+0xff/0x3f0 [<00000000d8116851>] task_work_run+0x85/0xc0 [<0000000027e14f9e>] do_exit+0x5e5/0x1240 [<00000000fb492b95>] do_group_exit+0x58/0xe0 [<00000000129a32d9>] __x64_sys_exit_group+0x28/0x30 [<00000000e3f7d8e9>] do_syscall_64+0x35/0x80 [<00000000102e8a0b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 When cancel the deferred close work, we should also cleanup the struct cifs_deferred_close. Fixes: `9e992755be` ("cifs: Call close synchronously during unlink/rename/lease break.") Fixes: `e3fc065682` ("cifs: Deferred close performance improvements") Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-19 10:59:53 -05:00
Stefan Roesch	41191cf6bf	fs: __file_remove_privs(): restore call to inode_has_no_xattr() This restores the call to inode_has_no_xattr() in the function __file_remove_privs(). In case the dentry_meeds_remove_privs() returned 0, the function inode_has_no_xattr() was not called. Signed-off-by: Stefan Roesch <shr@fb.com> Fixes: `faf99b5635` ("fs: add __remove_file_privs() with flags parameter") Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816153158.1925040-1-shr@fb.com Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-08-18 09:39:33 +02:00
Enzo Matsumiya	400d0ad63b	cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl() SMB2_ioctl() is always called with is_fsctl = true, so doesn't make any sense to have it at all. Thus, always set SMB2_0_IOCTL_IS_FSCTL flag on the request. Also, as per MS-SMB2 3.3.5.15 "Receiving an SMB2 IOCTL Request", servers must fail the request if the request flags is zero anyway. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-17 23:30:49 -05:00
Enzo Matsumiya	68ed14496b	cifs: remove unused server parameter from calc_smb_size() This parameter is unused by the called function Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-17 18:07:13 -05:00
Linus Torvalds	3b06a27557	ntfs3 for 6.0 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmL7xrEACgkQqbAzH4Mk B7Y3iw/+KZLidS4J9Lq+BhoDj21brkrvE+iiBhLClN8LQNn7jwkF+mTpYcGxvm+3 fgrT5YTUt6OBx9N06qjUnKKcwBrOq1xFo6hdlCVHQiZbEUgYsSglE6xVdyt5RICb AVD6T7uKnHq4cQHe8796h23/Vfd3z2rAetvz5w/6Gd+mOpguLj+sx7QGE7R6o3V/ uvbxj8z7cFkrVy6b7H5gaDUFzAhVxBzZY+2P1CUOg4uUy0YI0PeUbJli8zt/qORP Mr5mTEeKc1sxWzuDASppjgCPjdQVN+jgy7hQEpC6SLDR5HgtjncCCRE+dA1ZcnQm PQCG1Xn8CSII8bDCu6Lvr6KtxhRBdG/wb99zpn50wBmb6CzMJGOGmBwrPMMhW8Zo 8ZBYHCY5YgDuXNkFpQrivayrADGaLhmAl1BTjXTDCQU7MoxxFsPO8D/swufvNf3W 5eC5ezQ8FY3sSHRuDvVGHe8djvgsGvfxQAMrbfMJqEBFuPg3EYjJOeRpZK6NUyk4 U31Jtz0hYSuU0dnoEaZFQ23/K7/vl2kile6VlNPApFR+y8OtSARN++Za58xdtg2y H8XmEbuN/g8XxPXz55Smf4Y8RzaIZ2S56aA19nBqza5o1a6gQUr2SomTHGRLJsFQ 5xLyuUBrZRDxS8jcxa5TTfj7CNFBJkaxtXU8M66vIzXhXcm5+9U= =iz2l -----END PGP SIGNATURE----- Merge tag 'ntfs3_for_6.0' of https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: - implement FALLOC_FL_INSERT_RANGE - fix some logic errors - fixed xfstests (tested on x86_64): generic/064 generic/213 generic/300 generic/361 generic/449 generic/485 - some dead code removed or refactored * tag 'ntfs3_for_6.0' of https://github.com/Paragon-Software-Group/linux-ntfs3: (39 commits) fs/ntfs3: uninitialized variable in ntfs_set_acl_ex() fs/ntfs3: Remove unused function wnd_bits fs/ntfs3: Make ni_ins_new_attr return error fs/ntfs3: Create MFT zone only if length is large enough fs/ntfs3: Refactoring attr_insert_range to restore after errors fs/ntfs3: Refactoring attr_punch_hole to restore after errors fs/ntfs3: Refactoring attr_set_size to restore after errors fs/ntfs3: New function ntfs_bad_inode fs/ntfs3: Make MFT zone less fragmented fs/ntfs3: Check possible errors in run_pack in advance fs/ntfs3: Added comments to frecord functions fs/ntfs3: Fill duplicate info in ni_add_name fs/ntfs3: Make static function attr_load_runs fs/ntfs3: Add new argument is_mft to ntfs_mark_rec_free fs/ntfs3: Remove unused mi_mark_free fs/ntfs3: Fix very fragmented case in attr_punch_hole fs/ntfs3: Fix work with fragmented xattr fs/ntfs3: Make ntfs_fallocate return -ENOSPC instead of -EFBIG fs/ntfs3: extend ni_insert_nonresident to return inserted ATTR_LIST_ENTRY fs/ntfs3: Check reserved size for maximum allowed ...	2022-08-17 14:51:22 -07:00
Linus Torvalds	ae2a823643	dcache: move the DCACHE_OP_COMPARE case out of the __d_lookup_rcu loop __d_lookup_rcu() is one of the hottest functions in the kernel on certain loads, and it is complicated by filesystems that might want to have their own name compare function. We can improve code generation by moving the test of DCACHE_OP_COMPARE outside the loop, which makes the loop itself much simpler, at the cost of some code duplication. But both cases end up being simpler, and the "native" direct case-sensitive compare particularly so. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-17 14:33:03 -07:00
David Howells	932c29a10d	locks: Fix dropped call to ->fl_release_private() Prior to commit `4149be7bda`, sys_flock() would allocate the file_lock struct it was going to use to pass parameters, call ->flock() and then call locks_free_lock() to get rid of it - which had the side effect of calling locks_release_private() and thus ->fl_release_private(). With commit `4149be7bda`, however, this is no longer the case: the struct is now allocated on the stack, and locks_free_lock() is no longer called - and thus any remaining private data doesn't get cleaned up either. This causes afs flock to cause oops. Kasan catches this as a UAF by the list_del_init() in afs_fl_release_private() for the file_lock record produced by afs_fl_copy_lock() as the original record didn't get delisted. It can be reproduced using the generic/504 xfstest. Fix this by reinstating the locks_release_private() call in sys_flock(). I'm not sure if this would affect any other filesystems. If not, then the release could be done in afs_flock() instead. Changes ======= ver #2) - Don't need to call ->fl_release_private() after calling the security hook, only after calling ->flock(). Fixes: `4149be7bda` ("fs/lock: Don't allocate file_lock in flock_make_lock().") cc: Chuck Lever <chuck.lever@oracle.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/166075758809.3532462.13307935588777587536.stgit@warthog.procyon.org.uk/ # v1 Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>	2022-08-17 15:08:58 -04:00
Josef Bacik	899b7f69f2	btrfs: tree-checker: check for overlapping extent items We're seeing a weird problem in production where we have overlapping extent items in the extent tree. It's unclear where these are coming from, and in debugging we realized there's no check in the tree checker for this sort of problem. Add a check to the tree-checker to make sure that the extents do not overlap each other. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:20:25 +02:00
Filipe Manana	769030e118	btrfs: fix warning during log replay when bumping inode link count During log replay, at add_link(), we may increment the link count of another inode that has a reference that conflicts with a new reference for the inode currently being processed. During log replay, at add_link(), we may drop (unlink) a reference from some inode in the subvolume tree if that reference conflicts with a new reference found in the log for the inode we are currently processing. After the unlink, If the link count has decreased from 1 to 0, then we increment the link count to prevent the inode from being deleted if it's evicted by an iput() call, because we may have references to add to that inode later on (and we will fixup its link count later during log replay). However incrementing the link count from 0 to 1 triggers a warning: $ cat fs/inode.c (...) void inc_nlink(struct inode *inode) { if (unlikely(inode->i_nlink == 0)) { WARN_ON(!(inode->i_state & I_LINKABLE)); atomic_long_dec(&inode->i_sb->s_remove_count); } (...) The I_LINKABLE flag is only set when creating an O_TMPFILE file, so it's never set during log replay. Most of the time, the warning isn't triggered even if we dropped the last reference of the conflicting inode, and this is because: 1) The conflicting inode was previously marked for fixup, through a call to link_to_fixup_dir(), which increments the inode's link count; 2) And the last iput() on the inode has not triggered eviction of the inode, nor was eviction triggered after the iput(). So at add_link(), even if we unlink the last reference of the inode, its link count ends up being 1 and not 0. So this means that if eviction is triggered after link_to_fixup_dir() is called, at add_link() we will read the inode back from the subvolume tree and have it with a correct link count, matching the number of references it has on the subvolume tree. So if when we are at add_link() the inode has exactly one reference only, its link count is 1, and after the unlink its link count becomes 0. So fix this by using set_nlink() instead of inc_nlink(), as the former accepts a transition from 0 to 1 and it's what we use in other similar contexts (like at link_to_fixup_dir(). Also make add_inode_ref() use set_nlink() instead of inc_nlink() to bump the link count from 0 to 1. The warning is actually harmless, but it may scare users. Josef also ran into it recently. CC: stable@vger.kernel.org # 5.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:19:50 +02:00
Filipe Manana	7a6b75b799	btrfs: fix lost error handling when looking up extended ref on log replay During log replay, when processing inode references, if we get an error when looking up for an extended reference at __add_inode_ref(), we ignore it and proceed, returning success (0) if no other error happens after the lookup. This is obviously wrong because in case an extended reference exists and it encodes some name not in the log, we need to unlink it, otherwise the filesystem state will not match the state it had after the last fsync. So just make __add_inode_ref() return an error it gets from the extended reference lookup. Fixes: `f186373fef` ("btrfs: extended inode refs") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:19:45 +02:00
Josef Bacik	b40130b23c	btrfs: fix lockdep splat with reloc root extent buffers We have been hitting the following lockdep splat with btrfs/187 recently WARNING: possible circular locking dependency detected 5.19.0-rc8+ #775 Not tainted ------------------------------------------------------ btrfs/752500 is trying to acquire lock: ffff97e1875a97b8 (btrfs-treloc-02#2){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110 but task is already holding lock: ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-tree-01/1){+.+.}-{3:3}: down_write_nested+0x41/0x80 __btrfs_tree_lock+0x24/0x110 btrfs_init_new_buffer+0x7d/0x2c0 btrfs_alloc_tree_block+0x120/0x3b0 __btrfs_cow_block+0x136/0x600 btrfs_cow_block+0x10b/0x230 btrfs_search_slot+0x53b/0xb70 btrfs_lookup_inode+0x2a/0xa0 __btrfs_update_delayed_inode+0x5f/0x280 btrfs_async_run_delayed_root+0x24c/0x290 btrfs_work_helper+0xf2/0x3e0 process_one_work+0x271/0x590 worker_thread+0x52/0x3b0 kthread+0xf0/0x120 ret_from_fork+0x1f/0x30 -> #1 (btrfs-tree-01){++++}-{3:3}: down_write_nested+0x41/0x80 __btrfs_tree_lock+0x24/0x110 btrfs_search_slot+0x3c3/0xb70 do_relocation+0x10c/0x6b0 relocate_tree_blocks+0x317/0x6d0 relocate_block_group+0x1f1/0x560 btrfs_relocate_block_group+0x23e/0x400 btrfs_relocate_chunk+0x4c/0x140 btrfs_balance+0x755/0xe40 btrfs_ioctl+0x1ea2/0x2c90 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (btrfs-treloc-02#2){+.+.}-{3:3}: __lock_acquire+0x1122/0x1e10 lock_acquire+0xc2/0x2d0 down_write_nested+0x41/0x80 __btrfs_tree_lock+0x24/0x110 btrfs_lock_root_node+0x31/0x50 btrfs_search_slot+0x1cb/0xb70 replace_path+0x541/0x9f0 merge_reloc_root+0x1d6/0x610 merge_reloc_roots+0xe2/0x260 relocate_block_group+0x2c8/0x560 btrfs_relocate_block_group+0x23e/0x400 btrfs_relocate_chunk+0x4c/0x140 btrfs_balance+0x755/0xe40 btrfs_ioctl+0x1ea2/0x2c90 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Chain exists of: btrfs-treloc-02#2 --> btrfs-tree-01 --> btrfs-tree-01/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-tree-01/1); lock(btrfs-tree-01); lock(btrfs-tree-01/1); lock(btrfs-treloc-02#2); * DEADLOCK * 7 locks held by btrfs/752500: #0: ffff97e292fdf460 (sb_writers#12){.+.+}-{0:0}, at: btrfs_ioctl+0x208/0x2c90 #1: ffff97e284c02050 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0x55f/0xe40 #2: ffff97e284c00878 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x236/0x400 #3: ffff97e292fdf650 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xef/0x610 #4: ffff97e284c02378 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0 #5: ffff97e284c023a0 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0 #6: ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110 stack backtrace: CPU: 1 PID: 752500 Comm: btrfs Not tainted 5.19.0-rc8+ #775 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack_lvl+0x56/0x73 check_noncircular+0xd6/0x100 ? lock_is_held_type+0xe2/0x140 __lock_acquire+0x1122/0x1e10 lock_acquire+0xc2/0x2d0 ? __btrfs_tree_lock+0x24/0x110 down_write_nested+0x41/0x80 ? __btrfs_tree_lock+0x24/0x110 __btrfs_tree_lock+0x24/0x110 btrfs_lock_root_node+0x31/0x50 btrfs_search_slot+0x1cb/0xb70 ? lock_release+0x137/0x2d0 ? _raw_spin_unlock+0x29/0x50 ? release_extent_buffer+0x128/0x180 replace_path+0x541/0x9f0 merge_reloc_root+0x1d6/0x610 merge_reloc_roots+0xe2/0x260 relocate_block_group+0x2c8/0x560 btrfs_relocate_block_group+0x23e/0x400 btrfs_relocate_chunk+0x4c/0x140 btrfs_balance+0x755/0xe40 btrfs_ioctl+0x1ea2/0x2c90 ? lock_is_held_type+0xe2/0x140 ? lock_is_held_type+0xe2/0x140 ? __x64_sys_ioctl+0x88/0xc0 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This isn't necessarily new, it's just tricky to hit in practice. There are two competing things going on here. With relocation we create a snapshot of every fs tree with a reloc tree. Any extent buffers that get initialized here are initialized with the reloc root lockdep key. However since it is a snapshot, any blocks that are currently in cache that originally belonged to the fs tree will have the normal tree lockdep key set. This creates the lock dependency of reloc tree -> normal tree for the extent buffer locking during the first phase of the relocation as we walk down the reloc root to relocate blocks. However this is problematic because the final phase of the relocation is merging the reloc root into the original fs root. This involves searching down to any keys that exist in the original fs root and then swapping the relocated block and the original fs root block. We have to search down to the fs root first, and then go search the reloc root for the block we need to replace. This creates the dependency of normal tree -> reloc tree which is why lockdep complains. Additionally even if we were to fix this particular mismatch with a different nesting for the merge case, we're still slotting in a block that has a owner of the reloc root objectid into a normal tree, so that block will have its lockdep key set to the tree reloc root, and create a lockdep splat later on when we wander into that block from the fs root. Unfortunately the only solution here is to make sure we do not set the lockdep key to the reloc tree lockdep key normally, and then reset any blocks we wander into from the reloc root when we're doing the merged. This solves the problem of having mixed tree reloc keys intermixed with normal tree keys, and then allows us to make sure in the merge case we maintain the lock order of normal tree -> reloc tree We handle this by setting a bit on the reloc root when we do the search for the block we want to relocate, and any block we search into or COW at that point gets set to the reloc tree key. This works correctly because we only ever COW down to the parent node, so we aren't resetting the key for the block we're linking into the fs root. With this patch we no longer have the lockdep splat in btrfs/187. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:19:12 +02:00
Josef Bacik	0a27a0474d	btrfs: move lockdep class helpers to locking.c These definitions exist in disk-io.c, which is not related to the locking. Move this over to locking.h/c where it makes more sense. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:19:10 +02:00
Zixuan Fu	85f02d6c85	btrfs: unset reloc control if transaction commit fails in prepare_to_relocate() In btrfs_relocate_block_group(), the rc is allocated. Then btrfs_relocate_block_group() calls relocate_block_group() prepare_to_relocate() set_reloc_control() that assigns rc to the variable fs_info->reloc_ctl. When prepare_to_relocate() returns, it calls btrfs_commit_transaction() btrfs_start_dirty_block_groups() btrfs_alloc_path() kmem_cache_zalloc() which may fail for example (or other errors could happen). When the failure occurs, btrfs_relocate_block_group() detects the error and frees rc and doesn't set fs_info->reloc_ctl to NULL. After that, in btrfs_init_reloc_root(), rc is retrieved from fs_info->reloc_ctl and then used, which may cause a use-after-free bug. This possible bug can be triggered by calling btrfs_ioctl_balance() before calling btrfs_ioctl_defrag(). To fix this possible bug, in prepare_to_relocate(), check if btrfs_commit_transaction() fails. If the failure occurs, unset_reloc_control() is called to set fs_info->reloc_ctl to NULL. The error log in our fault-injection testing is shown as follows: [ 58.751070] BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x7ca/0x920 [btrfs] ... [ 58.753577] Call Trace: ... [ 58.755800] kasan_report+0x45/0x60 [ 58.756066] btrfs_init_reloc_root+0x7ca/0x920 [btrfs] [ 58.757304] record_root_in_trans+0x792/0xa10 [btrfs] [ 58.757748] btrfs_record_root_in_trans+0x463/0x4f0 [btrfs] [ 58.758231] start_transaction+0x896/0x2950 [btrfs] [ 58.758661] btrfs_defrag_root+0x250/0xc00 [btrfs] [ 58.759083] btrfs_ioctl_defrag+0x467/0xa00 [btrfs] [ 58.759513] btrfs_ioctl+0x3c95/0x114e0 [btrfs] ... [ 58.768510] Allocated by task 23683: [ 58.768777] ____kasan_kmalloc+0xb5/0xf0 [ 58.769069] __kmalloc+0x227/0x3d0 [ 58.769325] alloc_reloc_control+0x10a/0x3d0 [btrfs] [ 58.769755] btrfs_relocate_block_group+0x7aa/0x1e20 [btrfs] [ 58.770228] btrfs_relocate_chunk+0xf1/0x760 [btrfs] [ 58.770655] __btrfs_balance+0x1326/0x1f10 [btrfs] [ 58.771071] btrfs_balance+0x3150/0x3d30 [btrfs] [ 58.771472] btrfs_ioctl_balance+0xd84/0x1410 [btrfs] [ 58.771902] btrfs_ioctl+0x4caa/0x114e0 [btrfs] ... [ 58.773337] Freed by task 23683: ... [ 58.774815] kfree+0xda/0x2b0 [ 58.775038] free_reloc_control+0x1d6/0x220 [btrfs] [ 58.775465] btrfs_relocate_block_group+0x115c/0x1e20 [btrfs] [ 58.775944] btrfs_relocate_chunk+0xf1/0x760 [btrfs] [ 58.776369] __btrfs_balance+0x1326/0x1f10 [btrfs] [ 58.776784] btrfs_balance+0x3150/0x3d30 [btrfs] [ 58.777185] btrfs_ioctl_balance+0xd84/0x1410 [btrfs] [ 58.777621] btrfs_ioctl+0x4caa/0x114e0 [btrfs] ... Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-17 16:18:58 +02:00
Seth Forshee	bf1ac16edf	fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts Idmapped mounts should not allow a user to map file ownsership into a range of ids which is not under the control of that user. However, we currently don't check whether the mounter is privileged wrt to the target user namespace. Currently no FS_USERNS_MOUNT filesystems support idmapped mounts, thus this is not a problem as only CAP_SYS_ADMIN in init_user_ns is allowed to set up idmapped mounts. But this could change in the future, so add a check to refuse to create idmapped mounts when the mounter does not have CAP_SYS_ADMIN in the target user namespace. Fixes: `bd303368b7` ("fs: support mapped mounts of mapped filesystems") Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816164752.2595240-1-sforshee@digitalocean.com Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-08-17 11:27:11 +02:00
Christian Brauner	abfcf55d8b	acl: handle idmapped mounts for idmapped filesystems Ensure that POSIX ACLs checking, getting, and setting works correctly for filesystems mountable with a filesystem idmapping ("fs_idmapping") that want to support idmapped mounts ("mnt_idmapping"). Note that no filesystems mountable with an fs_idmapping do yet support idmapped mounts. This is required infrastructure work to unblock this. As we explained in detail in [1] the fs_idmapping is irrelevant for getxattr() and setxattr() when mapping the ACL_{GROUP,USER} {g,u}ids stored in the uapi struct posix_acl_xattr_entry in posix_acl_fix_xattr_{from,to}_user(). But for acl_permission_check() and posix_acl_{g,s}etxattr_idmapped_mnt() the fs_idmapping matters. acl_permission_check(): During lookup POSIX ACLs are retrieved directly via i_op->get_acl() and are returned via the kernel internal struct posix_acl which contains e_{g,u}id members of type k{g,u}id_t that already take the fs_idmapping into acccount. For example, a POSIX ACL stored with u4 on the backing store is mapped to k10000004 in the fs_idmapping. The mnt_idmapping remaps the POSIX ACL to k20000004. In order to do that the fs_idmapping needs to be taken into account but that doesn't happen yet (Again, this is a counterfactual currently as fuse doesn't support idmapped mounts currently. It's just used as a convenient example.): fs_idmapping: u0:k10000000:r65536 mnt_idmapping: u0:v20000000:r65536 ACL_USER: k10000004 acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping /, ...) { k10000004 = make_kuid(u0:k10000000:r65536 / fs_idmapping /, u4 / ACL_USER /); } -> posix_acl_permission() { -1 = make_vfsuid(u0:v20000000:r65536 / mnt_idmapping /, &init_user_ns, k10000004); vfsuid_eq_kuid(-1, k10000004 / caller_fsuid /) } In order to correctly map from the fs_idmapping into mnt_idmapping we require the relevant fs_idmaping to be passed: acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 / fs_idmapping /, ...) { k10000004 = make_kuid(u0:k10000000:r65536 / fs_idmapping /, u4 / ACL_USER /); } -> posix_acl_permission() { v20000004 = make_vfsuid(u0:v20000000:r65536 / mnt_idmapping /, u0:k10000000:r65536 / fs_idmapping /, k10000004); vfsuid_eq_kuid(v20000004, k10000004 / caller_fsuid */) } The initial_idmapping is only correct for the current situation because all filesystems that currently support idmapped mounts do not support being mounted with an fs_idmapping. Note that ovl_get_acl() is used to retrieve the POSIX ACLs from the relevant lower layer and the lower layer's mnt_idmapping needs to be taken into account and so does the fs_idmapping. See `0c5fd887d2` ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") for more details. For posix_acl_{g,s}etxattr_idmapped_mnt() it is not as obvious why the fs_idmapping matters as it is for acl_permission_check(). Especially because it doesn't matter for posix_acl_fix_xattr_{from,to}_user() (See [1] for more context.). Because posix_acl_{g,s}etxattr_idmapped_mnt() operate on the uapi struct posix_acl_xattr_entry which contains {g,u}id_t values and thus give the impression that the fs_idmapping is irrelevant as at this point appropriate {g,u}id_t values have seemlingly been generated. As we've stated multiple times this assumption is wrong and in fact the uapi struct posix_acl_xattr_entry is taking idmappings into account depending at what place it is operated on. posix_acl_getxattr_idmapped_mnt() When posix_acl_getxattr_idmapped_mnt() is called the values stored in the uapi struct posix_acl_xattr_entry are mapped according to the fs_idmapping. This happened when they were read from the backing store and then translated from struct posix_acl into the uapi struct posix_acl_xattr_entry during posix_acl_to_xattr(). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry have been generated by it. So we need to take the fs_idmapping into account during make_vfsuid() in posix_acl_getxattr_idmapped_mnt(). posix_acl_setxattr_idmapped_mnt() When posix_acl_setxattr_idmapped_mnt() is called the values stored as {g,u}id_t in uapi struct posix_acl_xattr_entry are intended to be the values that ultimately get turned back into a k{g,u}id_t in posix_acl_from_xattr() (which turns the uapi struct posix_acl_xattr_entry into the kernel internal struct posix_acl). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry are intended to be the values that will be undone in the fs_idmapping when writing to the backing store. So we need to take the fs_idmapping into account during from_vfsuid() in posix_acl_setxattr_idmapped_mnt(). Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Fixes: `0c5fd887d2` ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") Cc: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20220816113514.43304-1-brauner@kernel.org	2022-08-17 11:23:31 +02:00
Fabio M. De Francesco	3a608cfee9	exec: Replace kmap{,_atomic}() with kmap_local_page() The use of kmap() and kmap_atomic() are being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since the use of kmap_local_page() in exec.c is safe, it should be preferred everywhere in exec.c. As said, since kmap_local_page() can be also called from atomic context, and since remove_arg_zero() doesn't (and shouldn't ever) rely on an implicit preempt_disable(), this function can also safely replace kmap_atomic(). Therefore, replace kmap() and kmap_atomic() with kmap_local_page() in fs/exec.c. Tested with xfstests on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Eric W. Biederman <ebiederm@xmission.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220803182856.28246-1-fmdefrancesco@gmail.com	2022-08-16 12:11:27 -07:00
Namjae Jeon	17661ecf6a	ksmbd: don't remove dos attribute xattr on O_TRUNC open When smb client open file in ksmbd share with O_TRUNC, dos attribute xattr is removed as well as data in file. This cause the FSCTL_SET_SPARSE request from the client fails because ksmbd can't update the dos attribute after setting ATTR_SPARSE_FILE. And this patch fix xfstests generic/469 test also. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-15 21:07:01 -05:00
Hyunchul Lee	c90b31eaf9	ksmbd: remove unnecessary generic_fillattr in smb2_open Remove unnecessary generic_fillattr to fix wrong AllocationSize of SMB2_CREATE response, And Move the call of ksmbd_vfs_getattr above the place where stat is needed because of truncate. This patch fixes wrong AllocationSize of SMB2_CREATE response. Because ext4 updates inode->i_blocks only when disk space is allocated, generic_fillattr does not set stat.blocks properly for delayed allocation. But ext4 returns the blocks that include the delayed allocation blocks when getattr is called. The issue can be reproduced with commands below: touch ${FILENAME} xfs_io -c "pwrite -S 0xAB 0 40k" ${FILENAME} xfs_io -c "stat" ${FILENAME} 40KB are written, but the count of blocks is 8. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-15 21:06:54 -05:00
Al Viro	3f61631d47	take care to handle NULL ->proc_lseek() Easily done now, just by clearing FMODE_LSEEK in ->f_mode during proc_reg_open() for such entries. Fixes: `868941b144` "fs: remove no_llseek" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-14 15:16:18 -04:00
Linus Torvalds	aea23e7c46	fix for /proc/mounts escaping Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYvg//wAKCRBZ7Krx/gZQ 631TAP92pf2EYSUbWRjSshaxYrIJeiDpIBBikSoEOQj4GG4hUwD/U6f/LnFW0a8P 58kXaxXI9cNBEVRhwI/dZKW9q/WxBAY= =3rHw -----END PGP SIGNATURE----- Merge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull /proc/mounts fix from Al Viro: "Fix for /proc/mounts escaping - escape the '#' character too" * tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: escape hash as well	2022-08-13 17:35:58 -07:00
Linus Torvalds	332019e23a	8 cifs/smb3 fixes, mostly restructuring/cleanup, including two for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmL3/wkACgkQiiy9cAdy T1Glxwv/Vv6SjM+hXSGeNvSIGmp+Thxv2u19kCSEamHVoURSZoDxWtDNVw262MLF Jhd9PTK36ivG7suwxAALInN1bL8nXW6cENB3a0XOR93XaPCtTudXSiZPKXbgXIkl kib99S5N5Pm4Dxk6B4WpOCeOS/pkI5fFhR2es4ovBSQR2JacyvjMcJwRkk37lZns v9XnvlvQcuhqBL8SIs012AgTRnd1gyIskIf9lghA+OOD87cFt7QhnhHmpKmcdFjw eXYqRXncwLgCy9a/CGP0KHP251xJuhiL5iZKZ3qfRq/kvM8Z40mDtTA7M/i9UUV1 ankjdLhZTpEdBjXHd17hm5BDcxkxIrPjQki64mo73ytvFUB7+MBTGSX579X93QKT R1TtzwLvw/1H6Zo03CFREDk5Bz6rGjAC12XbgSwIOWexF4SMHgyxTrQja4R8eiHb dNdzuxbJrdYrcTMKjH8l7sB6452Etn00Ua8LHZPcJYPF/4rhFeRd3pvrZ5UwgARL UQjS7pXk =gZTH -----END PGP SIGNATURE----- Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - two fixes for stable, one for a lock length miscalculation, and another fixes a lease break timeout bug - improvement to handle leases, allows the close timeout to be configured more safely - five restructuring/cleanup patches * tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not access tcon->cfids->cfid directly from is_path_accessible cifs: Add constructor/destructors for tcon->cfid SMB3: fix lease break timeout when multiple deferred close handles for the same file. smb3: allow deferred close timeout to be configurable cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir cifs: Move cached-dir functions into a separate file cifs: Remove {cifs,nfs}_fscache_release_page() cifs: fix lock length calculation	2022-08-13 17:31:18 -07:00
David Howells	8549a26308	afs: Enable multipage folio support Enable multipage folio support for the afs filesystem. Support has already been implemented in netfslib, fscache and cachefiles and in most of afs, but I've waited for Matthew Wilcox's latest folio changes. Note that it does require a change to afs_write_begin() to return the correct subpage. This is a "temporary" change as we're working on getting rid of the need for ->write_begin() and ->write_end() completely, at least as far as network filesystems are concerned - but it doesn't prevent afs from making use of the capability. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: kafs-testing@auristor.com Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-13 17:20:51 -07:00
Linus Torvalds	f6eb0fed6a	Misc timer fixes: - fix a potential use-after-free bug in posix timers - correct a prototype - address a build warning Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmL3epQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iPZw/+I/9GXcf3SzbG5M6Nf21SJpSjC4hAHHgb eyv5MUNxKvCHU5iT2SrCvgKjESl5I/E70kubeRHJnvarBPUzGnHHzGlYIYOaJPQ7 irJpUj/6R8ps4UsMBJ8vj5f3b7163zhBJVP8egDW6roT1HUrYTFeIjIli/SOCxpY H1/DqHlbEALE5o5xykg3zuqAbywym+hNRleIVls4wqjZNnfqiTElSuW9xqw9xt3n 9xYmOKZaztdv5Lp2JCm7QOu2byGzeHje72ppsDcBZ3EBvHUBLSndhfe5NQUGhtxy UlBqAELA653uPgPnNKLRMqt/kop8emHqvAx8T0RawPwoUS6XGDVxRX+my8+HKklg P8KsM/8W7+3KTHz0bf72DEHTFiXCzlswRzdOSvP5bR4xw1G4ychzvuxAiPDFR3zT v7uPgykxxCrEexVCBCdPmrl4WikwLJtcrSXtJ4bsisxQFlq7WWd2/osZkTffI3pN IIxDXuHFHC78lrUMk2OQ+ITBz01z4nCFSlgMGZ6ZY6ppS1Rndy1HG/B2NgjW1zGP Y/1xq/nWaql0QO7RmyoJXt1ZSMJYCyKFocRDh9nBmtBSlYm3A8aIA8b4i1VRRG1G 8HOkdS8ef2eOWj8wqk0NvoTbiGjV7YM5pf0g1dmRLA+aGCBD1P9/iFcBv5b6Uxaq qZ7ZtuQzsyc= =Plg8 -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc timer fixes: - fix a potential use-after-free bug in posix timers - correct a prototype - address a build warning" * tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Cleanup CPU timers before freeing them during exec time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 posix-timers: Make do_clock_gettime() static	2022-08-13 14:38:22 -07:00
Linus Torvalds	9872e4a873	New code for 6.0: - Return error codes from block device flushes to userspace. - Fix a deadlock between reclaim and mount time quotacheck. - Fix an unnecessary ENOSPC return when doing COW on a filesystem with severe free space fragmentation. - Fix a miscalculation in the transaction reservation computations for file removal operations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmL1KE8ACgkQ+H93GTRK tOt3zBAAlJNBx8jbxGipyDtt7Lxo0Dev2eJEPU2n43CMjl2vnnVSeaGRSWHZNGP3 3untqmcoR2bX1mpOWwrg9zrftcimYFm3fyW4kpv91+YTL7huX+nHCMBfuDSv8I1U FLAVVOU+te0f5kJIcUAJIfotZg8lOo5Exb/lhRyNFRpH+KgBrq/PDforKSvwLs0r fGbbMI/+D7CST0+O8nYvhZc/a2ebc1EjlAoPZLTqXrXaljrJwlveRZq9QlY2x2EY OJhdc23atDp7D5TBY7Cpv8a7QqGMxSrBLkFqdY0Ne3ui0EiFlDnkQhrWQj2e5P+A MFbcwu4JoHmC/hnNq6pTMtoV09YkXKb+SpmisPHQ7jC0D5pBbdPkrVoer5FULVn6 oedirarGvARd0ymTRILUl4QIko5ITBFDqbOv1fGv4wP4dUrPLE04MP28oJDFb2V9 CIc3RQKtMdlEbNYc3ocAC+JjE4kAWr5gA0l+rIPEG/7xrcHmoie0wNLXBdGn+u6V RdyO9Vx9ma0mJ1jGWJXwqe8UMoPWsr/ASlOlO+xxSQ8k3ffoyS1z20oo/N+d8kOx yOtLA+Vk/T1N1dyDB7hcLu97+C5gwdFW7fsFQ+rHcP88mwWpM625uLGy+yt6W8qJ 5gSeEn192pmGz5aEy+ePChmjHTIMglOYr/bvPAH6eoVHMqrKjXI= =wFgu -----END PGP SIGNATURE----- Merge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more xfs updates from Darrick Wong: "There's not a lot this time around, just the usual bug fixes and corrections for missing error returns. - Return error codes from block device flushes to userspace - Fix a deadlock between reclaim and mount time quotacheck - Fix an unnecessary ENOSPC return when doing COW on a filesystem with severe free space fragmentation - Fix a miscalculation in the transaction reservation computations for file removal operations" * tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix inode reservation space for removing transaction xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork xfs: fix intermittent hang during quotacheck xfs: check return codes when flushing block devices	2022-08-13 13:50:11 -07:00
Trond Myklebust	edf79efcc9	NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds Since pnfs_write_done_resend_to_mds() does not actually call end_page_writeback() on the pages that are being redirected to the metadata server, callers of fsync() do not see the I/O as complete until the writeback to the MDS finishes. We therefore do not need to set NFS_CONTEXT_RESEND_WRITES, since there is nothing to redrive. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-13 13:02:14 -04:00
Trond Myklebust	67f4b5dc49	NFS: Fix another fsync() issue after a server reboot Currently, when the writeback code detects a server reboot, it redirties any pages that were not committed to disk, and it sets the flag NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor that dirtied the file. While this allows the file descriptor in question to redrive its own writes, it violates the fsync() requirement that we should be synchronising all writes to disk. While the problem is infrequent, we do see corner cases where an untimely server reboot causes the fsync() call to abandon its attempt to sync data to disk and causing data corruption issues due to missed error conditions or similar. In order to tighted up the client's ability to deal with this situation without introducing livelocks, add a counter that records the number of times pages are redirtied due to a server reboot-like condition, and use that in fsync() to redrive the sync to disk. Fixes: `2197e9b06c` ("NFS: Fix up fsync() when the server rebooted") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-13 13:02:13 -04:00
Sun Ke	2067231a9e	NFS: Fix missing unlock in nfs_unlink() Add the missing unlock before goto. Fixes: `3c59366c20` ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-13 13:02:13 -04:00
Ronnie Sahlberg	7eb59a9870	cifs: Do not access tcon->cfids->cfid directly from is_path_accessible cfids will soon keep a list of cached fids so we should not access this directly from outside of cached_dir.c Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-12 17:40:15 -05:00
Ronnie Sahlberg	a63ec83c46	cifs: Add constructor/destructors for tcon->cfid and move the structure definitions into cached_dir.h Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 20:08:32 -05:00
Bharath SM	9e31678fb4	SMB3: fix lease break timeout when multiple deferred close handles for the same file. Solution is to send lease break ack immediately even in case of deferred close handles to avoid lease break request timing out and let deferred closed handle gets closed as scheduled. Later patches could optimize cases where we then close some of these handles sooner for the cases where lease break is to 'none' Cc: stable@kernel.org Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 20:07:06 -05:00
Steve French	5efdd9122e	smb3: allow deferred close timeout to be configurable Deferred close can be a very useful feature for allowing caching data for read, and for minimizing the number of reopens needed for a file that is repeatedly opened and close but there are workloads where its default (1 second, similar to actimeo/acregmax) is much too small. Allow the user to configure the amount of time we can defer sending the final smb3 close when we have a handle lease on the file (rather than forcing it to depend on value of actimeo which is often unrelated, and less safe). Adds new mount parameter "closetimeo=" which is the maximum number of seconds we can wait before sending an SMB3 close when we have a handle lease for it. Default value also is set to slightly larger at 5 seconds (although some other clients use larger default this should still help). Suggested-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 20:03:04 -05:00
Ronnie Sahlberg	dcb45fd7f5	cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir They are the same right now but tcon-> will later point to a different type of struct containing a list of cfids. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 20:03:04 -05:00
Linus Torvalds	8745889a7f	New code for 6.0: - Remove iomap_writepage and all callers, since the mm apparently never called the zonefs or gfs2 writepage functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmL1H7kACgkQ+H93GTRK tOvzfw/+JJQM3WjwCUg+11O9E+oKS3wbczr0yAd2m8j+EqapdndXzIVevcZKXoTx K4zOK9oDecPtRKgQkvrDt7HrMB7oYv8tuSzyfcsNVHbMA6U3twkLdr5c19/lm9uj rnP2Xrs0RkiiFpImmTHsviPEyzniJ+BjtRDF7FxSFELxREae4EQW3YX2MjffvqQA dT+xXptWiOSa3ygwfoGqVeOLOMt0DqXICiV0GLrGxD6S7TLRRIPo7ojYS4703vUL VFTAUvhC4CD9/vsEwPnl91Jq2s06tO3LE4V6vJDPI7/uQFPcubLmcK8GpaYB6+OQ q9Fhpc9cU/3JTKt6Sw9uNOqA5hfUKBdJmhWE3FqZ2arql2C9tY2o+cHvRBKZWMZ9 FdLKSwsuDpL+pYsWOPn7wU8BHZVTDDl7CtDNTCurNkkNgaAbK8C0X7QcT16RRyDF SAPHlg0XFewLgJ+9HNyDv70VT1VLYiJNq/h0d/EMO1+FuT4ArBOTOSe4zNNXqD3w vVFtbBhjGMf1ffqiMM5GdOPh0vxacL8jfxM7xyQ4yooSkecZCEvtNnuCysNTFDbl 53b9bjk+OSuWCb7efE6p82wU+gr617Zp2/YxALl4E0FlozeRHuRimWBtABZqi/g6 aKJL42ASY+PLJPACDjo0LhDFuCRbd75OATUGtBva7mkYWUANlMc= =FuyV -----END PGP SIGNATURE----- Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more iomap updates from Darrick Wong: "In the past 10 days or so I've not heard any ZOMG STOP style complaints about removing ->writepage support from gfs2 or zonefs, so here's the pull request removing them (and the underlying fs iomap support) from the kernel: - Remove iomap_writepage and all callers, since the mm apparently never called the zonefs or gfs2 writepage functions" * tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: remove iomap_writepage zonefs: remove ->writepage gfs2: remove ->writepage gfs2: stop using generic_writepages in gfs2_ail1_start_one	2022-08-11 13:11:49 -07:00
Linus Torvalds	786da5da56	We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmL1HF8THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHziwOuB/97JKHFuOlP1HrD6fYe5a0ul9zC9VG4 57XPDNqG2PSmfXCjvZhyVU4n53sUlJTqzKDSTXydoPCMQjtyHvysA6gEvcgUJFPd PHaZDCd9TmqX8my67NiTK70RVpNR9BujJMVMbOfM+aaisl0K6WQbitO+BfhEiJcK QStdKm5lPyf02ESH9jF+Ga0DpokARaLbtDFH7975owxske6gWuoPBCJNrkMooKiX LjgEmNgH1F/sJSZXftmKdlw9DtGBFaLQBdfbfSB5oVPRb7chI7xBeraNr6Od3rls o4davbFkcsOr+s6LJPDH2BJobmOg+HoMoma7ezspF7ZqBF4Uipv5j3VC =1427 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while" * tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits) libceph: clean up ceph_osdc_start_request prototype libceph: fix ceph_pagelist_reserve() comment typo ceph: remove useless check for the folio ceph: don't truncate file in atomic_open ceph: make f_bsize always equal to f_frsize ceph: flush the dirty caps immediatelly when quota is approaching libceph: print fsid and epoch with osd id libceph: check pointer before assigned to "c->rules[]" ceph: don't get the inline data for new creating files ceph: update the auth cap when the async create req is forwarded ceph: make change_auth_cap_ses a global symbol ceph: fix incorrect old_size length in ceph_mds_request_args ceph: switch back to testing for NULL folio->private in ceph_dirty_folio ceph: call netfs_subreq_terminated with was_async == false ceph: convert to generic_file_llseek ceph: fix the incorrect comment for the ceph_mds_caps struct ceph: don't leak snap_rwsem in handle_cap_grant ceph: prevent a client from exceeding the MDS maximum xattr size ceph: choose auth MDS for getxattr with the Xs caps ceph: add session already open notify support ...	2022-08-11 12:41:07 -07:00
Ronnie Sahlberg	05b98fd2da	cifs: Move cached-dir functions into a separate file Also rename crfid to cfid to have consistent naming for this variable. This commit does not change any logic. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 10:33:18 -05:00
Atte Heikkilä	4963d74f8a	ksmbd: request update to stale share config ksmbd_share_config_get() retrieves the cached share config as long as there is at least one connection to the share. This is an issue when the user space utilities are used to update share configs. In that case there is a need to inform ksmbd that it should not use the cached share config for a new connection to the share. With these changes the tree connection flag KSMBD_TREE_CONN_FLAG_UPDATE indicates this. When this flag is set, ksmbd removes the share config from the shares hash table meaning that ksmbd_share_config_get() ends up requesting a share config from user space. Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 01:05:18 -05:00
Namjae Jeon	fe54833dc8	ksmbd: return STATUS_BAD_NETWORK_NAME error status if share is not configured If share is not configured in smb.conf, smb2 tree connect should return STATUS_BAD_NETWORK_NAME instead of STATUS_BAD_NETWORK_PATH. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-11 01:05:18 -05:00
David Howells	cd04345598	cifs: Remove {cifs,nfs}_fscache_release_page() Remove {cifs,nfs}_fscache_release_page() from fs/cifs/fscache.h. This functionality got built directly into cifs_release_folio() and will hopefully be replaced with netfs_release_folio() at some point. The "nfs_" version is a copy and paste error and should've been altered to read "cifs_". That can also be removed. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: Steve French <smfrench@gmail.com> cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-10 21:26:08 -05:00
hexiaole	031d166f96	xfs: fix inode reservation space for removing transaction In 'fs/xfs/libxfs/xfs_trans_resv.c', the comment for transaction of removing a directory entry writes: /* fs/xfs/libxfs/xfs_trans_resv.c begin / / * For removing a directory entry we can modify: * the parent directory inode: inode size * the removed inode: inode size ... xfs_calc_remove_reservation( struct xfs_mount mp) { return XFS_DQUOT_LOGRES(mp) + xfs_calc_iunlink_add_reservation(mp) + max((xfs_calc_inode_res(mp, 1) + ... / fs/xfs/libxfs/xfs_trans_resv.c end */ There has 2 inode size of space to be reserverd, but the actual code for inode reservation space writes. There only count for 1 inode size to be reserved in 'xfs_calc_inode_res(mp, 1)', rather than 2. Signed-off-by: hexiaole <hexiaole@kylinos.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: remove redundant code citations] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-08-10 17:43:49 -07:00
Paulo Alcantara	773891ffd4	cifs: fix lock length calculation The lock length was wrongly set to 0 when fl_end == OFFSET_MAX, thus failing to lock the whole file when l_start=0 and l_len=0. This fixes test 2 from cthon04. Before patch: $ ./cthon04/lock/tlocklfs -t 2 /mnt Creating parent/child synchronization pipes. Test #1 - Test regions of an unlocked file. Parent: 1.1 - F_TEST [ 0, 1] PASSED. Parent: 1.2 - F_TEST [ 0, ENDING] PASSED. Parent: 1.3 - F_TEST [ 0,7fffffffffffffff] PASSED. Parent: 1.4 - F_TEST [ 1, 1] PASSED. Parent: 1.5 - F_TEST [ 1, ENDING] PASSED. Parent: 1.6 - F_TEST [ 1,7fffffffffffffff] PASSED. Parent: 1.7 - F_TEST [7fffffffffffffff, 1] PASSED. Parent: 1.8 - F_TEST [7fffffffffffffff, ENDING] PASSED. Parent: 1.9 - F_TEST [7fffffffffffffff,7fffffffffffffff] PASSED. Test #2 - Try to lock the whole file. Parent: 2.0 - F_TLOCK [ 0, ENDING] PASSED. Child: 2.1 - F_TEST [ 0, 1] FAILED! Child: ** Expected EACCES, returned success... Child: Probably implementation error. CHILD pass 1 results: 0/0 pass, 0/0 warn, 1/1 fail (pass/total). Parent: Child died PARENT pass 1 results: 10/10 pass, 0/0 warn, 0/0 fail (pass/total). After patch: $ ./cthon04/lock/tlocklfs -t 2 /mnt Creating parent/child synchronization pipes. Test #2 - Try to lock the whole file. Parent: 2.0 - F_TLOCK [ 0, ENDING] PASSED. Child: 2.1 - F_TEST [ 0, 1] PASSED. Child: 2.2 - F_TEST [ 0, ENDING] PASSED. Child: 2.3 - F_TEST [ 0,7fffffffffffffff] PASSED. Child: 2.4 - F_TEST [ 1, 1] PASSED. Child: 2.5 - F_TEST [ 1, ENDING] PASSED. Child: 2.6 - F_TEST [ 1,7fffffffffffffff] PASSED. Child: 2.7 - F_TEST [7fffffffffffffff, 1] PASSED. Child: 2.8 - F_TEST [7fffffffffffffff, ENDING] PASSED. Child: 2.9 - F_TEST [7fffffffffffffff,7fffffffffffffff] PASSED. Parent: 2.10 - F_ULOCK [ 0, ENDING] PASSED. PARENT pass 1 results: 2/2 pass, 0/0 warn, 0/0 fail (pass/total). ** CHILD pass 1 results: 9/9 pass, 0/0 warn, 0/0 fail (pass/total). Fixes: `d80c69846d` ("cifs: fix signed integer overflow when fl_end is OFFSET_MAX") Reported-by: Xiaoli Feng <xifeng@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-10 16:45:37 -05:00
Linus Torvalds	aeb6e6ac18	NFS client updates for Linux 5.20 Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmLzy24ACgkQZwvnipYK APJJvhAAnVmv3jeOLjwDm+3X9xBevTq8PXAikWVeJgbSKCdjfqXO1J+XF49MpxXl N8PQZMqnwWkF3WvhvYobblwGSl6gJ3RnhVuTgdv4jiPl0ZWyS3XngJY0dCTnNgdL 5O9AjoOMtnGVZwN5j8agymA8f2TUcel5mED6sAk10t2zDZY7VxuQqjp0m696jYjF 0PKBmxoC4+6tXtYJS7d2PGRCTjfEUx2BLnnGuLOKsB7X8f63XmxJiu8/AIiY7spr M/M+BjAF3ok86dT1LjGlkvSNp23H70Wmsv98udfvxIWBm1l4972oCR/CS+kh3/mU dYM+oQ3JxxgKEN7Fdak+zU/+qma9q5z2rPFpSIT1fMEuaKN/7H2cbiHPi5RnEBLa AHWilX/lWBIMnZJZd9g3yYcGe6E/pkT6TqW5JY+2510koyfNER4IismAWMx2iYKU D0WSZOkmEBS/OYZxpTnqGwvS4L1szo9DN3c+yG2KXLifnmVPpjXZO25wahqSuUo3 V6eYUCXRJmVg+IuXGsMNdrjxGYxD12xChoYzx5RlXls2lwHGeZr+iG3aL3+XayHa I1Kji3500UmfEOUUUr4UiQ428dOdL3QqNzVzdymN8Vh4d7v64LUL0GSseY+10Xrs xcbR6l/hwjBIo+I1Bi2mmv3W10tKErFy9eBIKzql3D6VHg7ESOo= =U00h -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements" * tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFS: Improve readpage/writepage tracing NFS: Improve O_DIRECT tracing NFS: Improve write error tracing NFS: don't unhash dentry during unlink/rename NFSv4/pnfs: Fix a use-after-free bug in open NFS: nfs_async_write_reschedule_io must not recurse into the writeback code SUNRPC: Don't reuse bvec on retransmission of the request SUNRPC: Reinitialise the backchannel request buffers before reuse NFSv4.1: RECLAIM_COMPLETE must handle EACCES NFSv4.1 probe offline transports for trunking on session creation SUNRPC create a function that probes only offline transports SUNRPC export xprt_iter_rewind function SUNRPC restructure rpc_clnt_setup_test_and_add_xprt NFSv4.1 remove xprt from xprt_switch if session trunking test fails SUNRPC create an rpc function that allows xprt removal from rpc_clnt SUNRPC enable back offline transports in trunking discovery SUNRPC create an iterator to list only OFFLINE xprts NFSv4.1 offline trunkable transports on DESTROY_SESSION SUNRPC add function to offline remove trunkable transports SUNRPC expose functions for offline remote xprt functionality ...	2022-08-10 14:04:32 -07:00
Linus Torvalds	b1701d5e29	- hugetlb_vmemmap cleanups from Muchun Song - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi - highmem documentation fixups from Fabio De Francesco -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYvMFaQAKCRDdBJ7gKXxA jrM7AQCsaVsrRJgVMNqjDvLAsWFEm48s6/smQT4UZ9n/rPQUsAEA0iRpOfbjcxwK npAml1mp5uP2AkimTVLB6Bokt6N+wAg= =9Wta -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull remaining MM updates from Andrew Morton: "Three patch series - two that perform cleanups and one feature: - hugetlb_vmemmap cleanups from Muchun Song - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi - highmem documentation fixups from Fabio De Francesco" * tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) Documentation/mm: add details about kmap_local_page() and preemption highmem: delete a sentence from kmap_local_page() kdocs Documentation/mm: rrefer kmap_local_page() and avoid kmap() Documentation/mm: avoid invalid use of addresses from kmap_local_page() Documentation/mm: don't kmap*() pages which can't come from HIGHMEM highmem: specify that kmap_local_page() is callable from interrupts highmem: remove unneeded spaces in kmap_local_page() kdocs mm, hwpoison: enable memory error handling on 1GB hugepage mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage mm, hwpoison: make __page_handle_poison returns int mm, hwpoison: set PG_hwpoison for busy hugetlb pages mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage mm, hwpoison, hugetlb: support saving mechanism of raw error pages mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability mm: hugetlb_vmemmap: replace early_param() with core_param() mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c ...	2022-08-10 11:18:00 -07:00
Dan Carpenter	d4073595d0	fs/ntfs3: uninitialized variable in ntfs_set_acl_ex() The goto out calls kfree(value) on an uninitialized pointer. Just return directly as the other error paths do. Fixes: `460bbf2990` ("fs/ntfs3: Do not change mode if ntfs_set_ea failed") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-10 19:12:58 +03:00
Jiapeng Chong	96964352e2	fs/ntfs3: Remove unused function wnd_bits Since the function wnd_bits is defined but not called in any file, it is a useless function, and we delete it in view of the brevity of the code. Remove some warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ntfs3/bitmap.c:54:19: warning: unused function 'wnd_bits' [-Wunused-function]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-10 19:12:48 +03:00
Linus Torvalds	e394ff83bb	NFSD 6.0 Release Notes Work on "courteous server", which was introduced in 5.19, continues apace. This release introduces a more flexible limit on the number of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in courtesy state long after the lease expiration timeout. The client limit is adjusted based on the physical memory size of the server. The NFSD filecache is a cache of files held open by NFSv4 clients or recently touched by NFSv2 or NFSv3 clients. This cache had some significant scalability constraints that have been relieved in this release. Thanks to all who contributed to this work. A data corruption bug found during the most recent NFS bake-a-thon that involves NFSv3 and NFSv4 clients writing the same file has been addressed in this release. This release includes several improvements in CPU scalability for NFSv4 operations. In addition, Neil Brown provided patches that simplify locking during file lookup, creation, rename, and removal that enables subsequent work on making these operations more scalable. We expect to see that work materialize in the next release. There are also numerous single-patch fixes, clean-ups, and the usual improvements in observability. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLujF0ACgkQM2qzM29m f5c14w/+Lsoryo5vdTXMAZBXBNvVdXQmHqLIEotJEVA3sECr+Kad2bF8rFaCWVzS Gf9KhTetmcDO9O73I5I/UtJ2qFT9B4I6baSGpOzIkjsM/aKeEEQpbpdzPYhKrCEv bQu3P54js7snH4YV8s0I39nBFOWdnahYXaw7peqE/2GOHxaR2mz88bkrQ+OybCxz KETqUxA6bKzkOT61S0nHcnQKd8HQzhocMDtrxtANHGsMM167ngI1dw4tUQAtfAUI s9R+GS6qwiKgwGz1oqhTR6LA/h4DROxPnc7AieuD9FvuAnR3kXw61bGMN5Biwv2T JZUTBbQvWhNasSV+7qOY9nBu+sHVC6Q7OZ5C9F/KjMyqCioDX0DnbxX9uKP20CDd EAAMS8n4Tdgd4KRBWdkLXPzizWYAjZQmFIJtcZne1JzGZ4IWRnikgM5qD6n1VviZ kcPRm5EN3DRHA+Hte4jG0EHIrE/7g5gnf+zr9dWl3uNhZtfTmumCfU16YYmKG8pP QN4kXBR2w7dAvp8nRaOsY6bBFLDAk/jHbpY8Q4xoUO4tsojfWayCTGVFOrecOjxv uSn0LhiidC5pLlkcPgwemhysVywDzr+gGXBRJXeUOHfdd05Q2gbFK8OpqDSvJ3dZ aC/RxFvHc8jaktUcuIjkE6Rsz6AVaAH3EZj84oMZ4hZhyGbEreg= =PEJ3 -----END PGP SIGNATURE----- Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Work on 'courteous server', which was introduced in 5.19, continues apace. This release introduces a more flexible limit on the number of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in courtesy state long after the lease expiration timeout. The client limit is adjusted based on the physical memory size of the server. The NFSD filecache is a cache of files held open by NFSv4 clients or recently touched by NFSv2 or NFSv3 clients. This cache had some significant scalability constraints that have been relieved in this release. Thanks to all who contributed to this work. A data corruption bug found during the most recent NFS bake-a-thon that involves NFSv3 and NFSv4 clients writing the same file has been addressed in this release. This release includes several improvements in CPU scalability for NFSv4 operations. In addition, Neil Brown provided patches that simplify locking during file lookup, creation, rename, and removal that enables subsequent work on making these operations more scalable. We expect to see that work materialize in the next release. There are also numerous single-patch fixes, clean-ups, and the usual improvements in observability" * tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits) lockd: detect and reject lock arguments that overflow NFSD: discard fh_locked flag and fh_lock/fh_unlock NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NFSD: use explicit lock/unlock for directory ops NFSD: reduce locking in nfsd_lookup() NFSD: only call fh_unlock() once in nfsd_link() NFSD: always drop directory lock in nfsd_unlink() NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. NFSD: add posix ACLs to struct nfsd_attrs NFSD: add security label to struct nfsd_attrs NFSD: set attributes when creating symlinks NFSD: introduce struct nfsd_attrs NFSD: verify the opened dentry after setting a delegation NFSD: drop fh argument from alloc_init_deleg NFSD: Move copy offload callback arguments into a separate structure NFSD: Add nfsd4_send_cb_offload() NFSD: Remove kmalloc from nfsd4_do_async_copy() NFSD: Refactor nfsd4_do_copy() NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) ...	2022-08-09 14:56:49 -07:00
Trond Myklebust	3fa5cbdc44	NFS: Improve readpage/writepage tracing Switch formatting to better match that used by other NFS tracepoints. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-09 14:11:34 -04:00
Trond Myklebust	b313eb9152	NFS: Improve O_DIRECT tracing Switch the formatting to match the other NFS tracepoints. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-09 14:11:34 -04:00
Trond Myklebust	af887e437b	NFS: Improve write error tracing Don't leak request pointers, but use the "device:inode" labelling that is used by all the other trace points. Furthermore, replace use of page indexes with an offset, again in order to align behaviour with other NFS trace points. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-09 14:11:34 -04:00
Thadeu Lima de Souza Cascardo	e362359ace	posix-cpu-timers: Cleanup CPU timers before freeing them during exec Commit `55e8c8eb2c` ("posix-cpu-timers: Store a reference to a pid not a task") started looking up tasks by PID when deleting a CPU timer. When a non-leader thread calls execve, it will switch PIDs with the leader process. Then, as it calls exit_itimers, posix_cpu_timer_del cannot find the task because the timer still points out to the old PID. That means that armed timers won't be disarmed, that is, they won't be removed from the timerqueue_list. exit_itimers will still release their memory, and when that list is later processed, it leads to a use-after-free. Clean up the timers from the de-threaded task before freeing them. This prevents a reported use-after-free. Fixes: `55e8c8eb2c` ("posix-cpu-timers: Store a reference to a pid not a task") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220809170751.164716-1-cascardo@canonical.com	2022-08-09 20:02:13 +02:00
Linus Torvalds	15205c2829	fscache fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmLyXZ4ACgkQ+7dXa6fL C2tJCxAAhH4ufVMk1r8H/lflCulaBw917zgoOO/daO9FjKDtL6hReTN6OgJLLULw a1ZMsRxl1NJy5duGYRswUR6gS8y4LzTdILvqfvP+9YSoZzou+uTHHuKFJheA6x++ /cRQFBQmx8JTBrEQXSHULa3FfKsU0UCwxCzy7q4o5zJogK6yiLhApxORIB5ZYi7W k/3iKA0isqKsSEwmRt3D9ekypb3E/8QGaQV7Bng6/1wldFFmL1g5w/ubY8TCsefV gnNUA3Ops3LsYj9a0vQGJ6lXKIol2nFClvmFM1qvb09u4PaVa2tofXd5FQuy/iC/ z2+ULUtBi7gs+DjsxPBf1cnbeA3BzNu5BfdQFd3QJksKFHcIMGsKnTQehPuVPhdn p0IGZrf3/sncmXWVZ322w+R0TLiwB0CX9fh4WS5aYw9WHtRg+FGeFCoVLSV774M0 OiRxefg6A4sBMP2XwHICRXWPtqnBj2PYhQ6bj6pCHJg4sGfrYnf/q7vWFI1rJ9JW 258lhyWaaO0SqPMtdYamBQzPQAAsuG72Mrf5k3pFLWIuPPb9Ho3B7fwbtjcMQcbv q+apsiiPk/aqbjbI8YclOD+XgRQMUaRO2s2+JCAQvoFoWPRLYE6ZaTDdesN0nHJB PgTFACx/dPtTCpQxIUDldDsj8wk4iPmPZiQ31gVzKr53v2Jvkhc= =LWMd -----END PGP SIGNATURE----- Merge tag 'fscache-fixes-20220809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: - Fix a cookie access ref leak if a cookie is invalidated a second time before the first invalidation is actually processed. - Add a tracepoint to log cookie lookup failure * tag 'fscache-fixes-20220809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: add tracepoint when failing cookie fscache: don't leak cookie access refs if invalidation is in progress or failed	2022-08-09 10:11:56 -07:00
Linus Torvalds	4b22e20741	AFS fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmLpXWwACgkQ+7dXa6fL C2tpSg//WX4+1XvsehIZxykJYmQ7XjVjrg9uGIEaslUry2nqEECDeK7LppKLuCoD 1xMBc5GRn4h5YbBCGbMWnr+r6eezcXdwr5Fv+On4fD3l5AWTk5xTaR8v1So3G9bK nAkWaoSrFapuifWgBkDGy9NhLW/k6d24BAKCqGhOlPcVzi4B95C4gQ5sIlMlH9xw jUkHB+HSLZj/yBV47sWQUs0uy5fCiOgm3qgWYl2cTHD4qJ052i0ELO30uAv9Oykq YjgOUryZlCP/vLSWzOeK8bgINjCLetd+2jGT96bV15BouQ4uoHSrgt7ACtp1I3NP WAPGdzlzXZqaYSrqX2sPP/Tmq0/SipoKCRqK5e2yetmqs7zznKZe0ATwFhjJYp7L IOULEaOAyrAeT9KQEwqtANXXoUxyEUqz2GKiybXWC8oUFJ0RAKpSf4LRi9cpoaRz VDmi0F4/ZA+nMDOPPdbSZTtDCu+68Lz2l6Z7ONwIZ1qzyXJB4BC4wb+9rx7UftA9 QuX9TqGUtjlWiPcM0qvdFoPOIA3xOXbZVtlvqygFxzDDXGlRVZdUa3uqMjELKzK1 v1yRZIS00zqWM4lWFyaQNCZHwFhlnixRzkAmvQcioAb/NJL3eA4n3FbL+RBTSrWP XEtgpI32ROobA3S+69LomYH4AisMgP9XDP04mOQqQfIxg9GFdEI= =fXqy -----END PGP SIGNATURE----- Merge tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fix AFS refcount handling. The first patch converts afs to use refcount_t for its refcounts and the second patch fixes afs_put_call() and afs_put_server() to save the values they're going to log in the tracepoint before decrementing the refcount" * tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix access after dec in put functions afs: Use refcount_t rather than atomic_t	2022-08-09 10:08:08 -07:00
Linus Torvalds	426b4ca2d6	fs.setgid.v6.0 -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYvIYIgAKCRCRxhvAZXjc omE8AQDAZG2YjNJfMnUUaaWYO3+zTaHlQp7OQkQTXIHfcfViXQD4vPt3Wxh3rrF+ J8fwNcWmXhSNei8HP6cA06QmSajnDQ== =GF9/ -----END PGP SIGNATURE----- Merge tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull setgid updates from Christian Brauner: "This contains the work to move setgid stripping out of individual filesystems and into the VFS itself. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: - S_ISGID stripping logic is entangled with umask stripping. For example, if the umask removes the S_IXGRP bit from the file about to be created then the S_ISGID bit will be kept. The inode_init_owner() helper is responsible for S_ISGID stripping and is called before posix_acl_create(). So we can end up with two different orderings: 1. FS without POSIX ACL support First strip umask then strip S_ISGID in inode_init_owner(). In other words, if a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem: 2. FS with POSIX ACL support First strip S_ISGID in inode_init_owner() then strip umask in posix_acl_create(). In other words, if the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_ISGID inheritance. (Note that the commit message of commit `1639a49ccd` ("fs: move S_ISGID stripping into the vfs_() helpers") gets the ordering between inode_init_owner() and posix_acl_create() the wrong way around. I realized this too late.) - Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. Note that mandating the use of inode_init_owner() was proposed as an alternative solution but that wouldn't fix the ordering issues and there are examples such as afs where the use of inode_init_owner() isn't possible. In any case, we should also try the cleaner and generalized solution first before resorting to this approach. - We still have S_ISGID inheritance bugs years after the initial round of S_ISGID inheritance fixes: `e014f37db1` ("xfs: use setattr_copy to set vfs inode attributes") `01ea173e10` ("xfs: fix up non-directory creation in SGID directories") `fd84bfdddd` ("ceph: fix up non-directory creation in SGID directories") All of this led us to conclude that the current state is too messy. While we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the respective vfs creation operations. The obvious advantage is that we don't need to rely on individual filesystems getting S_ISGID stripping right and instead can standardize the ordering between S_ISGID and umask stripping directly in the VFS. A few short implementation notes: - The stripping logic needs to happen in vfs_() helpers for the sake of stacking filesystems such as overlayfs that rely on these helpers taking care of S_ISGID stripping. - Security hooks have never seen the mode as it is ultimately seen by the filesystem because of the ordering issue we mentioned. Nothing is changed for them. We simply continue to strip the umask before passing the mode down to the security hooks. - The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. We've audited all callchains as best as we could. More details can be found in the commit message to `1639a49ccd` ("fs: move S_ISGID stripping into the vfs_() helpers")" tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: ceph: rely on vfs for setgid stripping fs: move S_ISGID stripping into the vfs_*() helpers fs: Add missing umask strip in vfs_tmpfile fs: add mode_strip_sgid() helper	2022-08-09 09:52:28 -07:00
Jeff Layton	1a1e3aca9d	fscache: add tracepoint when failing cookie Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>	2022-08-09 14:13:59 +01:00
Jeff Layton	fb24771faf	fscache: don't leak cookie access refs if invalidation is in progress or failed It's possible for a request to invalidate a fscache_cookie will come in while we're already processing an invalidation. If that happens we currently take an extra access reference that will leak. Only call __fscache_begin_cookie_access if the FSCACHE_COOKIE_DO_INVALIDATE bit was previously clear. Also, ensure that we attempt to clear the bit when the cookie is "FAILED" and put the reference to avoid an access leak. Fixes: `85e4ea1049` ("fscache: Fix invalidation/lookup race") Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>	2022-08-09 14:13:55 +01:00
Linus Torvalds	eb555cb5b7	12 server fixes, including for oopses, memory leaks and adding a channel lock -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmLwixsACgkQiiy9cAdy T1FT6Av+NY7R2f7/vDRzX/8u1fFXmyQDU2/dl3S0ysqqFEYE5hnHgbFtVP1IYwId /IAV8R6E/YMm8Nkbkf173EqHI8E9QNUSngTCk1bcvsiNSW4z3QOsujs9B6oTUeXM ARthU4eFJQW0wcTyjElVcm59I/jdtpQKaoVDQ/uXlCjPMT+0BUHS4mFRJkAx6DrX 7wOO0vJ/WCqO2u0rSvK4unOZsvhrKHibtPMvQSiV6k3a+LVCJ5/5lwBpENKK4Mdw n4LeNhSDtm6HE6WDFIxSj008WPX7CF3JvX+zvZJEsB7uFNry/GVkWySPLqVYUmtp aSftdnWnw0Mv9AG3L4OMzyCQQP89ccoV81d6lyyJEBaGiOFhH8L1v6b7qAStCZue zyyFWIVUXnstQKwDY8QWLKjmi7H+I1drFExxn0vABInPcaijQE8Mct0fSt015Tc1 EuCsO1JRyo+zn0mx7a1L80kHUH+yvHjKkfbINvPSa+qTFy21bgQZcG5CWSPpjadr 4zibpKko =cDaA -----END PGP SIGNATURE----- Merge tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull ksmbd updates from Steve French: - fixes for memory access bugs (out of bounds access, oops, leak) - multichannel fixes - session disconnect performance improvement, and session register improvement - cleanup * tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix heap-based overflow in set_ntacl_dacl() ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT ksmbd: prevent out of bound read for SMB2_WRITE ksmbd: fix use-after-free bug in smb2_tree_disconect ksmbd: fix memory leak in smb2_handle_negotiate ksmbd: fix racy issue while destroying session on multichannel ksmbd: use wait_event instead of schedule_timeout() ksmbd: fix kernel oops from idr_remove() ksmbd: add channel rwlock ksmbd: replace sessions list in connection with xarray MAINTAINERS: ksmbd: add entry for documentation ksmbd: remove unused ksmbd_share_configs_cleanup function	2022-08-08 20:15:13 -07:00
Linus Torvalds	f30adc0d33	iov_iter stuff, part 2, rebased * more new_sync_{read,write}() speedups - ITER_UBUF introduction * ITER_PIPE cleanups * unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics * making ITER_PIPE take high-order pages without splitting them * handling copy_page_from_iter() for high-order pages properly Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYvHI8QAKCRBZ7Krx/gZQ 62CQAPsGlbebqBeAT2pMulaGDxfLAsgz5Yf4BEaMLhPtRqFOQgD+KrZQId7Sd8O0 3IWucpTb2c4jvLlXhGMS+XWnusQH+AQ= =pBux -----END PGP SIGNATURE----- Merge tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more iov_iter updates from Al Viro: - more new_sync_{read,write}() speedups - ITER_UBUF introduction - ITER_PIPE cleanups - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics - making ITER_PIPE take high-order pages without splitting them - handling copy_page_from_iter() for high-order pages properly * tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) fix copy_page_from_iter() for compound destinations hugetlbfs: copy_page_to_iter() can deal with compound pages copy_page_to_iter(): don't split high-order page in case of ITER_PIPE expand those iov_iter_advance()... pipe_get_pages(): switch to append_pipe() get rid of non-advancing variants ceph: switch the last caller of iov_iter_get_pages_alloc() 9p: convert to advancing variant of iov_iter_get_pages_alloc() af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages() iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() block: convert to advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: saner helper for page array allocation fold __pipe_get_pages() into pipe_get_pages() ITER_XARRAY: don't open-code DIV_ROUND_UP() unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts unify xarray_get_pages() and xarray_get_pages_alloc() unify pipe_get_pages() and pipe_get_pages_alloc() iov_iter_get_pages(): sanity-check arguments iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper ...	2022-08-08 20:04:35 -07:00
Al Viro	c7d57ab163	hugetlbfs: copy_page_to_iter() can deal with compound pages ... since April 2021 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:26 -04:00
Al Viro	b53589927d	ceph: switch the last caller of iov_iter_get_pages_alloc() here nothing even looks at the iov_iter after the call, so we couldn't care less whether it advances or not. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:24 -04:00
Al Viro	7d690c157c	iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() ... and untangle the cleanup on failure to add into pipe. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:23 -04:00
Al Viro	1ef255e257	iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:22 -04:00
Al Viro	0d96493413	splice: stop abusing iov_iter_advance() to flush a pipe Use pipe_discard_from() explicitly in generic_file_read_iter(); don't bother with rather non-obvious use of iov_iter_advance() in there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:16 -04:00
Al Viro	3e20a751af	switch new_sync_{read,write}() to ITER_UBUF Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:15 -04:00
Al Viro	fcb14cb1bd	new iov_iter flavour - ITER_UBUF Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:15 -04:00
Muchun Song	dff033818a	mm: hugetlb_vmemmap: introduce the name HVO It it inconvenient to mention the feature of optimizing vmemmap pages associated with HugeTLB pages when communicating with others since there is no specific or abbreviated name for it when it is first introduced. Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now. This commit also updates the document about "hugetlb_free_vmemmap" by the way discussed in thread [1]. Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-08-08 18:06:42 -07:00
NeilBrown	3c59366c20	NFS: don't unhash dentry during unlink/rename NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set maydelete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under d_lock() when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-08 16:25:56 -04:00
Linus Torvalds	1daf117f1d	f2fs-for-6.0 In this cycle, we mainly fixed some corner cases that manipulate a per-file compression flag inappropriately. And, we found f2fs counted valid blocks in a section incorrectly when zone capacity is set, and thus, fixed it with additional sysfs entry to check it easily. Lastly, this series includes several patches with respect to the new atomic write support such as a couple of bug fixes and re-adding atomic_write_abort support that we removed by mistake in the previous release. Enhancement: - add sysfs entries to understand atomic write operations and zone capacity - introduce memory mode to get a hint for low-memory devices - adjust the waiting time of foreground GC - decompress clusters under softirq to avoid non-deterministic latency - do not skip updating inode when retrying to flush node page - enforce single zone capacity Bug fix: - set the compression/no-compression flags correctly - revive F2FS_IOC_ABORT_VOLATILE_WRITE - check inline_data during compressed inode conversion - understand zone capacity when calculating valid block count As usual, the series includes several minor clean-ups and sanity checks. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmLxOccACgkQQBSofoJI UNId/g/+Nx3FK874cyobE1PnPpUtfxLGqO9fjhrbje3bniTpgE9NtJUFg5hRQkxE XHuufMrW++aBhn2ESMjbfdQ3v6vy5XUy7bi4FR71KxW4qp15mAqjTPfAZBFKZfMv lCv54NKlura91GhI9Dl6JgGe1+MwNXIxVROyGvjXYogF0DWl+iJh4vYuCFUguiNU mP6FmnZvbtK89jYxODoqwQaC+b6DV7ceaQ+c0dtS5TRvsUNv5mjWDeTvPMgk3At/ mAuWYXfIrf5xfDY93JPbrJhBLvu7Ey3EfXBnaFGRYbYxYYub9JZ4+/5di/rB9jRc 9AZ6LcLX3aKaT71EWa9vdCIffz8/PcSRjsmpEuVs7KNySwcnolnb1tAzlJPKy2AV IJliY1Ef0+jrpg2lHYZoMb5qvo80c3xlyxlgZt0LSZKf1Wo41sjJVt6ZS7WLhHXu OlzeI7lZBS9RKPUtU5cGNWkmZqamvmq09mMvqF4IUIaY40MizKZoV0yh9BjuUoxM xniBIlC/q0HvwmbQ2OtNKDgv7+FdxrRlaDyhhkppa3UA8ZK3Edch26N9pBoh/r33 zJIR2BwCGmHz7yaX4HGzSt1phex2ABIGuZ4vBaGI7XDuYUD1tCZpC8wMCs2X3pKo ldQz3uu0GA0BSsNKpRks2dwRF0JJVGTk8UwcSXPwTdTTdqyhmvI= =dJ41 -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we mainly fixed some corner cases that manipulate a per-file compression flag inappropriately. And, we found f2fs counted valid blocks in a section incorrectly when zone capacity is set, and thus, fixed it with additional sysfs entry to check it easily. Lastly, this series includes several patches with respect to the new atomic write support such as a couple of bug fixes and re-adding atomic_write_abort support that we removed by mistake in the previous release. Enhancements: - add sysfs entries to understand atomic write operations and zone capacity - introduce memory mode to get a hint for low-memory devices - adjust the waiting time of foreground GC - decompress clusters under softirq to avoid non-deterministic latency - do not skip updating inode when retrying to flush node page - enforce single zone capacity Bug fixes: - set the compression/no-compression flags correctly - revive F2FS_IOC_ABORT_VOLATILE_WRITE - check inline_data during compressed inode conversion - understand zone capacity when calculating valid block count As usual, the series includes several minor clean-ups and sanity checks" * tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: use onstack pages instead of pvec f2fs: intorduce f2fs_all_cluster_page_ready f2fs: clean up f2fs_abort_atomic_write() f2fs: handle decompress only post processing in softirq f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED f2fs: do not set compression bit if kernel doesn't support f2fs: remove device type check for direct IO f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE f2fs: fix to do sanity check on segment type in build_sit_entries() f2fs: obsolete unused MAX_DISCARD_BLOCKS f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time f2fs: introduce sysfs atomic write statistics f2fs: don't bother wait_ms by foreground gc f2fs: invalidate meta pages only for post_read required inode f2fs: allow compression of files without blocks f2fs: fix to check inline_data during compressed inode conversion f2fs: Delete f2fs_copy_page() and replace with memcpy_page() f2fs: fix to invalidate META_MAPPING before DIO write ...	2022-08-08 11:18:31 -07:00
Linus Torvalds	2bd5d41e0e	fuse update for 6.0 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYvD4IQAKCRDh3BK/laaZ PDHHAP93H+2E9c6biGd5pEaL2ABChRY+wsQURzD+SZ6AL3JaUQD+KHxA4q0MJvws L6CWcf2XptUDCLe3P6sgSTvv5Gk1OAM= =FoXF -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix an issue with reusing the bdi in case of block based filesystems - Allow root (in init namespace) to access fuse filesystems in user namespaces if expicitly enabled with a module param - Misc fixes * tag 'fuse-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: retire block-device-based superblock on force unmount vfs: function to prevent re-use of block-device-based superblocks virtio_fs: Modify format for virtio_fs_direct_access virtiofs: delete unused parameter for virtio_fs_cleanup_vqs fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other fuse: Remove the control interface for virtio-fs fuse: ioctl: translate ENOSYS fuse: limit nsec fuse: avoid unnecessary spinlock bump fuse: fix deadlock between atomic O_TRUNC and page invalidation fuse: write inode in fuse_release()	2022-08-08 11:10:02 -07:00
Linus Torvalds	65512eb0e9	overlayfs update for 6.0 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYvDeCAAKCRDh3BK/laaZ PGjCAP9TVuId3X7Akroc9W+qswPzwlW3fwtE6+9F6ABeNJNPZAEAgU2bp95vqZRh OWP+ptnskceBcX/cRkfxkmgtiNE21wk= =sucY -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "Just a small update" * tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix spelling mistakes ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() ovl: improve ovl_get_acl() if POSIX ACL support is off ovl: fix some kernel-doc comments ovl: warn if trusted xattr creation fails	2022-08-08 11:03:11 -07:00
Linus Torvalds	f72fb74b82	Description for this pull request: - fix the error code of rename syscall. - cleanup and suppress the superfluous error messages. - remove duplicate directory entry update. - add exfat git tree in MAINTAINERS. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmLvEx0WHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCNnsD/wL8q7fPBn+xbwR00Qj8Jno0xu0 re6NjMsbZoWVWC7nWjTv4RZIsr8MGKmP3WLutPc2ujRlq5dG1wnzWv9fxfr/FFvK WCHvPH8krrsp6/As1feVqSznh3xqx21HA1Idv2qaSpuExbbWcDgPxd8dY6N0wPwY 7Jnv+kd8lY5g1fPQw0ZO4bYxVcovmp/ar2r52P7KwsNJEb1zswAT6cknwhkDnTsM OPrNO5luUSnphneTbO/syyg8JThhn4oUx/rPsU5fZUI25TczlEcA8lr+iWvGN0PX UDv5E2TE2GZzVN6pe690XBrQPZIh5T0MEmgQBk5qdiwk2p0u6Ge0X0k2nGUU7opv DuqY9U3rpzVQZsW4xGbQSnyveDUMffwg2yQdNTAA/+heRz+goaoq0svI+QlzUBNA fNOKgZNSAmi4xQKY8/VYuNzNIeizJqLLszmatBkxkVd2wONmwecHz/mls5x9XjOK xJLYqcsAoc/w/a0279G7jVYLXcSIxtVBtHCwKDW7dH1Gky6EVMqSHy/B7PFH4r7n AcEb5MbaZVRAMoNs9RPkEuFHNudtnzq0aqLOpQiDS6ElrJeq1pj9/XxuM+eemU+Y Xbut0FWokWECF3Qc6ydxJJU8uFIcM9GzwBbzq/u3f4+oUioaHtyTDx04KvJMEpFU OkNU4vRxc8hP/8O1ow== =GStu -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - fix the error code of rename syscall - cleanup and suppress the superfluous error messages - remove duplicate directory entry update - add exfat git tree in MAINTAINERS * tag 'exfat-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: MAINTAINERS: Add Namjae's exfat git tree exfat: Drop superfluous new line for error messages exfat: Downgrade ENAMETOOLONG error message to debug messages exfat: Expand exfat_err() and co directly to pr_() macro exfat: Define NLS_NAME_ as bit flags explicitly exfat: Return ENAMETOOLONG consistently for oversized paths exfat: remove duplicate write inode for extending dir/file exfat: remove duplicate write inode for truncating file exfat: reuse __exfat_write_inode() to update directory entry	2022-08-08 10:57:09 -07:00
David Howells	e2ebff9c57	vfs: Check the truncate maximum size in inode_newsize_ok() If something manages to set the maximum file size to MAX_OFFSET+1, this can cause the xfs and ext4 filesystems at least to become corrupt. Ordinarily, the kernel protects against userspace trying this by checking the value early in the truncate() and ftruncate() system calls calls - but there are at least two places that this check is bypassed: (1) Cachefiles will round up the EOF of the backing file to DIO block size so as to allow DIO on the final block - but this might push the offset negative. It then calls notify_change(), but this inadvertently bypasses the checking. This can be triggered if someone puts an 8EiB-1 file on a server for someone else to try and access by, say, nfs. (2) ksmbd doesn't check the value it is given in set_end_of_file_info() and then calls vfs_truncate() directly - which also bypasses the check. In both cases, it is potentially possible for a network filesystem to cause a disk filesystem to be corrupted: cachefiles in the client's cache filesystem; ksmbd in the server's filesystem. nfsd is okay as it checks the value, but we can then remove this check too. Fix this by adding a check to inode_newsize_ok(), as called from setattr_prepare(), thereby catching the issue as filesystems set up to perform the truncate with minimal opportunity for bypassing the new check. Fixes: `1f08c925e7` ("cachefiles: Implement backing file wrangling") Fixes: `f441584858` ("cifsd: add file operations") Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Cc: stable@kernel.org Acked-by: Alexander Viro <viro@zeniv.linux.org.uk> cc: Steve French <sfrench@samba.org> cc: Hyunchul Lee <hyc.lee@gmail.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-08 10:39:29 -07:00
Linus Torvalds	3bc1bc0b59	19 cifs/smb3 fixes, mostly cleanup, including smb1 refactoring -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmLvF24ACgkQiiy9cAdy T1HJjQv/aA6mIAdxCXQEmlp2RqaHldjQg/9MeG5/1si3cP60exLUbt5CY6mHMrOF xj1mR22/6KR4DuhuwADkzYwhr0itmR98f5BpNjgwUTfXse1IjnNDjc6S4nWJybhv mmnMjtMHwV3o5/gTC5YS2lA6zYWyhkiYQeMpmNkMJByNgK9dM+vPp97D8XwAm6s6 vFGpypWWCp1NWm6fHGLFO48FnbB2UnmTem36blN8Uz6fqNuBO6nsOLuXe0ovu7vH 5fOpE+6tXq3hD8YLTPRN8Zzl5h0wINVmYS/uXDvDdLJpL7vpCuy7hYqGaPQXH3KN pXNVg8l1Uy2RhKKQQFoxjIe3y0hNBG83x3AyvarP/kA6ZqOnz+k9r2KvOtz4oWht HK6FNtA3g+8hIRznYpNc7FM9F0MM5ToDi/sh+imVE4V09tPjLlx3XTnzcWqfs0oT 1q92yR+KxyYQ3OzcyCiQoNU9pcitX9CUcWItpCkk9FNjgasXhwk/yZKQiNfK3qXL NQIYkz66 =uhLb -----END PGP SIGNATURE----- Merge tag '5.20-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Mostly cleanup, including smb1 refactoring: - multichannel perf improvement - move additional SMB1 code to not be compiled in when legacy support is disabled. - bug fixes, including one important one for memory leak - various cleanup patches We are still working on and testing some deferred close improvements including an important lease break fix for case when multiple deferred closes are still open, and also some additional perf improvements - those are not included here" * tag '5.20-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: alloc_mid function should be marked as static cifs: remove "cifs_" prefix from init/destroy mids functions cifs: remove useless DeleteMidQEntry() cifs: when insecure legacy is disabled shrink amount of SMB1 code cifs: trivial style fixup cifs: fix wrong unlock before return from cifs_tree_connect() cifs: avoid use of global locks for high contention data cifs: remove remaining build warnings cifs: list_for_each() -> list_for_each_entry() cifs: update MAINTAINERS file with reviewers smb2: small refactor in smb2_check_message() cifs: Fix memory leak when using fscache cifs: remove minor build warning cifs: remove some camelCase and also some static build warnings cifs: remove unnecessary (void*) conversions. cifs: remove unnecessary type castings cifs: remove redundant initialization to variable mnt_sign_enabled smb3: check xattr value length earlier	2022-08-07 10:50:59 -07:00
Linus Torvalds	eb5699ba31	Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9 kwS/13j38guulSlXRzDLmitbg81zAAI= =Zfum -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. A relatively small amount of material this time" * tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) scripts/gdb: ensure the absolute path is generated on initial source MAINTAINERS: kunit: add David Gow as a maintainer of KUnit mailmap: add linux.dev alias for Brendan Higgins mailmap: update Kirill's email profile: setup_profiling_timer() is moslty not implemented ocfs2: fix a typo in a comment ocfs2: use the bitmap API to simplify code ocfs2: remove some useless functions lib/mpi: fix typo 'the the' in comment proc: add some (hopefully) insightful comments bdi: remove enum wb_congested_state kernel/hung_task: fix address space of proc_dohung_task_timeout_secs lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() squashfs: support reading fragments in readahead call squashfs: implement readahead squashfs: always build "file direct" version of page actor Revert "squashfs: provide backing_dev_info in order to disable read-ahead" fs/ocfs2: Fix spelling typo in comment ia64: old_rr4 added under CONFIG_HUGETLB_PAGE proc: fix test for "vsyscall=xonly" boot option ...	2022-08-07 10:03:24 -07:00
Linus Torvalds	ea0c39260d	9p-for-5.20 - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmLuKqkACgkQq06b7GqY 5nA+RBAAvuA6AjKQSvsNxHsqSFMahwoE3cCPlF/QlmAnnZa1DzmRI3kKTAuuDKkE Q4c7DjxzDJCWRTlkpNUCGZycHqpu1QQbfoTo43sIqob7W8xlajeemc5Fxqtw5sPM m+0SzN7vvNJpCr+6pxMXwwGHSZmZOvpFjwj7cUjhpF/V1WO8bNxfCGzQdF0hX1Vn 2HoBbFUOmacL9Z/pF3O/ZG9LwCFuRQsH0EmbFBlJy1WdDtlVHTXlzDaGQ7EGaY4D 17UR6iITsj9ozacLhvk094PLIc3/RHDGLrm3C4Ka3zmUI7BsiYWPDmai3Pu/DNqn JJ5sZkdrVowxyBbGxw8GpZ4YDJtGsU5XglFPdkw+ZazxhZLNEIstPXg0HTZybrOe GE+WskWB0qS+RpX0tnYEcX6qOHWm3/63Yq5NG6A9tLQUSFku02jS/bCQSLBrmGWW Js24IWvFSTvl6XytHeldYhJP618pNUxXRSqgYv96vT/LI3mUrIMN+IVBNPujO6p2 jIYXNoaqLoY3efXKW/WQmp7C/52ZP4ly4fOiz7qtHTQsCcIk8Xo6zwHtm/FkNEqc sMZdqgLrxKPNBAlT8iEtt//wU2fB7mFt988p0pc+5lAK5t0h67KZJV6vDwTAMObX wV6Ht+QhOHtJwO779fk8FhZPNRPYZYMutIyFNlRx+4gCpBuqJPc= =941k -----END PGP SIGNATURE----- Merge tag '9p-for-5.20' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting * tag '9p-for-5.20' of https://github.com/martinetd/linux: net/9p: Initialize the iounit field during fid creation net: 9p: fix refcount leak in p9_read_work() error handling 9p: roll p9_tag_remove into p9_req_put 9p: Add client parameter to p9_req_put() 9p: Drop kref usage 9p: Fix some kernel-doc comments 9p fid refcount: cleanup p9_fid_put calls 9p fid refcount: add a 9p_fid_ref tracepoint 9p fid refcount: add p9_fid_get/put wrappers 9p: Fix minor typo in code comment 9p: Remove unnecessary variable for old fids while walking from d_parent 9p: Make the path walk logic more clear about when cloning is required 9p: Track the root fid with its own variable during lookups	2022-08-06 14:48:54 -07:00
Linus Torvalds	c42b729ef6	gfs2 fixes - Instantiate glocks ouside of the glock state engine, in the contect of the process taking the glock. This moves unnecessary complexity out of the core glock code. Clean up the instantiate logic to be more sensible. - In gfs2_glock_async_wait(), cancel pending locking request upon failure. Make sure all glocks are left in a consistent state. - Various other minor cleanups and fixes. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmLtdg8UHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrqvA//WRdBtVgT7/5pkjljRolkBZ8B3sYx T2KlHuiQdvnTGf2dWnOOoUzEZvPXPUovUZMA4dHx0jcRpOi4BsYGz986K/Zpq5hs vieFEoKQdWk9O9NoNdRJN8Rl1tHTwejZi+kLerhYoJzgMC8AvgieLGO0Ol4Y0joc lxop/8L1Tn2GiCN4NcBN7Eg2CC4ke58KZcMgWhWVBR2ZJe9/qdqlVEiehiSbCiiN l89vsYLrG6bMylvNPc+AiyEvIGF5qkEHAErPIs7SfrjNRRWVhkmvTCWAO6JnehTQ XwqYQiAWCXfxBXUYG1VSCgjmTynmO2yg1Slt+86OauI9ka+ow8epSmHh95TT1JcY pmVF6CYhLI49dNl3R68CFlQ+Ov6iGt6gx9KEud5oE/Ew0vd/WIyi2/jSGrX59S07 zktMzEDjn31+jw31Raxc6+TQEU+0jQHCwzKWjbJ0tYy3nBdkCyefHwm199Ff40M/ 6jHWaH/qcyuq8crrc8PLSJOguSd7FdfdFhXEmpaH2CPybvfuEVJfig4vYee3YtSx KtZvgpy3bxBCfBDD7CPKfKMLrKrklYH+h7/lhCxbuSH0HvyS0ayXhmSvhXgfn+4e uWY5yk7gHAaaKGOBkkYwFAWV7X32LS0ndWzI8Ac8m20ifV0eeveRNEX0A/fHIX2U DlbhYq889mc2P70= =qFus -----END PGP SIGNATURE----- Merge tag 'gfs2-v5.19-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Instantiate glocks ouside of the glock state engine, in the contect of the process taking the glock. This moves unnecessary complexity out of the core glock code. Clean up the instantiate logic to be more sensible. - In gfs2_glock_async_wait(), cancel pending locking request upon failure. Make sure all glocks are left in a consistent state. - Various other minor cleanups and fixes. * tag 'gfs2-v5.19-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: List traversal in do_promote is safe gfs2: do_promote glock holder stealing fix gfs2: Use better variable name gfs2: Make go_instantiate take a glock gfs2: Add new go_held glock operation gfs2: Revert 'Fix "truncate in progress" hang' gfs2: Instantiate glocks ouside of glock state engine gfs2: Fix up gfs2_glock_async_wait gfs2: Minor gfs2_glock_nq_m cleanup gfs2: Fix spelling mistake in comment gfs2: Rewrap overlong comment in do_promote gfs2: Remove redundant NULL check before kfree	2022-08-06 14:44:49 -07:00
Chandan Babu R	d62113303d	xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error even though the filesystem has sufficient number of free blocks. This occurs if the file offset range on which the write operation is being performed has a delalloc extent in the cow fork and this delalloc extent begins much before the Direct IO range. In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to allocate the blocks mapped by the delalloc extent. The extent thus allocated may not cover the beginning of file offset range on which the Direct IO write was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC. The following script reliably recreates the bug described above. #!/usr/bin/bash device=/dev/loop0 shortdev=$(basename $device) mntpnt=/mnt/ file1=${mntpnt}/file1 file2=${mntpnt}/file2 fragmentedfile=${mntpnt}/fragmentedfile punchprog=/root/repos/xfstests-dev/src/punch-alternating errortag=/sys/fs/xfs/${shortdev}/errortag/bmap_alloc_minlen_extent umount $device > /dev/null 2>&1 echo "Create FS" mkfs.xfs -f -m reflink=1 $device > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mkfs failed." exit 1 fi echo "Mount FS" mount $device $mntpnt > /dev/null 2>&1 if [[ $? != 0 ]]; then echo "mount failed." exit 1 fi echo "Create source file" xfs_io -f -c "pwrite 0 32M" $file1 > /dev/null 2>&1 sync echo "Create Reflinked file" xfs_io -f -c "reflink $file1" $file2 &>/dev/null echo "Set cowextsize" xfs_io -c "cowextsize 16M" $file1 > /dev/null 2>&1 echo "Fragment FS" xfs_io -f -c "pwrite 0 64M" $fragmentedfile > /dev/null 2>&1 sync $punchprog $fragmentedfile echo "Allocate block sized extent from now onwards" echo -n 1 > $errortag echo "Create 16MiB delalloc extent in CoW fork" xfs_io -c "pwrite 0 4k" $file1 > /dev/null 2>&1 sync echo "Direct I/O write at offset 12k" xfs_io -d -c "pwrite 12k 8k" $file1 This commit fixes the bug by invoking xfs_bmapi_write() in a loop until disk blocks are allocated for atleast the starting file offset of the Direct IO write range. Fixes: `3c68d44a2b` ("xfs: allocate direct I/O COW blocks in iomap_begin") Reported-and-Root-caused-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: slight editing to make the locking less grody, and fix some style things] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-08-05 17:00:36 -07:00
Darrick J. Wong	f0c2d7d2ab	xfs: fix intermittent hang during quotacheck Every now and then, I see the following hang during mount time quotacheck when running fstests. Turning on KASAN seems to make it happen somewhat more frequently. I've edited the backtrace for brevity. XFS (sdd): Quotacheck needed: Please wait. XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:assfail+0x46/0x4a [xfs] Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0 RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20 R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000 FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0 Call Trace: <TASK> xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368] get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 I /think/ this can happen if xfs_qm_flush_one is racing with xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the dquot buffer, let alone queued it to the delwri list. In this case, flush_one will fail to get the dquot flush lock, but it can lock the incore buffer, but xfs_buf_delwri_pushbuf will then trip over this ASSERT, which checks that the buffer isn't on a delwri list. The hang results because the _delwri_submit_buffers ignores non DELWRI_Q buffers, which means that xfs_buf_iowait waits forever for an IO that has not yet been scheduled. AFAICT, a reasonable solution here is to detect a dquot buffer that is not on a DELWRI list, drop it, and return -EAGAIN to try the flush again. It's not /that/ big of a deal if quotacheck writes the dquot buffer repeatedly before we even set QUOTA_CHKD. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-08-05 17:00:36 -07:00
Darrick J. Wong	7d839e325a	xfs: check return codes when flushing block devices If a blkdev_issue_flush fails, fsync needs to report that to upper levels. Modify xfs_file_fsync to capture the errors, while trying to flush as much data and log updates to disk as possible. If log writes cannot flush the data device, we need to shut down the log immediately because we've violated a log invariant. Modify this code to check the return value of blkdev_issue_flush as well. This behavior seems to go back to about 2.6.15 or so, which makes this fixes tag a bit misleading. Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187 Fixes: `b5071ada51` ("xfs: remove xfs_blkdev_issue_flush") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-08-05 17:00:36 -07:00
Linus Torvalds	6614a3c316	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...	2022-08-05 16:32:45 -07:00
Andreas Gruenbacher	446279168e	Merge part of branch 'for-next.instantiate' into for-next	2022-08-05 18:37:03 +02:00
Steve French	0d168a58fc	cifs: update internal module number To 2.38 Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-05 11:24:17 -05:00
Steve French	ea75a78c07	cifs: alloc_mid function should be marked as static It is only used in transport.c. Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-05 11:24:12 -05:00
Enzo Matsumiya	f5fd3f2889	cifs: remove "cifs_" prefix from init/destroy mids functions Rename generic mid functions to same style, i.e. without "cifs_" prefix. cifs_{init,destroy}_mids() -> {init,destroy}_mids() Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-05 11:24:08 -05:00
Enzo Matsumiya	70f08f914a	cifs: remove useless DeleteMidQEntry() DeleteMidQEntry() was just a proxy for cifs_mid_q_entry_release(). - remove DeleteMidQEntry() - rename cifs_mid_q_entry_release() to release_mid() - rename kref_put() callback _cifs_mid_q_entry_release to __release_mid - rename AllocMidQEntry() to alloc_mid() - rename cifs_delete_mid() to delete_mid() Update callers to use new names. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-05 11:24:06 -05:00
Steve French	fb157ed226	cifs: when insecure legacy is disabled shrink amount of SMB1 code Currently much of the smb1 code is built even when CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled. Move cifssmb.c to only be compiled when insecure legacy is disabled, and move various SMB1/CIFS helper functions to that ifdef. Some functions that were not SMB1/CIFS specific needed to be moved out of cifssmb.c This shrinks cifs.ko by more than 10% which is good - but also will help with the eventual movement of the legacy code to a distinct module. Follow on patches can shrink the number of ifdefs by code restructuring where smb1 code is wedged in functions that should be calling dialect specific helper functions instead, and also by moving some functions from file.c/dir.c/inode.c into smb1 specific c files. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-05 11:24:03 -05:00
Fengnan Chang	01fc4b9a6e	f2fs: use onstack pages instead of pvec Since pvec have 15 pages, it not a multiple of 4, when write compressed pages, write in 64K as a unit, it will call pagevec_lookup_range_tag agagin, sometimes this will take a lot of time. Use onstack pages instead of pvec to mitigate this problem. Signed-off-by: Fengnan Chang <fengnanchang@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:20:55 -07:00
Fengnan Chang	4f8219f8aa	f2fs: intorduce f2fs_all_cluster_page_ready When write total cluster, all pages is uptodate, there is not need to call f2fs_prepare_compress_overwrite, intorduce f2fs_all_cluster_page_ready to avoid this. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:20:28 -07:00
Chao Yu	e53f864347	f2fs: clean up f2fs_abort_atomic_write() f2fs_abort_atomic_write() has checked whether current inode is atomic_write one or not, it's redundant to check in its caller, remove it for cleanup. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:20:02 -07:00
Daeho Jeong	bff139b49d	f2fs: handle decompress only post processing in softirq Now decompression is being handled in workqueue and it makes read I/O latency non-deterministic, because of the non-deterministic scheduling nature of workqueues. So, I made it handled in softirq context only if possible, not in low memory devices, since this modification will maintain decompresion related memory a little longer. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:18:31 -07:00
Jaewook Kim	90be48bd9d	f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. However, as of now, in case of compress_mode=user, writes triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly, which could crash that file. To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already has FI_COMPRESS_RELEASED flag. This is the reproduction process: 1. $ touch ./file 2. $ chattr +c ./file 3. $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc 4. $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc 5. $ sync 6. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE 7. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS 8. $ release ./file ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS 9. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE again 10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again This reproduction process is tested in 128kb cluster size. You can find compr_blocks has a negative value. Fixes: `5fdb322ff2` ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Jaewook Kim <jw5454.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:18:08 -07:00
Jaegeuk Kim	912f0d6580	f2fs: do not set compression bit if kernel doesn't support If kernel doesn't have CONFIG_F2FS_FS_COMPRESSION, a file having FS_COMPR_FL via ioctl(FS_IOC_SETFLAGS) is unaccessible due to f2fs_is_compress_backend_ready(). Let's avoid it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:18:08 -07:00
Eunhee Rho	dbf8e63f48	f2fs: remove device type check for direct IO To ensure serialized IOs, f2fs allows only LFS mode for zoned device. Remove redundant check for direct IO. Signed-off-by: Eunhee Rho <eunhee83.rho@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:18:08 -07:00
Ye Bin	4a2c5b7994	f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data There is issue as follows when test f2fs atomic write: F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock F2FS-fs (loop0): invalid crc_offset: 0 F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=1, run fsck to fix. F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix. ================================================================== BUG: KASAN: null-ptr-deref in f2fs_get_dnode_of_data+0xac/0x16d0 Read of size 8 at addr 0000000000000028 by task rep/1990 CPU: 4 PID: 1990 Comm: rep Not tainted 5.19.0-rc6-next-20220715 #266 Call Trace: <TASK> dump_stack_lvl+0x6e/0x91 print_report.cold+0x49a/0x6bb kasan_report+0xa8/0x130 f2fs_get_dnode_of_data+0xac/0x16d0 f2fs_do_write_data_page+0x2a5/0x1030 move_data_page+0x3c5/0xdf0 do_garbage_collect+0x2015/0x36c0 f2fs_gc+0x554/0x1d30 f2fs_balance_fs+0x7f5/0xda0 f2fs_write_single_data_page+0xb66/0xdc0 f2fs_write_cache_pages+0x716/0x1420 f2fs_write_data_pages+0x84f/0x9a0 do_writepages+0x130/0x3a0 filemap_fdatawrite_wbc+0x87/0xa0 file_write_and_wait_range+0x157/0x1c0 f2fs_do_sync_file+0x206/0x12d0 f2fs_sync_file+0x99/0xc0 vfs_fsync_range+0x75/0x140 f2fs_file_write_iter+0xd7b/0x1850 vfs_write+0x645/0x780 ksys_write+0xf1/0x1e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd As `3db1de0e58` commit changed atomic write way which new a cow_inode for atomic write file, and also mark cow_inode as FI_ATOMIC_FILE. When f2fs_do_write_data_page write cow_inode will use cow_inode's cow_inode which is NULL. Then will trigger null-ptr-deref. To solve above issue, introduce FI_COW_FILE flag for COW inode. Fiexes: 3db1de0e582c("f2fs: change the current atomic write way") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:18:08 -07:00
Daeho Jeong	23339e5752	f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE F2FS_IOC_ABORT_VOLATILE_WRITE was used to abort a atomic write before. However it was removed accidentally. So revive it by changing the name, since volatile write had gone. Signed-off-by: Daeho Jeong <daehojeong@google.com> Fiexes: 7bc155fec5b3("f2fs: kill volatile write support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-08-05 04:17:57 -07:00
Linus Torvalds	b2a88c212e	New code for 5.20: - Improve scalability of the XFS log by removing spinlocks and global synchronization points. - Add security labels to whiteout inodes to match the other filesystems. - Clean up per-ag pointer passing to simplify call sites. - Reduce verifier overhead by precalculating more AG geometry. - Implement fast-path lockless lookups in the buffer cache to reduce spinlock hammering. - Make attr forks a permanent part of the inode structure to fix a UAF bug and because most files these days tend to have security labels and soon will have parent pointers too. - Clean up XFS_IFORK_Q usage and give it a better name. - Fix more UAF bugs in the xattr code. - SOB my tags. - Fix some typos in the timestamp range documentation. - Fix a few more memory leaks. - Code cleanups and typo fixes. - Fix an unlocked inode fork pointer access in getbmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmLmrLkACgkQ+H93GTRK tOviexAAo7mJ03hCCWnnkcEYbVQNMH4WRuCpR45D8lz4PU/s6yL7/uxuyodc0dMm /ZUWjCas1GMZmbOkCkL9eeatrZmgT5SeDbYc4EtHicHYi4sTgCB7ymx0soCUHXYi 7c0kdz+eQ/oY4QvY6JZwbFkRENDL2pkxM9itGHZT0OXHmAnGcIYvzP5Vuc2GtelL 0VWCcpusG0uck3+P1qa8e+TtkR2HU5PVGgAU7OhmAIs07aE3AheVEsPydgGKSIS9 PICnMg1oIgly4VQi28cp/5hU+Au6yBMGogxW8ultPFlM5RWKFt8MKUUhclzS+hZL 9dGSZ3JjpZrdmuUa9mdPnr1MsgrTF6CWHAeUsblSXUzjRT8S3Yz8I3gUMJAA/H17 ZGBu55+TlZtE4ZsK3q/4pqZXfylaaumbEqEi5lJX+7/IYh/WLAgxJihWSpSK2B4a VBqi12EvMlrjZ4vrD2hqVEJAlguoWiqxgv2gXEZ5wy9dfvzGgysXwAigj0YQeJNQ J++AYwdYs0pCK0O4eTGZsvp+6o9wj92irtrxwiucuKreDZTOlpCBOAXVTxqom1nX 1NS1YmKvC/RM1na6tiOIundwypgSXUe32qdan34xEWBVPY0mnSpX0N9Lcyoc0xbg kajAKK9TIy968su/eoBuTQf2AIu1jbWMBNZSg9oELZjfrm0CkWM= =fNjj -----END PGP SIGNATURE----- Merge tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "The biggest changes for this release are the log scalability improvements, lockless lookups for the buffer cache, and making the attr fork a permanent part of the incore inode in preparation for directory parent pointers. There's also a bunch of bug fixes that have accumulated since -rc5. I might send you a second pull request with some more bug fixes that I'm still working on. Once the merge window ends, I will hand maintainership back to Dave Chinner until the 6.1-rc1 release so that I can conduct the design review for the online fsck feature, and try to get it merged. Summary: - Improve scalability of the XFS log by removing spinlocks and global synchronization points. - Add security labels to whiteout inodes to match the other filesystems. - Clean up per-ag pointer passing to simplify call sites. - Reduce verifier overhead by precalculating more AG geometry. - Implement fast-path lockless lookups in the buffer cache to reduce spinlock hammering. - Make attr forks a permanent part of the inode structure to fix a UAF bug and because most files these days tend to have security labels and soon will have parent pointers too. - Clean up XFS_IFORK_Q usage and give it a better name. - Fix more UAF bugs in the xattr code. - SOB my tags. - Fix some typos in the timestamp range documentation. - Fix a few more memory leaks. - Code cleanups and typo fixes. - Fix an unlocked inode fork pointer access in getbmap" * tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits) xfs: delete extra space and tab in blank line xfs: fix NULL pointer dereference in xfs_getbmap() xfs: Fix typo 'the the' in comment xfs: Fix comment typo xfs: don't leak memory when attr fork loading fails xfs: fix for variable set but not used warning xfs: xfs_buf cache destroy isn't RCU safe xfs: delete unnecessary NULL checks xfs: fix comment for start time value of inode with bigtime enabled xfs: fix use-after-free in xattr node block inactivation xfs: lockless buffer lookup xfs: remove a superflous hash lookup when inserting new buffers xfs: reduce the number of atomic when locking a buffer after lookup xfs: merge xfs_buf_find() and xfs_buf_get_map() xfs: break up xfs_buf_find() into individual pieces xfs: add in-memory iunlink log item xfs: add log item precommit operation xfs: combine iunlink inode update functions xfs: clean up xfs_iunlink_update_inode() xfs: double link the unlinked inode list ...	2022-08-04 20:19:16 -07:00
Linus Torvalds	9daee913dc	Add new ioctls to set and get the file system UUID in the ext4 superblock and improved the performance of the online resizing of file systems with bigalloc enabled. Fixed a lot of bugs, in particular for the inline data feature, potential races when creating and deleting inodes with shared extended attribute blocks, and the handling directory blocks which are corrupted. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmLrRpEACgkQ8vlZVpUN gaN2OAf/a2lxhZ6DvkTfWjni0BG6i2ajd11lT93N+we0wWk9f0VdNR2JUdTum0HL UtwP48km+FuqkbDrODlbSky5V+IZhd90ihLyPbPKSU/52c7d6IxNOCz2Fxq981j2 Ik6QgdegvCaUDHmluJfYYcS5Pa97HXtSb6VVi1RAvHHFbYDSChObs76ZQWBmhsSh Mo84mFGS7BDIVNVkg4PBMx4b3iFvKfE1AUdfA5dhB4GXHgDA+77GByw+RjdQ6Dh/ W0l5AVAXbK7BYSVX6Cg41WUMYOBu58Hrh/CHL1DWv3khvjgxLqM7ERAFOISVI3Ax vCXPXfjpbTFElUQuOw4m33vixaFU+A== =xTsM -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add new ioctls to set and get the file system UUID in the ext4 superblock and improved the performance of the online resizing of file systems with bigalloc enabled. Fixed a lot of bugs, in particular for the inline data feature, potential races when creating and deleting inodes with shared extended attribute blocks, and the handling of directory blocks which are corrupted" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) ext4: add ioctls to get/set the ext4 superblock uuid ext4: avoid resizing to a partial cluster size ext4: reduce computation of overhead during resize jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted ext4: block range must be validated before use in ext4_mb_clear_bb() mbcache: automatically delete entries from cache on freeing mbcache: Remove mb_cache_entry_delete() ext2: avoid deleting xattr block that is being reused ext2: unindent codeblock in ext2_xattr_set() ext2: factor our freeing of xattr block reference ext4: fix race when reusing xattr blocks ext4: unindent codeblock in ext4_xattr_block_set() ext4: remove EA inode entry from mbcache on inode eviction mbcache: add functions to delete entry if unused mbcache: don't reclaim used entries ext4: make sure ext4_append() always allocates new block ext4: check if directory block is within i_size ext4: reflect mb_optimize_scan value in options file ext4: avoid remove directory when directory is corrupted ext4: aligned '*' in comments ...	2022-08-04 20:13:46 -07:00
Linus Torvalds	cfeafd9466	Driver core / kernfs changes for 6.0-rc1 Here is the set of driver core and kernfs changes for 6.0-rc1. "biggest" thing in here is some scalability improvements for kernfs for large systems. Other than that, included in here are: - arch topology and cache info changes that have been reviewed and discussed a lot. - potential error path cleanup fixes - deferred driver probe cleanups - firmware loader cleanups and tweaks - documentation updates - other small things All of these have been in the linux-next tree for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYuqCnw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym/JgCcCnaycJY00ZPRQm3LQCyzfJ0HgqoAn2qxGV+K NKycLeXZSnuvIA87dycE =/4Jk -----END PGP SIGNATURE----- Merge tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / kernfs updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.0-rc1. The "biggest" thing in here is some scalability improvements for kernfs for large systems. Other than that, included in here are: - arch topology and cache info changes that have been reviewed and discussed a lot. - potential error path cleanup fixes - deferred driver probe cleanups - firmware loader cleanups and tweaks - documentation updates - other small things All of these have been in the linux-next tree for a while with no reported problems" * tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits) docs: embargoed-hardware-issues: fix invalid AMD contact email firmware_loader: Replace kmap() with kmap_local_page() sysfs docs: ABI: Fix typo in comment kobject: fix Kconfig.debug "its" grammar kernfs: Fix typo 'the the' in comment docs: driver-api: firmware: add driver firmware guidelines. (v3) arch_topology: Fix cache attributes detection in the CPU hotplug path ACPI: PPTT: Leave the table mapped for the runtime usage cacheinfo: Use atomic allocation for percpu cache attributes drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist MAINTAINERS: Change mentions of mpm to olivia docs: ABI: sysfs-devices-soc: Update Lee Jones' email address docs: ABI: sysfs-class-pwm: Update Lee Jones' email address Documentation/process: Add embargoed HW contact for LLVM Revert "kernfs: Change kernfs_notify_list to llist." ACPI: Remove the unused find_acpi_cpu_cache_topology() arch_topology: Warn that topology for nested clusters is not supported arch_topology: Add support for parsing sockets in /cpu-map arch_topology: Set cluster identifier in each core/thread from /cpu-map arch_topology: Limit span of cpu_clustergroup_mask() ...	2022-08-04 11:31:20 -07:00
Namjae Jeon	8f0541186e	ksmbd: fix heap-based overflow in set_ntacl_dacl() The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase trigger the following overflow. [ 4712.003781] ================================================================== [ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190 [ 4712.003813] CPU: 0 PID: 4190 Comm: kworker/0:0 Not tainted 5.19.0-rc5 #1 [ 4712.003850] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] [ 4712.003867] Call Trace: [ 4712.003870] <TASK> [ 4712.003873] dump_stack_lvl+0x49/0x5f [ 4712.003935] print_report.cold+0x5e/0x5cf [ 4712.003972] ? ksmbd_vfs_get_sd_xattr+0x16d/0x500 [ksmbd] [ 4712.003984] ? cmp_map_id+0x200/0x200 [ 4712.003988] ? build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004000] kasan_report+0xaa/0x120 [ 4712.004045] ? build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004056] kasan_check_range+0x100/0x1e0 [ 4712.004060] memcpy+0x3c/0x60 [ 4712.004064] build_sec_desc+0x842/0x1dd0 [ksmbd] [ 4712.004076] ? parse_sec_desc+0x580/0x580 [ksmbd] [ 4712.004088] ? ksmbd_acls_fattr+0x281/0x410 [ksmbd] [ 4712.004099] smb2_query_info+0xa8f/0x6110 [ksmbd] [ 4712.004111] ? psi_group_change+0x856/0xd70 [ 4712.004148] ? update_load_avg+0x1c3/0x1af0 [ 4712.004152] ? asym_cpu_capacity_scan+0x5d0/0x5d0 [ 4712.004157] ? xas_load+0x23/0x300 [ 4712.004162] ? smb2_query_dir+0x1530/0x1530 [ksmbd] [ 4712.004173] ? _raw_spin_lock_bh+0xe0/0xe0 [ 4712.004179] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 4712.004192] process_one_work+0x778/0x11c0 [ 4712.004227] ? _raw_spin_lock_irq+0x8e/0xe0 [ 4712.004231] worker_thread+0x544/0x1180 [ 4712.004234] ? __cpuidle_text_end+0x4/0x4 [ 4712.004239] kthread+0x282/0x320 [ 4712.004243] ? process_one_work+0x11c0/0x11c0 [ 4712.004246] ? kthread_complete_and_exit+0x30/0x30 [ 4712.004282] ret_from_fork+0x1f/0x30 This patch add the buffer validation for security descriptor that is stored by malformed SMB2_SET_INFO_HE command. and allocate large response buffer about SMB2_O_INFO_SECURITY file info class. Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771 Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-04 09:51:38 -05:00
Jeff Layton	6930bcbfb6	lockd: detect and reject lock arguments that overflow lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem. The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow. Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak <j.kasiak@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> # 5.14+	2022-08-04 10:28:48 -04:00
NeilBrown	dd8dd403d7	NFSD: discard fh_locked flag and fh_lock/fh_unlock As all inode locking is now fully balanced, fh_put() does not need to call fh_unlock(). fh_lock() and fh_unlock() are no longer used, so discard them. These are the only real users of ->fh_locked, so discard that too. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:48 -04:00
NeilBrown	bb4d53d66e	NFSD: use (un)lock_inode instead of fh_(un)lock for file operations When locking a file to access ACLs and xattrs etc, use explicit locking with inode_lock() instead of fh_lock(). This means that the calls to fh_fill_pre/post_attr() are also explicit which improves readability and allows us to place them only where they are needed. Only the xattr calls need pre/post information. When locking a file we don't need I_MUTEX_PARENT as the file is not a parent of anything, so we can use inode_lock() directly rather than the inode_lock_nested() call that fh_lock() uses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:41 -04:00
NeilBrown	debf16f0c6	NFSD: use explicit lock/unlock for directory ops When creating or unlinking a name in a directory use explicit inode_lock_nested() instead of fh_lock(), and explicit calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done for renames, with lock_rename() as the explicit locking. Also move the 'fill' calls closer to the operation that might change the attributes. This way they are avoided on some error paths. For the v2-only code in nfsproc.c, the fill calls are not replaced as they aren't needed. Making the locking explicit will simplify proposed future changes to locking for directories. It also makes it easily visible exactly where pre/post attributes are used - not all callers of fh_lock() actually need the pre/post attributes. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:20 -04:00
NeilBrown	19d008b469	NFSD: reduce locking in nfsd_lookup() nfsd_lookup() takes an exclusive lock on the parent inode, but no callers want the lock and it may not be needed at all if the result is in the dcache. Change nfsd_lookup_dentry() to not take the lock, and call lookup_one_len_locked() which takes lock only if needed. nfsd4_open() currently expects the lock to still be held, but that isn't necessary as nfsd_validate_delegated_dentry() provides required guarantees without the lock. NOTE: NFSv4 requires directory changeinfo for OPEN even when a create wasn't requested and no change happened. Now that nfsd_lookup() doesn't use fh_lock(), we need to explicitly fill the attributes when no create happens. A new fh_fill_both_attrs() is provided for that task. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:20 -04:00
NeilBrown	e18bcb33bc	NFSD: only call fh_unlock() once in nfsd_link() On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe because fh_unlock() records that the unlock has been done and doesn't repeat it. However it makes the code a little confusing and interferes with changes that are planned for directory locking. So rearrange the code to ensure fh_unlock() is called exactly once if fh_lock() was called. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:19 -04:00
NeilBrown	b677c0c63a	NFSD: always drop directory lock in nfsd_unlink() Some error paths in nfsd_unlink() allow it to exit without unlocking the directory. This is not a problem in practice as the directory will be locked with an fh_put(), but it is untidy and potentially confusing. This allows us to remove all the fh_unlock() calls that are immediately after nfsd_unlink() calls. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:19 -04:00
NeilBrown	927bfc5600	NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. nfsd_create() usually returns with the directory still locked. nfsd_symlink() usually returns with it unlocked. This is clumsy. Until recently nfsd_create() needed to keep the directory locked until ACLs and security label had been set. These are now set inside nfsd_create() (in nfsd_setattr()) so this need is gone. So change nfsd_create() and nfsd_symlink() to always unlock, and remove any fh_unlock() calls that follow calls to these functions. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:19 -04:00
NeilBrown	c0cbe70742	NFSD: add posix ACLs to struct nfsd_attrs pacl and dpacl pointers are added to struct nfsd_attrs, which requires that we have an nfsd_attrs_free() function to free them. Those nfsv4 functions that can set ACLs now set up these pointers based on the passed in NFSv4 ACL. nfsd_setattr() sets the acls as appropriate. Errors are handled as with security labels. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-08-04 10:28:03 -04:00
Linus Torvalds	a39b5dbdd2	zonefs changes for 5.20-rc1 A single change for this cycle to simplify handling of the memory page used as super block buffer during mount (from Fabio). -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYuoYcQAKCRDdoc3SxdoY dhWYAQDcDZIzcGFrKTpmmpPI9Ep6rCk64V1cIvTKOik+Hp3/8wEA+btduq8w2Wlo rrCypMOPBIygJxi9h1kugcP41HJLTwo= =InBn -----END PGP SIGNATURE----- Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: "A single change for this cycle to simplify handling of the memory page used as super block buffer during mount (from Fabio)" * tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Call page_address() on page acquired with GFP_KERNEL flag	2022-08-03 15:21:53 -07:00
Linus Torvalds	f18d73096c	New code for 5.20: - Skip writeback for pages that are completely beyond EOF - Minor code cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmK92MsACgkQ+H93GTRK tOuw5w//UpB7MqPpZyfqu5nIehVy9zJX8KacZHI3k82ID48fCpa/Q+KpfxHZFbMa eqSzl2wHEv/bzH6I1UCtq2aA7y6ZhpiR3LiIV2PKEy/vgYtPeVSNhkIzwuK9YYPN Wj1fLRUi4M4Y86a97vNSzdsdnHLCRnUHY2C35ywRvRCY0opUKSWGF45mD5Mj+0BK Kl6TfPlyyR0ugBxOf2cJWA/uL/YA5sHT32uViSPxyFpXufyILz8rjxzhRwdSUwFV TEoyTey4LgoADOA2X5Czxg0CIKfjM2xVnU/ybsdL/q0Mj5U1uzuJPsNY/M4pG0X8 v1/01NSSHVH37Kmr/3L15+doZDQGOLrRoWwStag2tRd6jNRNN7YctnanGoiR6elE ElU64mDVjc1FJf1J7JF5IObUJMfw0/ulHX5rHYWZlbbO1rnjD6+fVYVQbMW7xsmu o8fYiZHrkWBo5sYkWgIYhoD8fq3oMkYaryDTEtSS/Jy1tpu2gZpXzaMusKpu9qSe yc93TTnqqgS9LrSy0gJJqMzR5SavDFZvFhcgIjZXcbonz5uPwYTY7D5lD7vOLqbR b3oV3+lOQs15nhXs0RO4g1bFIIe9dqAZRlPJX1vYvgjlA3VSQWLcqtlxgpGYptf6 XT/yftI/+sVfTQSC6QAKmp7DTEtYLRKfpNSD9L52TKBvYigQPxo= =aDOK -----END PGP SIGNATURE----- Merge tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull iomap updates from Darrick Wong: "The most notable change in this first batch is that we no longer schedule pages beyond i_size for writeback, preferring instead to let truncate deal with those pages. Next week, there may be a second pull request to remove iomap_writepage from the other two filesystems (gfs2/zonefs) that use iomap for buffered IO. This follows in the same vein as the recent removal of writepage from XFS, since it hasn't been triggered in a few years; it does nothing during direct reclaim; and as far as the people who examined the patchset can tell, it's moving the codebase in the right direction. However, as it was a late addition to for-next, I'm holding off on that section for another week of testing to see if anyone can come up with a solid reason for holding off in the meantime. Summary: - Skip writeback for pages that are completely beyond EOF - Minor code cleanups" * tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: dax: set did_zero to true when zeroing successfully iomap: set did_zero to true when zeroing successfully iomap: skip pages past eof in iomap_do_writepage()	2022-08-03 15:16:49 -07:00
Linus Torvalds	2e4f8c729d	affs-5.20-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmLoE6kACgkQxWXV+ddt WDvdNw/7BunZ/ZnUUADehHkXvS9PcJjgt0pV5hC73H5nujnhi3aO8frpgBbVbl5a Pq6UHLRPUMB2YUfm+HCmUyKtZ1zLeiEPSHe9VuRNzylj6jN6K4ebylsGGOpGpdfc mwN1oJ2KkUKsXLyYXRd/cWr9oiYcTdeznSG4vbPXoCI52aZOfkqkGhgsG/sY4Lyz fUk9MoaClTD5+eisuF4XWmRgizvaaVrXB5MIaDgODIOQ2YHBJ1Ahdsz2+FsyDQ9S 1Haz7DqMPkImZSlNfwogn30RuXQOAk27rXex34ZkNHwpCvi4TEyxnU67NXv5vlwG xBl+QHbMHZaJJYZd7gxJHuRo///B42A1RdaKB9uoFri6C/IlNQXt8xnKapL7yo5Q mCIwpelhUKzDNsrivw5p0Nttz9b7ugwvpHush3E2DHaaMCIi1dJ6Kkq0YDn+iI9f xLUNU/Bc25A3vdL/IS606EZ4apJObAr3ZflB4XV+8yKmca4dxB2Xoh2NPJsJdpHJ 1riua8WBxZms8PZyS6CSpIE4Y8XSi87YdOq8iavRU8uhiiIvkAPLYdqOPWB4DUBm mq58GoJN46XFd7eo/V7aanCb459+Xs9CnGnosFA28WWAaM8TqGalmZZthzEUBDXI P3kHwe8ujIF6w6EPB6/KZ7PqTa5bAi2fBjnDy/QbXTe6nZpJbM4= =o20T -----END PGP SIGNATURE----- Merge tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs fix from David Sterba: "One update to AFFS, switching away from the kmap/kmap_atomic API" * tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: use memcpy_to_page and remove replace kmap_atomic()	2022-08-03 15:13:43 -07:00
Linus Torvalds	353767e4aa	for-5.20-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmLnyNUACgkQxWXV+ddt WDt9vA/9HcF+v5EkknyW07tatTap/Hm/ZB86Z5OZi6ikwIEcHsWhp3rUICejm88e GecDPIluDtCtyD6x4stuqkwOm22aDP5q2T9H6+gyw92ozyb436OV1Z8IrmftzXKY EpZO70PHZT+E6E/WYvyoTmmoCrjib7YlqCWZZhSLUFpsqqlOInmHEH49PW6KvM4r acUZ/RxHurKdmI3kNY6ECbAQl6CASvtTdYcVCx8fT2zN0azoLIQxpYa7n/9ca1R6 8WnYilCbLbNGtcUXvO2M3tMZ4/5kvxrwQsUn93ccCJYuiN0ASiDXbLZ2g4LZ+n56 JGu+y5v5oBwjpVf+46cuvnENP5BQ61594WPseiVjrqODWnPjN28XkcVC0XmPsiiZ lszeHO2cuIrIFoCah8ELMl8usu8+qxfXmPxIXtPu9rEyKsDtOjxVYc8SMXqLp0qQ qYtBoFm0JcZHqtZRpB+dhQ37/xXtH4ljUi/mI6x8iALVujeR273URs7yO9zgIdeW uZoFtbwpHFLUk+TL7Ku82/zOXp3fCwtDpNmlYbxeMbea/be3ShjncM4+mYzvHYri dYON2LFrq+mnRDqtIXTCaAYwX7zU8Y18Ev9QwlNll8dKlKwS89+jpqLoa+eVYy3c /HitHFza70KxmOj4dvDVZlzDpPvl7kW1UBkmskg4u3jnNWzedkM= =sS1q -----END PGP SIGNATURE----- Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This brings some long awaited changes, the send protocol bump, otherwise lots of small improvements and fixes. The main core part is reworking bio handling, cleaning up the submission and endio and improving error handling. There are some changes outside of btrfs adding helpers or updating API, listed at the end of the changelog. Features: - sysfs: - export chunk size, in debug mode add tunable for setting its size - show zoned among features (was only in debug mode) - show commit stats (number, last/max/total duration) - send protocol updated to 2 - new commands: - ability write larger data chunks than 64K - send raw compressed extents (uses the encoded data ioctls), ie. no decompression on send side, no compression needed on receive side if supported - send 'otime' (inode creation time) among other timestamps - send file attributes (a.k.a file flags and xflags) - this is first version bump, backward compatibility on send and receive side is provided - there are still some known and wanted commands that will be implemented in the near future, another version bump will be needed, however we want to minimize that to avoid causing usability issues - print checksum type and implementation at mount time - don't print some messages at mount (mentioned as people asked about it), we want to print messages namely for new features so let's make some space for that - big metadata - this has been supported for a long time and is not a feature that's worth mentioning - skinny metadata - same reason, set by default by mkfs Performance improvements: - reduced amount of reserved metadata for delayed items - when inserted items can be batched into one leaf - when deleting batched directory index items - when deleting delayed items used for deletion - overall improved count of files/sec, decreased subvolume lock contention - metadata item access bounds checker micro-optimized, with a few percent of improved runtime for metadata-heavy operations - increase direct io limit for read to 256 sectors, improved throughput by 3x on sample workload Notable fixes: - raid56 - reduce parity writes, skip sectors of stripe when there are no data updates - restore reading from on-disk data instead of using stripe cache, this reduces chances to damage correct data due to RMW cycle - refuse to replay log with unknown incompat read-only feature bit set - zoned - fix page locking when COW fails in the middle of allocation - improved tracking of active zones, ZNS drives may limit the number and there are ENOSPC errors due to that limit and not actual lack of space - adjust maximum extent size for zone append so it does not cause late ENOSPC due to underreservation - mirror reading error messages show the mirror number - don't fallback to buffered IO for NOWAIT direct IO writes, we don't have the NOWAIT semantics for buffered io yet - send, fix sending link commands for existing file paths when there are deleted and created hardlinks for same files - repair all mirrors for profiles with more than 1 copy (raid1c34) - fix repair of compressed extents, unify where error detection and repair happen Core changes: - bio completion cleanups - don't double defer compression bios - simplify endio workqueues - add more data to btrfs_bio to avoid allocation for read requests - rework bio error handling so it's same what block layer does, the submission works and errors are consumed in endio - when asynchronous bio offload fails fall back to synchronous checksum calculation to avoid errors under writeback or memory pressure - new trace points - raid56 events - ordered extent operations - super block log_root_transid deprecated (never used) - mixed_backref and big_metadata sysfs feature files removed, they've been default for sufficiently long time, there are no known users and mixed_backref could be confused with mixed_groups Non-btrfs changes, API updates: - minor highmem API update to cover const arguments - switch all kmap/kmap_atomic to kmap_local - remove redundant flush_dcache_page() - address_space_operations::writepage callback removed - add bdev_max_segments() helper" * tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits) btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read btrfs: fix repair of compressed extents btrfs: remove the start argument to check_data_csum and export btrfs: pass a btrfs_bio to btrfs_repair_one_sector btrfs: simplify the pending I/O counting in struct compressed_bio btrfs: repair all known bad mirrors btrfs: merge btrfs_dev_stat_print_on_error with its only caller btrfs: join running log transaction when logging new name btrfs: simplify error handling in btrfs_lookup_dentry btrfs: send: always use the rbtree based inode ref management infrastructure btrfs: send: fix sending link commands for existing file paths btrfs: send: introduce recorded_ref_alloc and recorded_ref_free btrfs: zoned: wait until zone is finished when allocation didn't progress btrfs: zoned: write out partially allocated region btrfs: zoned: activate necessary block group btrfs: zoned: activate metadata block group on flush_space btrfs: zoned: disable metadata overcommit for zoned btrfs: zoned: introduce space_info->active_total_bytes btrfs: zoned: finish least available block group on data bg allocation btrfs: let can_allocate_chunk return error ...	2022-08-03 14:54:52 -07:00
Linus Torvalds	ab17c0cd37	EFI efivars sysfs interface removal Remove the obsolete 'efivars' sysfs based interface to the EFI variable store, now that all users have moved to the efivarfs pseudo file system, which was created ~10 years ago to address some fundamental shortcomings in the sysfs based driver. Move the 'business logic' related to which EFI variables are important and may affect the boot flow from the efivars support layer into the efivarfs pseudo file system, so it is no longer exposed to other parts of the kernel. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmLhuYIACgkQw08iOZLZ jyRYYgwAwUvbFtBL+uOTB+jynOq1MkM1xrNSeEgzPyLhh7CXSa/e+48XqpQjCUxY 8HNqDvd4toiCeVKp25AO+FPrQBjT7NdyjnwGMP3DLkGCOSIrVqVJ/2OmbOY52Kzy z1qbF2+7ak1TUsGzMnJHVDB+baGnUlZ39DuR6IAWstM9tkH/QEnRJV+ejS4h7jdN XW42/M96mAYFfT4Q4gs1HJMPWLP22xoEbnb9SkOyFCB2tHDjrY94CqH8L7Ee1DgA 7sc2cAZ2mPlS8Rbs8NY8IA/LpN4tRu2EhbIxtmKMRgXA88aTKcsub9Tq9RXxzE/x BpkH77mGbSBSrntg0gskKCrYGyJaGS9EYnHVy5HPU8hJagK295NmO3c/trHTb90Z 1RZClYsIUbNXe06e9rdk8w8Ozn+7ABstxzLCvj2MThtTBnAbjU0kQ9VeD0xwMvOp EaF6D6tbEmHrZkGtKP2WEyOZm0tF2I3OP1U9LzjyTpvsIOINhqfVozlhTB6pU5YE kW9E59i+ =VVQ6 -----END PGP SIGNATURE----- Merge tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull efivars sysfs interface removal from Ard Biesheuvel: "Remove the obsolete 'efivars' sysfs based interface to the EFI variable store, now that all users have moved to the efivarfs pseudo file system, which was created ~10 years ago to address some fundamental shortcomings in the sysfs based driver. Move the 'business logic' related to which EFI variables are important and may affect the boot flow from the efivars support layer into the efivarfs pseudo file system, so it is no longer exposed to other parts of the kernel" * tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: vars: Move efivar caching layer into efivarfs efi: vars: Switch to new wrapper layer efi: vars: Remove deprecated 'efivars' sysfs interface	2022-08-03 14:41:36 -07:00
Linus Torvalds	97a77ab14f	EFI updates for v5.20 - Enable mirrored memory for arm64 - Fix up several abuses of the efivar API - Refactor the efivar API in preparation for moving the 'business logic' part of it into efivarfs - Enable ACPI PRM on arm64 -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmLhuDIACgkQw08iOZLZ jyS9IQv/Wc2nhjN50S3gfrL+68/el/hGdP/J0FK5BOOjNosG2t1ZNYZtSthXqpPH hRrTU2m6PpQUalRpFDyLiHkJvdBFQe4VmvrzBa3TIBIzyflLQPJzkWrqThPchV+B qi4lmCtTDNIEJmayewqx1wWA+QmUiyI5zJ8wrZp84LTctBPL75seVv0SB20nqai0 3/I73omB2RLVGpCpeWvb++vePXL8euFW3FEwCTM8hRboICjORTyIZPy8Y5os+3xT UgrIgVDOtn1Xwd4tK0qVwjOVA51east4Fcn3yGOrL40t+3SFm2jdpAJOO3UvyNPl vkbtjvXsIjt3/oxreKxXHLbamKyueWIfZRyCLsrg6wrr96oypPk6ID4iDCQoen/X Zf0VjM2vmvSd4YgnEIblOfSBxVg48cHJA4iVHVxFodNTrVnzGGFYPTmNKmJqo+Xn JeUILM7jlR4h/t0+cTTK3Busu24annTuuz5L5rjf4bUm6pPf4crb1yJaFWtGhlpa er233D6O =zI0R -----END PGP SIGNATURE----- Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Enable mirrored memory for arm64 - Fix up several abuses of the efivar API - Refactor the efivar API in preparation for moving the 'business logic' part of it into efivarfs - Enable ACPI PRM on arm64 * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) ACPI: Move PRM config option under the main ACPI config ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64 ACPI: PRM: Change handler_addr type to void pointer efi: Simplify arch_efi_call_virt() macro drivers: fix typo in firmware/efi/memmap.c efi: vars: Drop __efivar_entry_iter() helper which is no longer used efi: vars: Use locking version to iterate over efivars linked lists efi: pstore: Omit efivars caching EFI varstore access layer efi: vars: Add thin wrapper around EFI get/set variable interface efi: vars: Don't drop lock in the middle of efivar_init() pstore: Add priv field to pstore_record for backend specific use Input: applespi - avoid efivars API and invoke EFI services directly selftests/kexec: remove broken EFI_VARS secure boot fallback check brcmfmac: Switch to appropriate helper to load EFI variable contents iwlwifi: Switch to proper EFI variable store interface media: atomisp_gmin_platform: stop abusing efivar API efi: efibc: avoid efivar API for setting variables efi: avoid efivars layer when loading SSDTs from variables efi: Correct comment on efi_memmap_alloc memblock: Disable mirror feature if kernelcore is not specified ...	2022-08-03 14:38:02 -07:00
Linus Torvalds	5264406cdb	iov_iter work, part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(); new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's why the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYurGOQAKCRBZ7Krx/gZQ 6ysyAP91lvBfMRepcxpd9kvtuzWkU8A3rfSziZZteEHANB9Q7QEAiPn2a2OjWkcZ uAyUWfCkHCNx+dSMkEvUgR5okQ0exAM= =9UCV -----END PGP SIGNATURE----- Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()	2022-08-03 13:50:22 -07:00
Linus Torvalds	200e340f21	Main part here is making parallel lookups safe for RT - making sure preemption is disabled in start_dir_add()/ end_dir_add() sections (on non-RT it's automatic, on RT it needs to to be done explicitly) and moving wakeups from __d_lookup_done() inside of such to the end of those sections. Wakeups can be safely delayed for as long as ->d_lock on in-lookup dentry is held; proving that has caught a bug in d_add_ci() that allows memory corruption when sufficiently bogus ntfs (or case-insensitive xfs) image is mounted. Easily fixed, fortunately. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYurAlAAKCRBZ7Krx/gZQ 6x0mAP9JI80PC/lkYLda+AJ7NmweorBDwrOxzB34biXtyhYDDQEAvdrV07LUkETM FDN0+jgSpUikcs/kz5NxVBPRRN+RRAY= =qpTA -----END PGP SIGNATURE----- Merge tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs dcache updates from Al Viro: "The main part here is making parallel lookups safe for RT - making sure preemption is disabled in start_dir_add()/ end_dir_add() sections (on non-RT it's automatic, on RT it needs to to be done explicitly) and moving wakeups from __d_lookup_done() inside of such to the end of those sections. Wakeups can be safely delayed for as long as ->d_lock on in-lookup dentry is held; proving that has caught a bug in d_add_ci() that allows memory corruption when sufficiently bogus ntfs (or case-insensitive xfs) image is mounted. Easily fixed, fortunately" * tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/dcache: Move wakeup out of i_seq_dir write held region. fs/dcache: Move the wakeup from __d_lookup_done() to the caller. fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT d_add_ci(): make sure we don't miss d_lookup_done()	2022-08-03 11:43:12 -07:00
Linus Torvalds	a782e86649	Saner handling of "lseek should fail with ESPIPE" - gets rid of magical no_llseek thing and makes checks consistent. In particular, ad-hoc "can we do splice via internal pipe" checks got saner (and somewhat more permissive, which is what Jason had been after, AFAICT) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYug2xgAKCRBZ7Krx/gZQ 6wxWAQDqeg+xMq2FGPXmgjCa+Cp3PXH96Lp6f3hHzakIDx+t8gEAxvuiXAD22Mct 6S1SKuGj0iDIuM4L7hUiWTiY/bDXSAc= =3EC/ -----END PGP SIGNATURE----- Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs lseek updates from Al Viro: "Jason's lseek series. Saner handling of 'lseek should fail with ESPIPE' - this gets rid of the magical no_llseek thing and makes checks consistent. In particular, the ad-hoc "can we do splice via internal pipe" checks got saner (and somewhat more permissive, which is what Jason had been after, AFAICT)" * tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove no_llseek fs: check FMODE_LSEEK to control internal pipe splicing vfio: do not set FMODE_LSEEK flag dma-buf: remove useless FMODE_LSEEK flag fs: do not compare against ->llseek fs: clear or set FMODE_LSEEK based on llseek function	2022-08-03 11:35:20 -07:00
Linus Torvalds	d9395512c5	RCU pathwalk cleanups. Storing sampled ->d_seq of the next dentry in nameidata simplifies life considerably, especially if we delay fetching ->d_inode until step_into(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYug0NQAKCRBZ7Krx/gZQ 66TIAQDziGCTjwvyzm+E/HDcWNl2LjZ4kbsuL3yv2DTW9jExyAD/UuptUgy2RKT1 RfJET/hnPhlIbWxbjhYLRtIj8QpcUAY= =II7C -----END PGP SIGNATURE----- Merge tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs namei updates from Al Viro: "RCU pathwalk cleanups. Storing sampled ->d_seq of the next dentry in nameidata simplifies life considerably, especially if we delay fetching ->d_inode until step_into()" * tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: step_into(): move fetching ->d_inode past handle_mounts() lookup_fast(): don't bother with inode follow_dotdot{,_rcu}(): don't bother with inode step_into(): lose inode argument namei: stash the sampled ->d_seq into nameidata namei: move clearing LOOKUP_RCU towards rcu_read_unlock() switch try_to_unlazy_next() to __legitimize_mnt() follow_dotdot{,_rcu}(): change calling conventions namei: get rid of pointless unlikely(read_seqcount_retry(...)) __follow_mount_rcu(): verify that mount_lock remains unchanged	2022-08-03 11:31:42 -07:00
Linus Torvalds	f00654007f	Folio changes for 6.0 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS 7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT /58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q== =DgUi -----END PGP SIGNATURE----- Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache Pull folio updates from Matthew Wilcox: - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits) fs: remove the NULL get_block case in mpage_writepages fs: don't call ->writepage from __mpage_writepage fs: remove the nobh helpers jfs: stop using the nobh helper ext2: remove nobh support ntfs3: refactor ntfs_writepages mm/folio-compat: Remove migration compatibility functions fs: Remove aops->migratepage() secretmem: Convert to migrate_folio hugetlb: Convert to migrate_folio aio: Convert to migrate_folio f2fs: Convert to filemap_migrate_folio() ubifs: Convert to filemap_migrate_folio() btrfs: Convert btrfs_migratepage to migrate_folio mm/migrate: Add filemap_migrate_folio() mm/migrate: Convert migrate_page() to migrate_folio() nfs: Convert to migrate_folio btrfs: Convert btree_migratepage to migrate_folio mm/migrate: Convert expected_page_refs() to folio_expected_refs() mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() ...	2022-08-03 10:35:43 -07:00
Konstantin Komarov	451e45a0e6	fs/ntfs3: Make ni_ins_new_attr return error Function ni_ins_new_attr now returns ERR_PTR(err), so we check it now in other functions like ni_expand_mft_list Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:10 +03:00
Konstantin Komarov	8039edba04	fs/ntfs3: Create MFT zone only if length is large enough Also removed uninformative print Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:10 +03:00
Konstantin Komarov	9256ec3535	fs/ntfs3: Refactoring attr_insert_range to restore after errors Added done and undo labels for restoring after errors Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:09 +03:00
Konstantin Komarov	20abc64f78	fs/ntfs3: Refactoring attr_punch_hole to restore after errors Added comments to code Added new function run_clone to make a copy of run Added done and undo labels for restoring after errors Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:09 +03:00
Konstantin Komarov	0e5b044cbf	fs/ntfs3: Refactoring attr_set_size to restore after errors Added comments to code Added two undo labels for restoring after errors Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:09 +03:00
Konstantin Komarov	c12df45ee6	fs/ntfs3: New function ntfs_bad_inode There are repetitive steps in case of bad inode This commit wraps them in function Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:08 +03:00
Konstantin Komarov	8335ebe195	fs/ntfs3: Make MFT zone less fragmented Now we take free space after the MFT zone if the MFT zone shrinks. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:08 +03:00
Konstantin Komarov	e6d9398c07	fs/ntfs3: Check possible errors in run_pack in advance Checking in advance speeds things up in some cases. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:08 +03:00
Konstantin Komarov	54033c1350	fs/ntfs3: Added comments to frecord functions Added some comments in frecord.c for more context. Also changed run_lookup to static because it's an internal function. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:07 +03:00
Konstantin Komarov	42f66a7fda	fs/ntfs3: Fill duplicate info in ni_add_name Work with names must be completed in ni_add_name Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:07 +03:00
Konstantin Komarov	cf760ec0a0	fs/ntfs3: Make static function attr_load_runs attr_load_runs is an internal function Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:06 +03:00
Konstantin Komarov	071100ea0e	fs/ntfs3: Add new argument is_mft to ntfs_mark_rec_free This argument helps in avoiding double locking Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:06 +03:00
Konstantin Komarov	6700eabb90	fs/ntfs3: Remove unused mi_mark_free Cleaning up dead code Fix wrong comments Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:06 +03:00
Konstantin Komarov	560f7736b9	fs/ntfs3: Fix very fragmented case in attr_punch_hole In some cases we need to ni_find_attr attr_b Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:05 +03:00
Konstantin Komarov	42f86b1226	fs/ntfs3: Fix work with fragmented xattr In some cases xattr is too fragmented, so we need to load it before writing. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:05 +03:00
Konstantin Komarov	b3e048720d	fs/ntfs3: Make ntfs_fallocate return -ENOSPC instead of -EFBIG In some cases we need to return ENOSPC Fixes xfstest generic/213 Fixes: `114346978c` ("fs/ntfs3: Check new size for limits") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:05 +03:00
Konstantin Komarov	c1e0ab3789	fs/ntfs3: extend ni_insert_nonresident to return inserted ATTR_LIST_ENTRY Fixes xfstest generic/300 Fixes: `4534a70b70` ("fs/ntfs3: Add headers and misc files") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:04 +03:00
Konstantin Komarov	13747aac89	fs/ntfs3: Check reserved size for maximum allowed Also don't mask EFBIG Fixes xfstest generic/485 Fixes: `4342306f0f` ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:25:04 +03:00
Konstantin Komarov	460bbf2990	fs/ntfs3: Do not change mode if ntfs_set_ea failed ntfs_set_ea can fail with NOSPC, so we don't need to change mode in this situation. Fixes xfstest generic/449 Fixes: `be71b5cba2` ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2022-08-03 18:24:44 +03:00
Jeff Layton	a8af0d682a	libceph: clean up ceph_osdc_start_request prototype This function always returns 0, and ignores the nofail boolean. Drop the nofail argument, make the function void return and fix up the callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 14:05:39 +02:00
Jeremy Bongio	d95efb14c0	ext4: add ioctls to get/set the ext4 superblock uuid This fixes a race between changing the ext4 superblock uuid and operations like mounting, resizing, changing features, etc. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeremy Bongio <bongiojp@gmail.com> Link: https://lore.kernel.org/r/20220721224422.438351-1-bongiojp@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:26 -04:00
Kiselev, Oleg	69cb8e9d8c	ext4: avoid resizing to a partial cluster size This patch avoids an attempt to resize the filesystem to an unaligned cluster boundary. An online resize to a size that is not integral to cluster size results in the last iteration attempting to grow the fs by a negative amount, which trips a BUG_ON and leaves the fs with a corrupted in-memory superblock. Signed-off-by: Oleg Kiselev <okiselev@amazon.com> Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:26 -04:00
Kiselev, Oleg	026d0d27c4	ext4: reduce computation of overhead during resize This patch avoids doing an O(n**2)-complexity walk through every flex group. Instead, it uses the already computed overhead information for the newly allocated space, and simply adds it to the previously calculated overhead stored in the superblock. This drastically reduces the time taken to resize very large bigalloc filesystems (from 3+ hours for a 64TB fs down to milliseconds). Signed-off-by: Oleg Kiselev <okiselev@amazon.com> Link: https://lore.kernel.org/r/CE4F359F-4779-45E6-B6A9-8D67FDFF5AE2@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Zhihao Cheng	4a734f0869	jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted Following process will fail assertion 'jh->b_frozen_data == NULL' in jbd2_journal_dirty_metadata(): jbd2_journal_commit_transaction unlink(dir/a) jh->b_transaction = trans1 jh->b_jlist = BJ_Metadata journal->j_running_transaction = NULL trans1->t_state = T_COMMIT unlink(dir/b) handle->h_trans = trans2 do_get_write_access jh->b_modified = 0 jh->b_frozen_data = frozen_buffer jh->b_next_transaction = trans2 jbd2_journal_dirty_metadata is_handle_aborted is_journal_aborted // return false --> jbd2 abort <-- while (commit_transaction->t_buffers) if (is_journal_aborted) jbd2_journal_refile_buffer __jbd2_journal_refile_buffer WRITE_ONCE(jh->b_transaction, jh->b_next_transaction) WRITE_ONCE(jh->b_next_transaction, NULL) __jbd2_journal_file_buffer(jh, BJ_Reserved) J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure ! The reproducer (See detail in [Link]) reports: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1629! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 2 PID: 584 Comm: unlink Tainted: G W 5.19.0-rc6-00115-g4a57a8400075-dirty #697 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202 Call Trace: <TASK> __ext4_handle_dirty_metadata+0xa0/0x290 ext4_handle_dirty_dirblock+0x10c/0x1d0 ext4_delete_entry+0x104/0x200 __ext4_unlink+0x22b/0x360 ext4_unlink+0x275/0x390 vfs_unlink+0x20b/0x4c0 do_unlinkat+0x42f/0x4c0 __x64_sys_unlink+0x37/0x50 do_syscall_64+0x35/0x80 After journal aborting, __jbd2_journal_refile_buffer() is executed with holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()' into the area protected by @jh->b_state_lock. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251 Fixes: `470decc613` ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Lukas Czerner	1e1c2b86ef	ext4: block range must be validated before use in ext4_mb_clear_bb() Block range to free is validated in ext4_free_blocks() using ext4_inode_block_valid() and then it's passed to ext4_mb_clear_bb(). However in some situations on bigalloc file system the range might be adjusted after the validation in ext4_free_blocks() which can lead to troubles on corrupted file systems such as one found by syzkaller that resulted in the following BUG kernel BUG at fs/ext4/ext4.h:3319! PREEMPT SMP NOPTI CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 RIP: 0010:ext4_free_blocks+0x95e/0xa90 Call Trace: <TASK> ? lock_timer_base+0x61/0x80 ? __es_remove_extent+0x5a/0x760 ? __mod_timer+0x256/0x380 ? ext4_ind_truncate_ensure_credits+0x90/0x220 ext4_clear_blocks+0x107/0x1b0 ext4_free_data+0x15b/0x170 ext4_ind_truncate+0x214/0x2c0 ? _raw_spin_unlock+0x15/0x30 ? ext4_discard_preallocations+0x15a/0x410 ? ext4_journal_check_start+0xe/0x90 ? __ext4_journal_start_sb+0x2f/0x110 ext4_truncate+0x1b5/0x460 ? __ext4_journal_start_sb+0x2f/0x110 ext4_evict_inode+0x2b4/0x6f0 evict+0xd0/0x1d0 ext4_enable_quotas+0x11f/0x1f0 ext4_orphan_cleanup+0x3de/0x430 ? proc_create_seq_private+0x43/0x50 ext4_fill_super+0x295f/0x3ae0 ? snprintf+0x39/0x40 ? sget_fc+0x19c/0x330 ? ext4_reconfigure+0x850/0x850 get_tree_bdev+0x16d/0x260 vfs_get_tree+0x25/0xb0 path_mount+0x431/0xa70 __x64_sys_mount+0xe2/0x120 do_syscall_64+0x5b/0x80 ? do_user_addr_fault+0x1e2/0x670 ? exc_page_fault+0x70/0x170 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fdf4e512ace Fix it by making sure that the block range is properly validated before used every time it changes in ext4_free_blocks() or ext4_mb_clear_bb(). Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c Reported-by: syzbot+15cd994e273307bf5cfa@syzkaller.appspotmail.com Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220714165903.58260-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	307af6c879	mbcache: automatically delete entries from cache on freeing Use the fact that entries with elevated refcount are not removed from the hash and just move removal of the entry from the hash to the entry freeing time. When doing this we also change the generic code to hold one reference to the cache entry, not two of them, which makes code somewhat more obvious. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	75896339e4	mbcache: Remove mb_cache_entry_delete() Nobody uses mb_cache_entry_delete() anymore. Remove it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	1189d8ec51	ext2: avoid deleting xattr block that is being reused Currently when we decide to reuse xattr block we detect the case when the last reference to xattr block is being dropped at the same time and cancel the reuse attempt. Convert ext2 to a new scheme when as soon as matching mbcache entry is found, we wait with dropping the last xattr block reference until mbcache entry reference is dropped (meaning either the xattr block reference is increased or we decided not to reuse the block). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	b67798d551	ext2: unindent codeblock in ext2_xattr_set() Replace one else in ext2_xattr_set() with a goto. This makes following code changes simpler to follow. No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220712105436.32204-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	90ae40d243	ext2: factor our freeing of xattr block reference Free of xattr block reference is opencode in two places. Factor it out into a separate function and use it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	65f8b80053	ext4: fix race when reusing xattr blocks When ext4_xattr_block_set() decides to remove xattr block the following race can happen: CPU1 CPU2 ext4_xattr_block_set() ext4_xattr_release_block() new_bh = ext4_xattr_block_cache_find() lock_buffer(bh); ref = le32_to_cpu(BHDR(bh)->h_refcount); if (ref == 1) { ... mb_cache_entry_delete(); unlock_buffer(bh); ext4_free_blocks(); ... ext4_forget(..., bh, ...); jbd2_journal_revoke(..., bh); ext4_journal_get_write_access(..., new_bh, ...) do_get_write_access() jbd2_journal_cancel_revoke(..., new_bh); Later the code in ext4_xattr_block_set() finds out the block got freed and cancels reusal of the block but the revoke stays canceled and so in case of block reuse and journal replay the filesystem can get corrupted. If the race works out slightly differently, we can also hit assertions in the jbd2 code. Fix the problem by making sure that once matching mbcache entry is found, code dropping the last xattr block reference (or trying to modify xattr block in place) waits until the mbcache entry reference is dropped. This way code trying to reuse xattr block is protected from someone trying to drop the last reference to xattr block. Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com> CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	fd48e9acdf	ext4: unindent codeblock in ext4_xattr_block_set() Remove unnecessary else (and thus indentation level) from a code block in ext4_xattr_block_set(). It will also make following code changes easier. No functional changes. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	6bc0d63dad	ext4: remove EA inode entry from mbcache on inode eviction Currently we remove EA inode from mbcache as soon as its xattr refcount drops to zero. However there can be pending attempts to reuse the inode and thus refcount handling code has to handle the situation when refcount increases from zero anyway. So save some work and just keep EA inode in mbcache until it is getting evicted. At that moment we are sure following iget() of EA inode will fail anyway (or wait for eviction to finish and load things from the disk again) and so removing mbcache entry at that moment is fine and simplifies the code a bit. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	3dc96bba65	mbcache: add functions to delete entry if unused Add function mb_cache_entry_delete_or_get() to delete mbcache entry if it is unused and also add a function to wait for entry to become unused - mb_cache_entry_wait_unused(). We do not share code between the two deleting function as one of them will go away soon. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	5831891418	mbcache: don't reclaim used entries Do not reclaim entries that are currently used by somebody from a shrinker. Firstly, these entries are likely useful. Secondly, we will need to keep such entries to protect pending increment of xattr block refcount. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Lukas Czerner	b8a04fe77e	ext4: make sure ext4_append() always allocates new block ext4_append() must always allocate a new block, otherwise we run the risk of overwriting existing directory block corrupting the directory tree in the process resulting in all manner of problems later on. Add a sanity check to see if the logical block is already allocated and error out if it is. Cc: stable@kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Lukas Czerner	65f8ea4cd5	ext4: check if directory block is within i_size Currently ext4 directory handling code implicitly assumes that the directory blocks are always within the i_size. In fact ext4_append() will attempt to allocate next directory block based solely on i_size and the i_size is then appropriately increased after a successful allocation. However, for this to work it requires i_size to be correct. If, for any reason, the directory inode i_size is corrupted in a way that the directory tree refers to a valid directory block past i_size, we could end up corrupting parts of the directory tree structure by overwriting already used directory blocks when modifying the directory. Fix it by catching the corruption early in __ext4_read_dirblock(). Addresses Red-Hat-Bugzilla: #2070205 CVE: CVE-2022-1184 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Ojaswin Mujoo	3fa5d23e68	ext4: reflect mb_optimize_scan value in options file Add support to display the mb_optimize_scan value in /proc/fs/ext4/<dev>/options file. The option is only displayed when the value is non default. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20220704054603.21462-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Ye Bin	b24e77ef1c	ext4: avoid remove directory when directory is corrupted Now if check directoy entry is corrupted, ext4_empty_dir may return true then directory will be removed when file system mounted with "errors=continue". In order not to make things worse just return false when directory is corrupted. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220622090223.682234-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Jiang Jian	c64a92992e	ext4: aligned '' in comments The '' in the comment is not aligned. Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Link: https://lore.kernel.org/r/20220621061531.19669-1-jiangjian@cdjrlc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Li Lingfeng	07ea7a617d	ext4: recover csum seed of tmp_inode after migrating to extents When migrating to extents, the checksum seed of temporary inode need to be replaced by inode's, otherwise the inode checksums will be incorrect when swapping the inodes data. However, the temporary inode can not match it's checksum to itself since it has lost it's own checksum seed. mkfs.ext4 -F /dev/sdc mount /dev/sdc /mnt/sdc xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile chattr -e /mnt/sdc/testfile chattr +e /mnt/sdc/testfile umount /dev/sdc fsck -fn /dev/sdc ======== ... Pass 1: Checking inodes, blocks, and sizes Inode 13 passes checks, but checksum does not match inode. Fix? no ... ======== The fix is simple, save the checksum seed of temporary inode, and recover it after migrating to extents. Fixes: `e81c9302a6` ("ext4: set csum seed in tmp inode while migrating to extents") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220617062515.2113438-1-lilingfeng3@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:02 -04:00
Ye Bin	51ae846cff	ext4: fix warning in ext4_iomap_begin as race between bmap and write We got issue as follows: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 RIP: 0010:ext4_iomap_begin+0x182/0x5d0 RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iomap_apply+0x119/0x570 iomap_bmap+0x124/0x150 ext4_bmap+0x14f/0x250 bmap+0x55/0x80 do_vfs_ioctl+0x952/0xbd0 __x64_sys_ioctl+0xc6/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Above issue may happen as follows: bmap write bmap ext4_bmap iomap_bmap ext4_iomap_begin ext4_file_write_iter ext4_buffered_write_iter generic_perform_write ext4_da_write_begin ext4_da_write_inline_data_begin ext4_prepare_inline_data ext4_create_inline_data ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on To solved above issue hold inode lock in ext4_bamp. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2022-08-02 23:56:02 -04:00
Baokun Li	fd7e672ea9	ext4: correct the misjudgment in ext4_iget_extra_inode Use the EXT4_INODE_HAS_XATTR_SPACE macro to more accurately determine whether the inode have xattr space. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:55:55 -04:00
Baokun Li	c9fd167d57	ext4: correct max_inline_xattr_value_size computing If the ext4 inode does not have xattr space, 0 is returned in the get_max_inline_xattr_value_size function. Otherwise, the function returns a negative value when the inode does not contain EXT4_STATE_XATTR. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:44 -04:00
Baokun Li	67d7d8ad99	ext4: fix use-after-free in ext4_xattr_set_entry Hulk Robot reported a issue: ================================================================== BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500 Write of size 4105 at addr ffff8881675ef5f4 by task syz-executor.0/7092 CPU: 1 PID: 7092 Comm: syz-executor.0 Not tainted 4.19.90-dirty #17 Call Trace: [...] memcpy+0x34/0x50 mm/kasan/kasan.c:303 ext4_xattr_set_entry+0x18ab/0x3500 fs/ext4/xattr.c:1747 ext4_xattr_ibody_inline_set+0x86/0x2a0 fs/ext4/xattr.c:2205 ext4_xattr_set_handle+0x940/0x1300 fs/ext4/xattr.c:2386 ext4_xattr_set+0x1da/0x300 fs/ext4/xattr.c:2498 __vfs_setxattr+0x112/0x170 fs/xattr.c:149 __vfs_setxattr_noperm+0x11b/0x2a0 fs/xattr.c:180 __vfs_setxattr_locked+0x17b/0x250 fs/xattr.c:238 vfs_setxattr+0xed/0x270 fs/xattr.c:255 setxattr+0x235/0x330 fs/xattr.c:520 path_setxattr+0x176/0x190 fs/xattr.c:539 __do_sys_lsetxattr fs/xattr.c:561 [inline] __se_sys_lsetxattr fs/xattr.c:557 [inline] __x64_sys_lsetxattr+0xc2/0x160 fs/xattr.c:557 do_syscall_64+0xdf/0x530 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x459fe9 RSP: 002b:00007fa5e54b4c08 EFLAGS: 00000246 ORIG_RAX: 00000000000000bd RAX: ffffffffffffffda RBX: 000000000051bf60 RCX: 0000000000459fe9 RDX: 00000000200003c0 RSI: 0000000020000180 RDI: 0000000020000140 RBP: 000000000051bf60 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000001009 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc73c93fc0 R14: 000000000051bf60 R15: 00007fa5e54b4d80 [...] ================================================================== Above issue may happen as follows: ------------------------------------- ext4_xattr_set ext4_xattr_set_handle ext4_xattr_ibody_find >> s->end < s->base >> no EXT4_STATE_XATTR >> xattr_check_inode is not executed ext4_xattr_ibody_set ext4_xattr_set_entry >> size_t min_offs = s->end - s->base >> UAF in memcpy we can easily reproduce this problem with the following commands: mkfs.ext4 -F /dev/sda mount -o debug_want_extra_isize=128 /dev/sda /mnt touch /mnt/file setfattr -n user.cat -v `seq -s z 4096\|tr -d '[:digit:]'` /mnt/file In ext4_xattr_ibody_find, we have the following assignment logic: header = IHDR(inode, raw_inode) = raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize is->s.base = IFIRST(header) = header + sizeof(struct ext4_xattr_ibody_header) is->s.end = raw_inode + s_inode_size In ext4_xattr_set_entry min_offs = s->end - s->base = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) last = s->first free = min_offs - ((void *)last - s->base) - sizeof(__u32) = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) - sizeof(__u32) In the calculation formula, all values except s_inode_size and i_extra_size are fixed values. When i_extra_size is the maximum value s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8. The value overflows. As a result, the preceding issue is triggered when memcpy is executed. Therefore, when finding xattr or setting xattr, check whether there is space for storing xattr in the inode to resolve this issue. Cc: stable@kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:34 -04:00
Baokun Li	179b14152d	ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h When adding an xattr to an inode, we must ensure that the inode_size is not less than EXT4_GOOD_OLD_INODE_SIZE + extra_isize + pad. Otherwise, the end position may be greater than the start position, resulting in UAF. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220616021358.2504451-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:34 -04:00
Eric Whitney	7f0d8e1d60	ext4: fix extent status tree race in writeback error recovery path A race can occur in the unlikely event ext4 is unable to allocate a physical cluster for a delayed allocation in a bigalloc file system during writeback. Failure to allocate a cluster forces error recovery that includes a call to mpage_release_unused_pages(). That function removes any corresponding delayed allocated blocks from the extent status tree. If a new delayed write is in progress on the same cluster simultaneously, resulting in the addition of an new extent containing one or more blocks in that cluster to the extent status tree, delayed block accounting can be thrown off if that delayed write then encounters a similar cluster allocation failure during future writeback. Write lock the i_data_sem in mpage_release_unused_pages() to fix this problem. Ext4's block/cluster accounting code for bigalloc relies on i_data_sem for mutual exclusion, as is found in the delayed write path, and the locking in mpage_release_unused_pages() is missing. Cc: stable@kernel.org Reported-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Zhang Yi	a89573ce4a	jbd2: fix outstanding credits assert in jbd2_journal_commit_transaction() We catch an assert problem in jbd2_journal_commit_transaction() when doing fsstress and request falut injection tests. The problem is happened in a race condition between jbd2_journal_commit_transaction() and ext4_end_io_end(). Firstly, ext4_writepages() writeback dirty pages and start reserved handle, and then the journal was aborted due to some previous metadata IO error, jbd2_journal_abort() start to commit current running transaction, the committing procedure could be raced by ext4_end_io_end() and lead to subtract j_reserved_credits twice from commit_transaction->t_outstanding_credits, finally the t_outstanding_credits is mistakenly smaller than t_nr_buffers and trigger assert. kjournald2 kworker jbd2_journal_commit_transaction() write_unlock(&journal->j_state_lock); atomic_sub(j_reserved_credits, t_outstanding_credits); //sub once jbd2_journal_start_reserved() start_this_handle() //detect aborted journal jbd2_journal_free_reserved() //get running transaction read_lock(&journal->j_state_lock) __jbd2_journal_unreserve_handle() atomic_sub(j_reserved_credits, t_outstanding_credits); //sub again read_unlock(&journal->j_state_lock); journal->j_running_transaction = NULL; J_ASSERT(t_nr_buffers <= t_outstanding_credits) //bomb!!! Fix this issue by using journal->j_state_lock to protect the subtraction in jbd2_journal_commit_transaction(). Fixes: `96f1e09745` ("jbd2: avoid long hold times of j_state_lock while committing a transaction") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220611130426.2013258-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Jan Kara	d132495856	jbd2: unexport jbd2_log_start_commit() jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also make __jbd2_log_start_commit() static when we are at it. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Jan Kara	68af74e92a	jbd2: remove unused exports for jbd2 debugging Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the first is used only in fs/jbd2/journal.c and the second only within jbd2 code. Remove the pointless exports make jbd2_journal_enable_debug static. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Jan Kara	cb3b3bf22c	jbd2: rename jbd_debug() to jbd2_debug() The name of jbd_debug() is confusing as all functions inside jbd2 have jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Jan Kara	4978c659e7	ext4: use ext4_debug() instead of jbd_debug() We use jbd_debug() in some places in ext4. It seems a bit strange to use jbd2 debugging output function for ext4 code. Also these days ext4_debug() uses dynamic printk so each debug message can be enabled / disabled on its own so the time when it made some sense to have these combined (to allow easier common selecting of messages to report) has passed. Just convert all jbd_debug() uses in ext4 to ext4_debug(). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
hanjinke	218a69441b	ext4: reuse order and buddy in mb_mark_used when buddy split After each buddy split, mb_mark_used will search the proper order for the block which may consume some loop in mb_find_order_for_block. In fact, we can reuse the order and buddy generated by the buddy split. Reviewed by: lei.rao@intel.com Signed-off-by: hanjinke <hanjinke.666@bytedance.com> Link: https://lore.kernel.org/r/20220606155305.74146-1-hanjinke.666@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Theodore Ts'o	827891a38a	ext4: update the s_overhead_clusters in the backup sb's when resizing When the EXT4_IOC_RESIZE_FS ioctl is complete, update the backup superblocks. We don't do this for the old-style resize ioctls since they are quite ancient, and only used by very old versions of resize2fs --- and we don't want to update the backup superblocks every time EXT4_IOC_GROUP_ADD is called, since it might get called a lot. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220629040026.112371-2-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Theodore Ts'o	de394a8665	ext4: update s_overhead_clusters in the superblock during an on-line resize When doing an online resize, the on-disk superblock on-disk wasn't updated. This means that when the file system is unmounted and remounted, and the on-disk overhead value is non-zero, this would result in the results of statfs(2) to be incorrect. This was partially fixed by Commits `10b01ee92d` ("ext4: fix overhead calculation to account for the reserved gdt blocks"), `85d825dbf4` ("ext4: force overhead calculation if the s_overhead_cluster makes no sense"), and `eb7054212e` ("ext4: update the cached overhead value in the superblock"). However, since it was too expensive to forcibly recalculate the overhead for bigalloc file systems at every mount, this didn't fix the problem for bigalloc file systems. This commit should address the problem when resizing file systems with the bigalloc feature enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:19 -04:00
Zhang Yi	5a57bca905	ext4: fix reading leftover inlined symlinks Since commit `6493792d32` ("ext4: convert symlink external data block mapping to bdev"), create new symlink with inline_data is not supported, but it missing to handle the leftover inlined symlinks, which could cause below error message and fail to read symlink. ls: cannot read symbolic link 'foo': Structure needs cleaning EXT4-fs error (device sda): ext4_map_blocks:605: inode #12: block 2021161080: comm ls: lblock 0 mapped to illegal pblock 2021161080 (length 1) Fix this regression by adding ext4_read_inline_link(), which read the inline data directly and convert it through a kmalloced buffer. Fixes: `6493792d32` ("ext4: convert symlink external data block mapping to bdev") Cc: stable@kernel.org Reported-by: Torge Matthies <openglfreak@googlemail.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Torge Matthies <openglfreak@googlemail.com> Link: https://lore.kernel.org/r/20220630090100.2769490-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:37:50 -04:00
Linus Torvalds	c2a24a7a03	This update includes the following changes: API: - Make proc files report fips module name and version. Algorithms: - Move generic SHA1 code into lib/crypto. - Implement Chinese Remainder Theorem for RSA. - Remove blake2s. - Add XCTR with x86/arm64 acceleration. - Add POLYVAL with x86/arm64 acceleration. - Add HCTR2. - Add ARIA. Drivers: - Add support for new CCP/PSP device ID in ccp. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmLosAAACgkQxycdCkmx i6dvgxAAzcw0cKMuq3dbQamzeVu1bDW8rPb7yHnpXal3ao5ewa15+hFjsKhdh/s3 cjM5Lu7Qx4lnqtsh2JVSU5o2SgEpptxXNfxAngcn46ld5EgV/G4DYNKuXsatMZ2A erCzXqG9dDxJmREat+5XgVfD1RFVsglmEA/Nv4Rvn+9O4O6PfwRa8GyUzeKC+byG qs/1JyiPqpyApgzCvlQFAdTF4PM7ruDtg3mnMy2EKAzqj4JUseXRi1i81vLVlfBL T40WESG/CnOwIF5MROhziAtkJMS4Y4v2VQ2++1p0gwG6pDCnq4w7u9cKPXYfNgZK fMVCxrNlxIH3W99VfVXbXwqDSN6qEZtQvhnliwj9aEbEltIoH+B02wNfS/BDsTec im+5NCnNQ6olMPyL0yHrMKisKd+DwTrEfYT5H2kFhcdcYZncQ9C6el57kimnJRzp 4ymPRudCKm/8weWGTtmjFMi+PFP4LgvCoR+VMUd+gVe91F9ZMAO0K7b5z5FVDyDf wmsNBvsEnTdm/r7YceVzGwdKQaP9sE5wq8iD/yySD1PjlmzZos1CtCrqAIT/v2RK pQdZCIkT8qCB+Jm03eEd4pwjEDnbZdQmpKt4cTy0HWIeLJVG1sXPNpgwPCaBEV4U g0nctILtypChlSDmuGhTCyuElfMg6CXt4cgSZJTBikT+QcyWOm4= =rfWK -----END PGP SIGNATURE----- Merge tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Make proc files report fips module name and version Algorithms: - Move generic SHA1 code into lib/crypto - Implement Chinese Remainder Theorem for RSA - Remove blake2s - Add XCTR with x86/arm64 acceleration - Add POLYVAL with x86/arm64 acceleration - Add HCTR2 - Add ARIA Drivers: - Add support for new CCP/PSP device ID in ccp" * tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits) crypto: tcrypt - Remove the static variable initialisations to NULL crypto: arm64/poly1305 - fix a read out-of-bound crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps crypto: hisilicon/sec - fix auth key size error crypto: ccree - Remove a useless dma_supported() call crypto: ccp - Add support for new CCP/PSP device ID crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq crypto: testmgr - some more fixes to RSA test vectors cyrpto: powerpc/aes - delete the rebundant word "block" in comments hwrng: via - Fix comment typo crypto: twofish - Fix comment typo crypto: rmd160 - fix Kconfig "its" grammar crypto: keembay-ocs-ecc - Drop if with an always false condition Documentation: qat: rewrite description Documentation: qat: Use code block for qat sysfs example crypto: lib - add module license to libsha1 crypto: lib - make the sha1 library optional crypto: lib - move lib/sha1.c into lib/crypto/ crypto: fips - make proc files report fips module name and version ...	2022-08-02 17:45:14 -07:00
Xiubo Li	c460f4e4bb	ceph: remove useless check for the folio The netfs_write_begin() won't set the folio if the return value is non-zero. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:13 +02:00
Hu Weiwen	7cb9994754	ceph: don't truncate file in atomic_open Clear O_TRUNC from the flags sent in the MDS create request. `atomic_open' is called before permission check. We should not do any modification to the file here. The caller will do the truncation afterward. Fixes: `124e68e740` ("ceph: file operations") Signed-off-by: Hu Weiwen <sehuww@mail.scut.edu.cn> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:13 +02:00
Xiubo Li	0c04a117d7	ceph: make f_bsize always equal to f_frsize The f_frsize maybe changed in the quota size is less than the defualt 4MB. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:13 +02:00
Xiubo Li	e027ddb6d3	ceph: flush the dirty caps immediatelly when quota is approaching When the quota is approaching we need to notify it to the MDS as soon as possible, or the client could write to the directory more than expected. This will flush the dirty caps without delaying after each write, though this couldn't prevent the real size of a directory exceed the quota but could prevent it as soon as possible. Link: https://tracker.ceph.com/issues/56180 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:13 +02:00
Xiubo Li	4849077604	ceph: don't get the inline data for new creating files If the 'i_inline_version' is 1, that means the file is just new created and there shouldn't have any inline data in it, we should skip retrieving the inline data from MDS. This also could help reduce possiblity of dead lock issue introduce by the inline data and Fcr caps. Gradually we will remove the inline feature from kclient after ceph's scrub too have support to unline the inline data, currently this could help reduce the teuthology test failures. This is possiblly could also fix a bug that for some old clients if they couldn't explictly uninline the inline data when writing, the inline version will keep as 1 always. We may always reading non-exist data from inline data. Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	0006164589	ceph: update the auth cap when the async create req is forwarded For async create we will always try to choose the auth MDS of frag the dentry belonged to of the parent directory to send the request and ususally this works fine, but if the MDS migrated the directory to another MDS before it could be handled the request will be forwarded. And then the auth cap will be changed. We need to update the auth cap in this case before the request is forwarded. Link: https://tracker.ceph.com/issues/55857 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	e19feff963	ceph: make change_auth_cap_ses a global symbol Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Jeff Layton	020bc44a9f	ceph: switch back to testing for NULL folio->private in ceph_dirty_folio Willy requested that we change this back to warning on folio->private being non-NULl. He's trying to kill off the PG_private flag, and so we'd like to catch where it's non-NULL. Add a VM_WARN_ON_FOLIO (since it doesn't exist yet) and change over to using that instead of VM_BUG_ON_FOLIO along with testing the ->private pointer. [ xiubli: define VM_WARN_ON_FOLIO macro in case DEBUG_VM is disabled reported by kernel test robot <lkp@intel.com> ] Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Jeff Layton	7467b04418	ceph: call netfs_subreq_terminated with was_async == false "was_async" is a bit misleadingly named. It's supposed to indicate whether it's safe to call blocking operations from the context you're calling it from, but it sounds like it's asking whether this was done via async operation. For ceph, this it's always called from kernel thread context so it should be safe to set this to false. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Jeff Layton	e821450335	ceph: convert to generic_file_llseek There's no reason we need to lock the inode for write in order to handle an llseek. I suspect this should have been dropped in 2013 when we stopped doing vmtruncate in llseek. With that gone, ceph_llseek is functionally equivalent to generic_file_llseek, so just call that after getting the size. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Jeff Layton	58dd438557	ceph: don't leak snap_rwsem in handle_cap_grant When handle_cap_grant is called on an IMPORT op, then the snap_rwsem is held and the function is expected to release it before returning. It currently fails to do that in all cases which could lead to a deadlock. Fixes: `6f05b30ea0` ("ceph: reset i_requested_max_size if file write is not wanted") Link: https://tracker.ceph.com/issues/55857 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Luís Henriques	d93231a6bc	ceph: prevent a client from exceeding the MDS maximum xattr size The MDS tries to enforce a limit on the total key/values in extended attributes. However, this limit is enforced only if doing a synchronous operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS doesn't have a chance to enforce these limits. This patch adds support for decoding the xattrs maximum size setting that is distributed in the mdsmap. Then, when setting an xattr, the kernel client will revert to do a synchronous operation if that maximum size is exceeded. While there, fix a dout() that would trigger a printk warning: [ 98.718078] ------------[ cut here ]------------ [ 98.719012] precision 65536 too large [ 98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600 ... Link: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	8266c4d7a7	ceph: choose auth MDS for getxattr with the Xs caps And for the 'Xs' caps for getxattr we will also choose the auth MDS, because the MDS side code is buggy due to setxattr won't notify the replica MDSes when the values changed and the replica MDS will return the old values. Though we will fix it in MDS code, but this still makes sense for old ceph. Link: https://tracker.ceph.com/issues/55331 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	300e42a2e7	ceph: add session already open notify support If the connection was accidently closed due to the socket issue or something else the clients will try to open the opened sessions, the MDSes will send the session open reply one more time if the clients support the notify feature. When the clients retry to open the sessions the s_seq will be 0 as default, we need to update it anyway. Link: https://tracker.ceph.com/issues/53911 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	4868e537fa	ceph: wait for the first reply of inflight async unlink In async unlink case the kclient won't wait for the first reply from MDS and just drop all the links and unhash the dentry and then succeeds immediately. For any new create/link/rename,etc requests followed by using the same file names we must wait for the first reply of the inflight unlink request, or the MDS possibly will fail these following requests with -EEXIST if the inflight async unlink request was delayed for some reasons. And the worst case is that for the none async openc request it will successfully open the file if the CDentry hasn't been unlinked yet, but later the previous delayed async unlink request will remove the CDenty. That means the just created file is possiblly deleted later by accident. We need to wait for the inflight async unlink requests to finish when creating new files/directories by using the same file names. Link: https://tracker.ceph.com/issues/55332 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	4f48d5da81	fs/dcache: export d_same_name() helper Compare dentry name with case-exact name, return true if names are same, or false. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Xiubo Li	7c2e3d9194	ceph: remove useless CEPHFS_FEATURES_CLIENT_REQUIRED This macro was added but never be used. And check the ceph code there has another CEPHFS_FEATURES_MDS_REQUIRED but always be empty. We should clean up all this related code, which make no sense but introducing confusion. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Luís Henriques	fea013e020	ceph: use correct index when encoding client supported features Feature bits have to be encoded into the correct locations. This hasn't been an issue so far because the only hole in the feature bits was in bit 10 (CEPHFS_FEATURE_RECLAIM_CLIENT), which is located in the 2nd byte. When adding more bits that go beyond the this 2nd byte, the bug will show up. [xiubli: remove incorrect comment for CEPHFS_FEATURES_CLIENT_SUPPORTED] Fixes: `9ba1e22453` ("ceph: allocate the correct amount of extra bytes for the session features") Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Jeff Layton	637fa738b5	fscrypt: add fscrypt_context_for_new_inode Most filesystems just call fscrypt_set_context on new inodes, which usually causes a setxattr. That's a bit late for ceph, which can send along a full set of attributes with the create request. Doing so allows it to avoid race windows that where the new inode could be seen by other clients without the crypto context attached. It also avoids the separate round trip to the server. Refactor the fscrypt code a bit to allow us to create a new crypto context, attach it to the inode, and write it to the buffer, but without calling set_context on it. ceph can later use this to marshal the context into the attributes we send along with the create request. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:11 +02:00
Jeff Layton	d3e94fdc4e	fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size For ceph, we want to use our own scheme for handling filenames that are are longer than NAME_MAX after encryption and Base64 encoding. This allows us to have a consistent view of the encrypted filenames for clients that don't support fscrypt and clients that do but that don't have the key. Currently, fs/crypto only supports encrypting filenames using fscrypt_setup_filename, but that also handles encoding nokey names. Ceph can't use that because it handles nokey names in a different way. Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to __fscrypt_fname_encrypted_size and add a new wrapper called fscrypt_fname_encrypted_size that takes an inode argument rather than a pointer to a fscrypt_policy union. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:11 +02:00
Jeff Layton	18cc912b8a	fs: change test in inode_insert5 for adding to the sb list inode_insert5 currently looks at I_CREATING to decide whether to insert the inode into the sb list. This test is a bit ambiguous, as I_CREATING state is not directly related to that list. This test is also problematic for some upcoming ceph changes to add fscrypt support. We need to be able to allocate an inode using new_inode and insert it into the hash later iff we end up using it, and doing that now means that we double add it and corrupt the list. What we really want to know in this test is whether the inode is already in its superblock list, and then add it if it isn't. Have it test for list_empty instead and ensure that we always initialize the list by doing it in inode_init_once. It's only ever removed from the list with list_del_init, so that should be sufficient. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:11 +02:00
Linus Torvalds	569bede0cf	fsverity update for 5.20 Just a small documentation update to mention the btrfs support. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYumAiBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/pjAQDJbkG6S1eEdhC3m6oHlSToiy2p0FDH +qr4fQndCO0l+QEAgo3ULXvbCKlLPOQHM2gVjnUR+UUHnjJ3p2F5aODsfQ4= =UMFK -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity update from Eric Biggers: "Just a small documentation update to mention the btrfs support" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: mention btrfs support	2022-08-02 15:24:36 -07:00
Linus Torvalds	d7b767b508	execve updates for v5.20-rc1 - Allow unsharing time namespace on vfork+exec (Andrei Vagin) - Replace usage of deprecated kmap APIs (Fabio M. De Francesco) - Fix spelling mistake (Zhang Jiaming) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmLoDyAWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJh0mEACL07hj3eT3rWg6ohZx9sCTcAjY /tG+zxLQ7xu717nM1a4j7CI5kdNNpYbsCqG71ikDDRrOCeEutu7M8zE1emctjtHv oh853D6BKhV2Hvsiuk1oM2ZHR1bmgiW1eFNAJcCLz6rE6wYu564R0wYJV0h418fH Rjk+Y989A7Srs9t/9GQSktjX3Q039/PG28avhA5q144/ZNycr5FnLFOf4RlmzEUz 7E8TfGsftX8eRAfxW/dPiWuIKMuYPLqspca9pT3aFj3ze2qKnldjNV3c9M5ajL5Q q7KKWeWzunKyYHMaRzIxkHyhs396ZGKFN2PbcNYyml+NBItyc3fCHishMF7bW0Vb nyZbmYJslBloYmrSJYgqCfxyjUuhe0cMMk9iMzDVp6ROwtLgFFLwfwunM6RwRmnr dAmM8QGwSE3qYLhVnLEcRqpgdXzVd+S0TGhB5k5AyI3628/mLxhE66/eWq0X8QF5 los5zku1GagMkylt6SOGb3TME4JZe6ZdZpU4fe/ilM22qw852xgbF3+6Zap6IBbD AdzXVCHyU/obORfIxx5KTF213m4KpkWBBi3N1/vVlxIAFAUy1WdXDM1o2RPMD7hw DeHe8sgfTZxLmSqfWLuX+3qC94IvrbDPFaRCIMj1QNK0ltM8I9oHRPcUFyZMaV0O xHN/5QtmgVDfKA3mTw== =82SS -----END PGP SIGNATURE----- Merge tag 'execve-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Allow unsharing time namespace on vfork+exec (Andrei Vagin) - Replace usage of deprecated kmap APIs (Fabio M. De Francesco) - Fix spelling mistake (Zhang Jiaming) * tag 'execve-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Call kmap_local_page() in copy_string_kernel() exec: Fix a spelling mistake selftests/timens: add a test for vfork+exit fs/exec: allow to unshare a time namespace on vfork+exec	2022-08-02 14:36:19 -07:00
Linus Torvalds	ddd1949f58	pstore updates for v5.20-rc1 - Migrate to modern acomp crypto interface (Ard Biesheuvel) - Use better return type for "rcnt" (Dan Carpenter) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmLoDWAWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJmSWD/9BKLN+f7kpYfGKk0q9nnceo1CU q9uFHIW6eqFcInHTGIzc68iig1SbtQePjhrXafLez8ZKeX/L3yy/G4R4+vKj2XDA Lkjd9Xrj1d1nhh/mu/SSkDlsIUXNkcxbpfQX7qUXPfNc7Wln7r+aWqHFT5cziDYf G58TOlmn4WsdGWyhadsBxcPE9UA7eGsoqya+PD/bKgDCqp6ArhzZd3bFK/OLkSxF xR2n26sS2Qfg7ApbTrBSyG4lJkuIxZILFF8a8CsmYj78xCGSrwPlOFRHW5oKHCdI JEQ3CR/I3HofP1ajQCti07XT/dDbwzQaHkF/sjZmhAh9OnPXClGxYlczT+Cw9ffS R6o/XWldz8KgG9m3K6j1NmkBdwF0zukznNRrEBaeAONAbquG36b8fcH9GTRWBRAS x57bTHOCws9RLvjCGoxdEOYzs2Sm/23lUuF7nv0wRaDmlH+6eG6LU26ZBJpC1ple xdvnJxUI6tU5ojSUbaqQ4jyel/H5NZTb0l32gAmh5nZy/2U7jlLyeGFg814Hz9Jy ga8IWfumYezYMZfY5zTnmzhxF5mcLB34c/B5Qkb5SgER/WwnrTkRuJAXfbPE1j5y h8360yT5HPg5WSUe+N7S78LFqMWscu4iEPMdXbnKbLLp0Ytzz6xTlKi++teOfVle PI2xO97HN6oLA02IBw== =jJ4i -----END PGP SIGNATURE----- Merge tag 'pstore-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Migrate to modern acomp crypto interface (Ard Biesheuvel) - Use better return type for "rcnt" (Dan Carpenter) * tag 'pstore-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/zone: cleanup "rcnt" type pstore: migrate to crypto acomp interface	2022-08-02 14:31:46 -07:00
Linus Torvalds	c013d0af81	for-5.20/block-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLko3gQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmQaD/90NKFj4v8I456TUQyg1jimXEsL+e84E6o2 ALWVb6JzQvlPVQXNLnK5YKIunMWOTtTMz0nyB8sVRwVJVJO0P5d7QopAkZM8fkyU MK5OCzoryENw4DTc2wJS4in6cSbGylIuN74wMzlf7+M67JTImfoZQhbTMcjwzZfn b3OlL6sID7zMXwGcuOJPZyUJICCpDhzdSF9JXqKma5PQuG2SBmQyvFxJAcsoFBPc YetnoRIOIN6yBvsIZaPaYq7XI9MIvF0e67EQtyCEHj4tHpyVnyDWkeObVFULsISU gGEKbkYPvNUzRAU5Q1NBBHh1tTfkf/MaUxTuZwoEwZ/s04IGBGMmrZGyfvdfzYo6 M7NwSEg/TrUSNfTwn65mQi7uOXu1pGkJrqz84Flm8u9Qid9Vd7LExLG5p/ggnWdH 5th93MDEmtEg29e9DXpEAuS5d0t3TtSvosflaKpyfNNfr+P0rWCN6GM/uW62VUTK ls69SQh/AQJRbg64jU4xper6WhaYtSXK7TKEnxJycoEn9gYNyCcdot2uekth0xRH ChHGmRlteiqe/y4uFWn/2dcxWjoleiHbFjTaiRL75WVl8wIDEjw02LGuoZ61Ss9H WOV+MT7KqNjBGe6lreUY+O/PO02dzmoR6heJXN19p8zr/pBuLCTGX7UpO7rzgaBR 4N1HEozvIw== =celk -----END PGP SIGNATURE----- Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Improve the type checking of request flags (Bart) - Ensure queue mapping for a single queues always picks the right queue (Bart) - Sanitize the io priority handling (Jan) - rq-qos race fix (Jinke) - Reserved tags handling improvements (John) - Separate memory alignment from file/disk offset aligment for O_DIRECT (Keith) - Add new ublk driver, userspace block driver using io_uring for communication with the userspace backend (Ming) - Use try_cmpxchg() to cleanup the code in various spots (Uros) - Finally remove bdevname() (Christoph) - Clean up the zoned device handling (Christoph) - Clean up independent access range support (Christoph) - Clean up and improve block sysfs handling (Christoph) - Clean up and improve teardown of block devices. This turns the usual two step process into something that is simpler to implement and handle in block drivers (Christoph) - Clean up chunk size handling (Christoph) - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu, Ming, Sebastian, Yang, Ying) * tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits) ublk_drv: fix double shift bug ublk_drv: make sure that correct flags(features) returned to userspace ublk_drv: fix error handling of ublk_add_dev ublk_drv: fix lockdep warning block: remove __blk_get_queue block: call blk_mq_exit_queue from disk_release for never added disks blk-mq: fix error handling in __blk_mq_alloc_disk ublk: defer disk allocation ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask ublk: fold __ublk_create_dev into ublk_ctrl_add_dev ublk: cleanup ublk_ctrl_uring_cmd ublk: simplify ublk_ch_open and ublk_ch_release ublk: remove the empty open and release block device operations ublk: remove UBLK_IO_F_PREFLUSH ublk: add a MAINTAINERS entry block: don't allow the same type rq_qos add more than once mmc: fix disk/queue leak in case of adding disk failure ublk_drv: fix an IS_ERR() vs NULL check ublk: remove UBLK_IO_F_INTEGRITY ublk_drv: remove unneeded semicolon ...	2022-08-02 13:46:35 -07:00
Linus Torvalds	98e2474640	for-5.20/io_uring-buffered-writes-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm7UQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpldTEADTg/96R+eq78UZBNZmdifY9/qwQD+kzNiK ACDoYZFSbWUjMeOWqRxbYr6mXBKHnGHyTGlraTTpLDzhpB1xwoWfgOK9uOYXW/Ik eWfgTujPW/8v/l/z86khE+GH9b/maGCRqNZgS6uLVLzhxG6oCkoYTyOh1iHaF1VM Rma4nbJ8GSEDtiXNDl0Bznnyks/pzwoz/9slwzZ7PxtFwZsBxKuxgMUR5HIXdRp7 5iUoFJhZrGWyi/dbQZUsK/9VYVVnKkcBCz2pb4GEmC+3dS/vlPEoeWUpPHInNyd1 9NB9v8c+KFmQaWnCxuxcdHvCfmRRQrX8Pr8/OBNZKO6McYrKWKA+lurp4EGClE3m cZdK+P/9FS/Eeua8hum9UnbPAqsJPqLTbpbrySeBdd4iFA6u7rRqDX2+nz3PNe9U 1b7V1bWBIEY/Rsw/PKo59oIeV0auD8v9OCHJ0lF2pv6dRln2/W0y1Qfd1DI18xFG +9bBnQzhF7R0O8UP5ApVayQCYrd906YsSVUOqAiLmUs/BoOgRq6g/0BqSOVVKE2u 5iq8zTsVMkxY0ZpExwZST/700JwkPIV4SVPEYRC6QssFTcylvlisIek6XYSS9HX4 Z6gzMwJW1H47bEfG4JolTI8uBjp0hQLCPX0O0XFLVnbHQwN0kjIBmv3axAwJO2NV qrrHXjf09w== =hV7G -----END PGP SIGNATURE----- Merge tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring buffered writes support from Jens Axboe: "This contains support for buffered writes, specifically for XFS. btrfs is in progress, will be coming in the next release. io_uring does support buffered writes on any file type, but since the buffered write path just always -EAGAIN (or -EOPNOTSUPP) any attempt to do so if IOCB_NOWAIT is set, any buffered write will effectively be handled by io-wq offload. This isn't very efficient, and we even have specific code in io-wq to serialize buffered writes to the same inode to avoid further inefficiencies with thread offload. This is particularly sad since most buffered writes don't block, they simply copy data to a page and dirty it. With this pull request, we can handle buffered writes a lot more effiently. If balance_dirty_pages() needs to block, we back off on writes as indicated. This improves buffered write support by 2-3x. Jan Kara helped with the mm bits for this, and Stefan handled the fs/iomap/xfs/io_uring parts of it" * tag 'for-5.20/io_uring-buffered-writes-2022-07-29' of git://git.kernel.dk/linux-block: mm: honor FGP_NOWAIT for page cache page allocation xfs: Add async buffered write support xfs: Specify lockmode when calling xfs_ilock_for_iomap() io_uring: Add tracepoint for short writes io_uring: fix issue with io_write() not always undoing sb_start_write() io_uring: Add support for async buffered writes fs: Add async write file modification handling. fs: Split off inode_needs_update_time and __file_update_time fs: add __remove_file_privs() with flags parameter fs: add a FMODE_BUF_WASYNC flags for f_mode iomap: Return -EAGAIN from iomap_write_iter() iomap: Add async buffered write support iomap: Add flags parameter to iomap_page_create() mm: Add balance_dirty_pages_ratelimited_flags() function mm: Move updates of dirty_exceeded into one place mm: Move starting of background writeback into the main balancing loop	2022-08-02 13:27:23 -07:00
Linus Torvalds	b349b1181d	for-5.20/io_uring-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg 6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7 0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp 2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3 Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2 43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw 5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+ NiU6VihPOw== =gxeV -----END PGP SIGNATURE----- Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: - As per (valid) complaint in the last merge window, fs/io_uring.c has grown quite large these days. io_uring isn't really tied to fs either, as it supports a wide variety of functionality outside of that. Move the code to io_uring/ and split it into files that either implement a specific request type, and split some code into helpers as well. The code is organized a lot better like this, and io_uring.c is now < 4K LOC (me). - Deprecate the epoll_ctl opcode. It'll still work, just trigger a warning once if used. If we don't get any complaints on this, and I don't expect any, then we can fully remove it in a future release (me). - Improve the cancel hash locking (Hao) - kbuf cleanups (Hao) - Efficiency improvements to the task_work handling (Dylan, Pavel) - Provided buffer improvements (Dylan) - Add support for recv/recvmsg multishot support. This is similar to the accept (or poll) support for have for multishot, where a single SQE can trigger everytime data is received. For applications that expect to do more than a few receives on an instantiated socket, this greatly improves efficiency (Dylan). - Efficiency improvements for poll handling (Pavel) - Poll cancelation improvements (Pavel) - Allow specifiying a range for direct descriptor allocations (Pavel) - Cleanup the cqe32 handling (Pavel) - Move io_uring types to greatly cleanup the tracing (Pavel) - Tons of great code cleanups and improvements (Pavel) - Add a way to do sync cancelations rather than through the sqe -> cqe interface, as that's a lot easier to use for some use cases (me). - Add support to IORING_OP_MSG_RING for sending direct descriptors to a different ring. This avoids the usually problematic SCM case, as we disallow those. (me) - Make the per-command alloc cache we use for apoll generic, place limits on it, and use it for netmsg as well (me). - Various cleanups (me, Michal, Gustavo, Uros) * tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...	2022-08-02 13:20:44 -07:00
Trond Myklebust	2135e5d562	NFSv4/pnfs: Fix a use-after-free bug in open If someone cancels the open RPC call, then we must not try to free either the open slot or the layoutget operation arguments, since they are likely still in use by the hung RPC call. Fixes: `6949493884` ("NFSv4: Don't hold the layoutget locks across multiple RPC calls") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-02 16:04:29 -04:00
Trond Myklebust	b1a28f2eb9	NFS: nfs_async_write_reschedule_io must not recurse into the writeback code It is not safe to call filemap_fdatawrite_range() from nfs_async_write_reschedule_io(), since we're often calling from a page reclaim context. Just let fsync() redrive the writeback for us. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-02 16:03:12 -04:00
David Howells	2757a4dc18	afs: Fix access after dec in put functions Reference-putting functions should not access the object being put after decrementing the refcount unless they reduce the refcount to zero. Fix a couple of instances of this in afs by copying the information to be logged by tracepoint to local variables before doing the decrement. [Fixed a bit in afs_put_server() that I'd missed but Marc caught] Fixes: `341f741f04` ("afs: Refcount the afs_call struct") Fixes: `4521819369` ("afs: Trace afs_server usage") Fixes: `977e5f8ed0` ("afs: Split the usage count on struct afs_server") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165911278430.3745403.16526310736054780645.stgit@warthog.procyon.org.uk/ # v1	2022-08-02 18:21:29 +01:00
David Howells	c56f9ec8b2	afs: Use refcount_t rather than atomic_t Use refcount_t rather than atomic_t in afs to make use of the count checking facilities provided. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165911277768.3745403.423349776836296452.stgit@warthog.procyon.org.uk/ # v1	2022-08-02 18:10:11 +01:00
Christoph Hellwig	cf5e7a6521	fs: remove the NULL get_block case in mpage_writepages No one calls mpage_writepages with a NULL get_block paramter, so remove support for that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Christoph Hellwig	f2d3e573bf	fs: don't call ->writepage from __mpage_writepage All callers of mpage_writepage use block_write_full_page as their ->writepage implementation when called from mpage_writepages (although for ntfs3 this is obsfucated a bit). Just call block_write_full_page directly instead of going through the ->writepage indirection. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Christoph Hellwig	cc9cf350d1	fs: remove the nobh helpers All callers are gone, so remove the now dead code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Christoph Hellwig	002cbb1356	jfs: stop using the nobh helper The nobh mode is an obscure feature to save lowlevel for large memory 32-bit configurations while trading for much slower performance and has been long obsolete. Switch to the regular buffer head based helpers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Christoph Hellwig	0cc5b4ce7a	ext2: remove nobh support The nobh mode is an obscure feature to save lowlevel for large memory 32-bit configurations while trading for much slower performance and has been long obsolete. Remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Christoph Hellwig	9139710148	ntfs3: refactor ntfs_writepages Handle the resident case with an explicit generic_writepages call instead of using the obscure overload that makes mpage_writepages with a NULL get_block do the same thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	b890ec2a2c	hugetlb: Convert to migrate_folio This involves converting migrate_huge_page_move_mapping(). We also need a folio variant of hugetlb_set_page_subpool(), but that's for a later patch. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	3648951ceb	aio: Convert to migrate_folio Use a folio throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	1d5b9bd656	f2fs: Convert to filemap_migrate_folio() filemap_migrate_folio() fits f2fs's needs perfectly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Chao Yu <chao@kernel.org>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	e7b15bae55	ubifs: Convert to filemap_migrate_folio() filemap_migrate_folio() is a little more general than ubifs really needs, but it's better to share the code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	e7a60a1787	btrfs: Convert btrfs_migratepage to migrate_folio Use filemap_migrate_folio() to do the bulk of the work, and then copy the ordered flag across if needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	2ec810d596	mm/migrate: Add filemap_migrate_folio() There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	541846502f	mm/migrate: Convert migrate_page() to migrate_folio() Convert all callers to pass a folio. Most have the folio already available. Switch all users from aops->migratepage to aops->migrate_folio. Also turn the documentation into kerneldoc. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com>	2022-08-02 12:34:04 -04:00
Matthew Wilcox (Oracle)	4ae84a8047	nfs: Convert to migrate_folio Use a folio throughout this function. migrate_page() will be converted later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	8958b55142	btrfs: Convert btree_migratepage to migrate_folio Use a folio throughout this function. migrate_page() will be converted later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	67235182a4	mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() Use a folio throughout __buffer_migrate_folio(), add kernel-doc for buffer_migrate_folio() and buffer_migrate_folio_norefs(), move their declarations to buffer.h and switch all filesystems that have wired them up. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	37ce0b319b	ext2: Use a folio in ext2_get_page() Remove a call to read_mapping_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	240159077d	gfs2: Convert gfs2_jhead_process_page() to use a folio Use folio_put_refs() to perform only one atomic operation instead of two. The other changes are straightforward conversions from page APIs to their folio equivalents. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	9bb88987bc	ocfs2: Convert ocfs2_read_folio() to use a folio Use the folio API throughout. There are a few places where we convert back to a page to call into the rest of the filesystem, so folio usage needs to be pushed down to those functions later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	36a43502e1	freevxfs: Convert vxfs_immed_read_folio() to use a folio Reorganise the file to remove the forward declaration. Use folios throughout vxfs_immed_read_folio(). Use memcpy_to_page() instead of an open-coded kmap()/kunmap(). Remove flush_dcache_page() as this is embedded in memcpy_to_page(). Use folio_pos() instead of opencoding it. Handle multi-page folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	9a0a953323	coda: Convert coda_symlink_filler() to use a folio This is a straightforward conversion from the page APIs to the folio APIs. Symlinks are not allowed to be larger than PAGE_SIZE, so there is little work to do here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	ac09d88b9f	befs: Convert befs_symlink_read_folio() to use a folio This is a straightforward conversion from the page APIs to the folio APIs. Symlinks are not allowed to be larger than PAGE_SIZE, so there is little work to do here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:03 -04:00
Matthew Wilcox (Oracle)	cf948cbc35	cramfs: read_mapping_page() is synchronous Since commit `67f9fd91f9`, the code to wait for the read to complete has been dead. That commit wrongly stated that the read was synchronous already; this seems to have been a confusion about which ->readpage operation was being called. Instead of reintroducing an asynchronous version of read_mapping_page(), call the readahead code directly to submit all reads first before waiting for them in read_mapping_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:02 -04:00
Matthew Wilcox (Oracle)	97a3a383c4	ocfs2: Use filemap_write_and_wait_range() in ocfs2_cow_sync_writeback() Remove the open-coding of filemap_fdatawait_range(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:02 -04:00
Matthew Wilcox (Oracle)	e775dfb33d	hostfs: Handle page write errors correctly If a page can't be written back, we need to call mapping_set_error(), not clear the page's Uptodate flag. Also remove the clearing of PageError on success; that flag is used for read errors, not write errors. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:02 -04:00
Matthew Wilcox (Oracle)	31e748e4b1	squashfs: Return the actual error from squashfs_read_folio() Since we actually know what error happened, we can report it instead of having the generic code return -EIO for pages that were unlocked without being marked uptodate. Also remove a test of PageError since we have the return value at this point. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:02 -04:00
Matthew Wilcox (Oracle)	b7a6eb22ba	buffer: Don't test folio error in block_read_full_folio() We can cache this information in a local variable instead of communicating from one part of the function to another via folio flags. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-08-02 12:34:02 -04:00
William Dean	4f1196288d	ovl: fix spelling mistakes fix follow spelling misktakes: decendant ==> descendant indentify ==> identify underlaying ==> underlying Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-08-02 15:41:10 +02:00
David Sterba	5abbb7b928	affs: use memcpy_to_page and remove replace kmap_atomic() The use of kmap() is being deprecated in favor of kmap_local_page() where it is feasible. For kmap around a memcpy there's a convenience helper memcpy_to_page that also makes the flush_dcache_page() redundant. CC: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-01 19:53:31 +02:00
Linus Torvalds	0fac198def	fs.idmapped.overlay.acl.v5.20 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYufiMwAKCRCRxhvAZXjc os2iAQDr3tK9e2EUZDZ3Vgu3tvmTLKiU7W7f4U/ZAjJE5snBOwD+OqK8r1RdvXf8 TatkVFFNZYlINDN6JrS5yGSKBm1+RwE= =8eZE -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull acl updates from Christian Brauner: "Last cycle we introduced support for mounting overlayfs on top of idmapped mounts. While looking into additional testing we realized that posix acls don't really work correctly with stacking filesystems on top of idmapped layers. We already knew what the fix were but it would require work that is more suitable for the merge window so we turned off posix acls for v5.19 for overlayfs on top of idmapped layers with Miklos routing my patch upstream in `72a8e05d4f` ("Merge tag 'ovl-fixes-5.19-rc7' [..]"). This contains the work to support posix acls for overlayfs on top of idmapped layers. Since the posix acl fixes should use the new vfs{g,u}id_t work the associated branch has been merged in. (We sent a pull request for this earlier.) We've also pulled in Miklos pull request containing my patch to turn of posix acls on top of idmapped layers. This allowed us to avoid rebasing the branch which we didn't like because we were already at rc7 by then. Merging it in allows this branch to first fix posix acls and then to cleanly revert the temporary fix it brought in by commit `4a47c6385b` ("ovl: turn of SB_POSIXACL with idmapped layers temporarily"). The last patch in this series adds Seth Forshee as a co-maintainer for idmapped mounts. Seth has been integral to all of this work and is also the main architect behind the filesystem idmapping work which ultimately made filesystems such as FUSE and overlayfs available in containers. He continues to be active in both development and review. I'm very happy he decided to help and he has my full trust. This increases the bus factor which is always great for work like this. I'm honestly very excited about this because I think in general we don't do great in the bringing on new maintainers department" For more explanations of the ACL issues, see https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ * tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: Add Seth Forshee as co-maintainer for idmapped mounts Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" ovl: handle idmappings in ovl_get_acl() acl: make posix_acl_clone() available to overlayfs acl: port to vfs{g,u}id_t acl: move idmapped mount fixup into vfs_{g,s}etxattr() mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()	2022-08-01 09:10:07 -07:00
Linus Torvalds	bdfae5ce38	fs.idmapped.vfsuid.v5.20 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYufP6AAKCRCRxhvAZXjc omzRAQCGJ11r7T0C7t1kTdQiFSs5XN9ksFa86Hfj3dHEBIj+LQEA+bZ2/LLpElDz zPekgXkFQqdMr+FUL8sk94dzHT0GAgk= =BcK/ -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs idmapping updates from Christian Brauner: "This introduces the new vfs{g,u}id_t types we agreed on. Similar to k{g,u}id_t the new types are just simple wrapper structs around regular {g,u}id_t types. They allow to establish a type safety boundary in the VFS for idmapped mounts preventing confusion betwen {g,u}ids mapped into an idmapped mount and {g,u}ids mapped into the caller's or the filesystem's idmapping. An initial set of helpers is introduced that allows to operate on vfs{g,u}id_t types. We will remove all references to non-type safe idmapped mounts helpers in the very near future. The patches do already exist. This converts the core attribute changing codepaths which become significantly easier to reason about because of this change. Just a few highlights here as the patches give detailed overviews of what is happening in the commit messages: - The kernel internal struct iattr contains type safe vfs{g,u}id_t values clearly communicating that these values have to take a given mount's idmapping into account. - The ownership values placed in struct iattr to change ownership are identical for idmapped and non-idmapped mounts going forward. This also allows to simplify stacking filesystems such as overlayfs that change attributes In other words, they always represent the values. - Instead of open coding checks for whether ownership changes have been requested and an actual update of the inode is required we now have small static inline wrappers that abstract this logic away removing a lot of code duplication from individual filesystems that all open-coded the same checks" * tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: mnt_idmapping: align kernel doc and parameter order mnt_idmapping: use new helpers in mapped_fs{g,u}id() fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t mnt_idmapping: return false when comparing two invalid ids attr: fix kernel doc attr: port attribute changes to new types security: pass down mount idmapping to setattr hook quota: port quota helpers mount ids fs: port to iattr ownership update helpers fs: introduce tiny iattr ownership update helpers fs: use mount types in iattr fs: add two type safe mapping helpers mnt_idmapping: add vfs{g,u}id_t	2022-08-01 08:56:55 -07:00
Linus Torvalds	e6a7cf70a3	File locking changes for v6.0 -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAmLnshQTHGpsYXl0b25A a2VybmVsLm9yZwAKCRAADmhBGVaCFa1ZEACWzjP9gDRO+b5HuovRofO5gCfi0LNK jQAnUQmFBbV28MuRBr8lzjZFsn52C3nEz/unHpl2NXrg1dErdXmTZIUYkZIoESQl 0hyA2lhdm/pvqfj5t9xwt9lK9xts7G+Q1Q2JsT53QlpGd7q9VOq0CFrFTuIe+HmZ qw9Sy/3rfP/rPALv1OzIlGDdBuslfPuijJJZq0wYx4WupA6vlGGSZXn+LxF2dHW9 Ex/Z+n6o5mzEuPedopBBsCvdMTO2/sVmz33puqM0KBb/gmL47i15o1XXdg1O0cbL 7LxIDOfaIm6gFsznUwrJV54WrL8zISQd/BhXbQOrbE8kmnNii1kfIyJHYx55Sa4X y6TmqVbYERXIwCFquO78Uywt8UgjRjuxG8SRe0AmqsxvIn/IxTjqMn5yaMURCTxA uyOmXHxLss3Jf2LNfd6nnrK5qKpOnPOBAn8I/4UY+eNdJGqesLyKoVPZ9O6K1dr3 +jZJ8Ju4TVs7L3fljq6pHvbhAWivM3JEZmYrv+y8QKSRZBV0XqHagwDGHUaaLe9H 6eHgU5yxCb+fj8EXbwxzKnJkhHXJikd4bbPOaJC+QZEKPCJJMo/pyXmDkCVwhJ73 pjO4W0w6TGmCHinlVX6dkyYrCvWYjWglHyO5BnTY2F0Ub87/59KmepZz4dh81hi+ ZdOIvHoF6uca3A== =wDtt -----END PGP SIGNATURE----- Merge tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Just a couple of flock() patches from Kuniyuki Iwashima. The main change is that this moves a file_lock allocation from the slab to the stack" * tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs/lock: Rearrange ops in flock syscall. fs/lock: Don't allocate file_lock in flock_make_lock().	2022-08-01 08:54:59 -07:00
Linus Torvalds	e88745dcfd	Changes since last update: - Add Yue Hu and Jeffle Xu as reviewers; - Add the missing wake_up when updating lzma streams; - Avoid consecutive detection for Highmem memory; - Prepare for multi-reference pclusters and get rid of PG_error; - Fix ctx->pos update for NFS export; - minor cleanups. -----BEGIN PGP SIGNATURE----- iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYub3chEceGlhbmdAa2Vy bmVsLm9yZwAKCRA5NzHcH7XmBMMSAP9P7kMPLuc0RP9AjoiQXKNAfWqIbGnbkI5C ACUUu5tZEgD/T7HhkDYIs/wAZzYB7qTkpepkY/XzuwnlodhaSTwnPQ8= =/vU1 -----END PGP SIGNATURE----- Merge tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "First of all, we'd like to add Yue Hu and Jeffle Xu as two new reviewers. Thank them for spending time working on EROFS! There is no major feature outstanding in this cycle, mainly a patchset I worked on to prepare for rolling hash deduplication and folios for compressed data as the next big features. It kills the unneeded PG_error flag dependency as well. Apart from that, there are bugfixes and cleanups as always. Details are listed below: - Add Yue Hu and Jeffle Xu as reviewers - Add the missing wake_up when updating lzma streams - Avoid consecutive detection for Highmem memory - Prepare for multi-reference pclusters and get rid of PG_error - Fix ctx->pos update for NFS export - minor cleanups" * tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (23 commits) erofs: update ctx->pos for every emitted dirent erofs: get rid of the leftover PAGE_SIZE in dir.c erofs: get rid of erofs_prepare_dio() helper erofs: introduce multi-reference pclusters (fully-referenced) erofs: record the longest decompressed size in this round erofs: introduce z_erofs_do_decompressed_bvec() erofs: try to leave (de)compressed_pages on stack if possible erofs: introduce struct z_erofs_decompress_backend erofs: get rid of `z_pagemap_global' erofs: clean up `enum z_erofs_collectmode' erofs: get rid of `enum z_erofs_page_type' erofs: rework online page handling erofs: switch compressed_pages[] to bufvec erofs: introduce `z_erofs_parse_in_bvecs' erofs: drop the old pagevec approach erofs: introduce bufvec to store decompressed buffers erofs: introduce `z_erofs_parse_out_bvecs()' erofs: clean up z_erofs_collector_begin() erofs: get rid of unneeded `inode', `map' and `sb' erofs: avoid consecutive detection for Highmem memory ...	2022-08-01 08:52:37 -07:00
Linus Torvalds	bec14d79f7	Merge tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - support for FAN_MARK_IGNORE which untangles some of the not well defined corner cases with fanotify ignore masks - small cleanups * tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix comment typo fanotify: introduce FAN_MARK_IGNORE fanotify: cleanups for fanotify_mark() input validations fanotify: prepare for setting event flags in ignore mask fs: inotify: Fix typo in inotify comment	2022-08-01 08:50:39 -07:00
Linus Torvalds	af07685b9c	Merge tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and reiserfs updates from Jan Kara: "A fix for ext2 handling of a corrupted fs image and cleanups in ext2 and reiserfs" * tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Add more validity checks for inode counts fs/reiserfs/inode: remove dead code in _get_block_create_0() fs/ext2: replace ternary operator with min_t()	2022-08-01 08:48:37 -07:00
Linus Torvalds	eb43bbac4c	dlm for 6.0 Changes in this set of commits: . Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead to extraneous warnings when the result arrived. . Tracepoint improvements, e.g. adding the lock resource name. . Delay the completion of lockspace creation until one full recovery cycle has completed. This allows more error cases to be returned to the caller. . Remove warnings from the locking layer about delayed network replies. The recently added midcomms warnings are much more useful. . Begin the process of deprecating two unused lock-timeout-related features. These features now require enabling via a Kconfig option, and enabling them triggers deprecation warnings. We expect to remove the code in v6.2. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJi5+RGAAoJEDgbc8f8gGmq1RwP/2xZaVKiTPYcQ0GfcmefCnnG 8WxpMNv4ZPkjKVv7csBA8mcQyNuQqA4yLb3P+jEgkWDOKesJQNeTvfrittXCyfhG C7uvbUe3OCg9m+dIrzNKBu+2WtFu6tKa3aSlpPUF3Bhhe8IhwRmAkyd/Ky0VCGr9 5jQWvy8D1p2pNoFsGKqhkfolqovmeTxgYtGxd/eHtiApo6tNwzbgcQAZw4vquCjk FSPO7s5HyINik0nQQ9b8MCjywmF6HG6UZjcd/qYHTUmcBZkgpegCKZRnYwQklnBD 6BYj6X+w7WxgVsHgYBAtgd8oLRN5CtCmPljvnPTCjvgx6N9FTl8RJV8rwMqZ9C8U 9+w7WosLxQFSyRm7KxHmKaatkOa3Baqg7cPXSwaZnsA3vBpitHWKs9cyDKwA0j3/ sUWZFw+3VSuf7AJkSA848tC8Xs8G6YXvZgzvxzNEvtTJgO3X7sXB2lavZDyI0S26 nwgXgs/Dt6QcOoQKGv8WgRSOMrFxtq/gX+f3gwPCHvM3panttPevXwKKQW2UtVOn u/BF3Oe9bGhf+J0o58Zp3gjtfDIz+c3yPkxeQqAc3pC/o1Lw7AMV2WxlxULoBLsv aErKwT0UemrQYRZnBmlGPaV4H1KyXzwC/fA1N8YAObJ/Ohe6x7oCKioWWMA4ggiD A4mOIY95o24rm++lNUkD =Dnn4 -----END PGP SIGNATURE----- Merge tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead to extraneous warnings when the result arrived. - Tracepoint improvements, e.g. adding the lock resource name. - Delay the completion of lockspace creation until one full recovery cycle has completed. This allows more error cases to be returned to the caller. - Remove warnings from the locking layer about delayed network replies. The recently added midcomms warnings are much more useful. - Begin the process of deprecating two unused lock-timeout-related features. These features now require enabling via a Kconfig option, and enabling them triggers deprecation warnings. We expect to remove the code in v6.2. * tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: move kref_put assert for lkb structs fs: dlm: don't use deprecated timeout features by default fs: dlm: add deprecation Kconfig and warnings for timeouts fs: dlm: remove timeout from dlm_user_adopt_orphan fs: dlm: remove waiter warnings fs: dlm: fix grammar in lowcomms output fs: dlm: add comment about lkb IFL flags fs: dlm: handle recovery result outside of ls_recover fs: dlm: make new_lockspace() wait until recovery completes fs: dlm: call dlm_lsop_recover_prep once fs: dlm: update comments about recovery and membership handling fs: dlm: add resource name to tracepoints fs: dlm: remove additional dereference of lksb fs: dlm: change ast and bast trace order fs: dlm: change posix lock sigint handling fs: dlm: use dlm_plock_info for do_unlock_close fs: dlm: change plock interrupted message to debug again fs: dlm: add pid to debug log fs: dlm: plock use list_first_entry	2022-08-01 08:46:53 -07:00
Alexander Aring	9585898922	fs: dlm: move kref_put assert for lkb structs The unhold_lkb() function decrements the lock's kref, and asserts that the ref count was not the final one. Use the kref_put release function (which should not be called) to call the assert, rather than doing the assert based on the kref_put return value. Using kill_lkb() as the release function doesn't make sense if we only want to assert. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:46 -05:00
Alexander Aring	6b0afc0cc3	fs: dlm: don't use deprecated timeout features by default This patch will disable use of deprecated timeout features if CONFIG_DLM_DEPRECATED_API is not set. The deprecated features will be removed in upcoming kernel release v6.2. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:38 -05:00
Alexander Aring	81eeb82fc2	fs: dlm: add deprecation Kconfig and warnings for timeouts This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option that must be enabled to use two timeout-related features that we intend to remove in kernel v6.2. Warnings are printed if either is enabled and used. Neither has ever been used as far as we know. . The DLM_LSFL_TIMEWARN lockspace creation flag will be removed, along with the associated configfs entry for setting the timeout. Setting the flag and configfs file would cause dlm to track how long locks were waiting for reply messages. After a timeout, a kernel message would be logged, and a netlink message would be sent to userspace. Recently, midcomms messages have been added that produce much better logging about actual problems with messages. No use has ever been found for the netlink messages. . The userspace libdlm API has allowed the DLM_LKF_TIMEOUT flag with a timeout value to be set in lock requests. The lock request would be cancelled after the timeout. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:32 -05:00
Steve French	97b82c07c4	cifs: trivial style fixup missing blank line after declaration Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:37:38 -05:00
Yang Yingliang	aea02fc40a	cifs: fix wrong unlock before return from cifs_tree_connect() It should unlock 'tcon->tc_lock' before return from cifs_tree_connect(). Fixes: fe67bd563ec2 ("cifs: avoid use of global locks for high contention data") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:45 -05:00
Shyam Prasad N	d7d7a66aac	cifs: avoid use of global locks for high contention data During analysis of multichannel perf, it was seen that the global locks cifs_tcp_ses_lock and GlobalMid_Lock, which were shared between various data structures were causing a lot of contention points. With this change, we're breaking down the use of these locks by introducing new locks at more granular levels. i.e. server->srv_lock, ses->ses_lock and tcon->tc_lock to protect the unprotected fields of server, session and tcon structs; and server->mid_lock to protect mid related lists and entries at server level. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:45 -05:00
Steve French	1bfa25ee30	cifs: remove remaining build warnings Removed remaining warnings related to externs. These warnings although harmless could be distracting e.g. fs/cifs/cifsfs.c: note: in included file: fs/cifs/cifsglob.h:1968:24: warning: symbol 'sesInfoAllocCount' was not declared. Should it be static? Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Enzo Matsumiya	9543c8ab30	cifs: list_for_each() -> list_for_each_entry() Replace list_for_each() by list_for_each_entr() where appropriate. Remove no longer used list_head stack variables. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Enzo Matsumiya	da3847894f	smb2: small refactor in smb2_check_message() If the command is SMB2_IOCTL, OutputLength and OutputContext are optional and can be zero, so return early and skip calculated length check. Move the mismatched length message to the end of the check, to avoid unnecessary logs when the check was not a real miscalculation. Also change the pr_warn_once() to a pr_warn() so we're sure to get a log for the real mismatches. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Matthew Wilcox (Oracle)	c6f62f81b4	cifs: Fix memory leak when using fscache If we hit the 'index == next_cached' case, we leak a refcount on the struct page. Fix this by using readahead_folio() which takes care of the refcount for you. Fixes: `0174ee9947` ("cifs: Implement cache I/O by accessing the cache directly") Cc: David Howells <dhowells@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	89e42f49ef	cifs: remove minor build warning The build warning: warning: symbol 'cifs_tcp_ses_lock' was not declared. Should it be static? can be distracting. Fix two of these. Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	c2c17ddbf3	cifs: remove some camelCase and also some static build warnings Remove warnings for five global variables. For example: fs/cifs/cifsglob.h:1984:24: warning: symbol 'midCount' was not declared. Should it be static? Also change them from camelCase (e.g. "midCount" to "mid_count") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Yu Zhe	0827f71b88	cifs: remove unnecessary (void) conversions. One more. remove unnecessary void type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Yu Zhe	0f46608ae7	cifs: remove unnecessary type castings remove unnecessary void* type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Colin Ian King	4da2cd0517	cifs: remove redundant initialization to variable mnt_sign_enabled Variable mnt_sign_enabled is being initialized with a value that is never read, it is being reassigned later on with a different value. The initialization is redundant and can be removed. Cleans up clang scan-build warning: fs/cifs/cifssmb.c:465:7: warning: Value stored to 'mnt_sign_enabled during its initialization is never read Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Steve French	5fa2cffba0	smb3: check xattr value length earlier Coverity complains about assigning a pointer based on value length before checking that value length goes beyond the end of the SMB. Although this is even more unlikely as value length is a single byte, and the pointer is not dereferenced until laterm, it is clearer to check the lengths first. Addresses-Coverity: 1467704 ("Speculative execution data leak") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-08-01 01:34:44 -05:00
Hyunchul Lee	824d4f64c2	ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT if Status is not 0 and PathLength is long, smb_strndup_from_utf16 could make out of bound read in smb2_tree_connnect. This bug can lead an oops looking something like: [ 1553.882047] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882064] Read of size 2 at addr ffff88802c4eda04 by task kworker/0:2/42805 ... [ 1553.882095] Call Trace: [ 1553.882098] <TASK> [ 1553.882101] dump_stack_lvl+0x49/0x5f [ 1553.882107] print_report.cold+0x5e/0x5cf [ 1553.882112] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882122] kasan_report+0xaa/0x120 [ 1553.882128] ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882139] __asan_report_load_n_noabort+0xf/0x20 [ 1553.882143] smb_strndup_from_utf16+0x469/0x4c0 [ksmbd] [ 1553.882155] ? smb_strtoUTF16+0x3b0/0x3b0 [ksmbd] [ 1553.882166] ? __kmalloc_node+0x185/0x430 [ 1553.882171] smb2_tree_connect+0x140/0xab0 [ksmbd] [ 1553.882185] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1553.882197] process_one_work+0x778/0x11c0 [ 1553.882201] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1553.882206] worker_thread+0x544/0x1180 [ 1553.882209] ? __cpuidle_text_end+0x4/0x4 [ 1553.882214] kthread+0x282/0x320 [ 1553.882218] ? process_one_work+0x11c0/0x11c0 [ 1553.882221] ? kthread_complete_and_exit+0x30/0x30 [ 1553.882225] ret_from_fork+0x1f/0x30 [ 1553.882231] </TASK> There is no need to check error request validation in server. This check allow invalid requests not to validate message. Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17818 Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Hyunchul Lee	ac60778b87	ksmbd: prevent out of bound read for SMB2_WRITE OOB read memory can be written to a file, if DataOffset is 0 and Length is too large in SMB2_WRITE request of compound request. To prevent this, when checking the length of the data area of SMB2_WRITE in smb2_get_data_area_len(), let the minimum of DataOffset be the size of SMB2 header + the size of SMB2_WRITE header. This bug can lead an oops looking something like: [ 798.008715] BUG: KASAN: slab-out-of-bounds in copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008724] Read of size 252 at addr ffff88800f863e90 by task kworker/0:2/2859 ... [ 798.008754] Call Trace: [ 798.008756] <TASK> [ 798.008759] dump_stack_lvl+0x49/0x5f [ 798.008764] print_report.cold+0x5e/0x5cf [ 798.008768] ? __filemap_get_folio+0x285/0x6d0 [ 798.008774] ? copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008777] kasan_report+0xaa/0x120 [ 798.008781] ? copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008784] kasan_check_range+0x100/0x1e0 [ 798.008788] memcpy+0x24/0x60 [ 798.008792] copy_page_from_iter_atomic+0xd3d/0x14b0 [ 798.008795] ? pagecache_get_page+0x53/0x160 [ 798.008799] ? iov_iter_get_pages_alloc+0x1590/0x1590 [ 798.008803] ? ext4_write_begin+0xfc0/0xfc0 [ 798.008807] ? current_time+0x72/0x210 [ 798.008811] generic_perform_write+0x2c8/0x530 [ 798.008816] ? filemap_fdatawrite_wbc+0x180/0x180 [ 798.008820] ? down_write+0xb4/0x120 [ 798.008824] ? down_write_killable+0x130/0x130 [ 798.008829] ext4_buffered_write_iter+0x137/0x2c0 [ 798.008833] ext4_file_write_iter+0x40b/0x1490 [ 798.008837] ? __fsnotify_parent+0x275/0xb20 [ 798.008842] ? __fsnotify_update_child_dentry_flags+0x2c0/0x2c0 [ 798.008846] ? ext4_buffered_write_iter+0x2c0/0x2c0 [ 798.008851] __kernel_write+0x3a1/0xa70 [ 798.008855] ? __x64_sys_preadv2+0x160/0x160 [ 798.008860] ? security_file_permission+0x4a/0xa0 [ 798.008865] kernel_write+0xbb/0x360 [ 798.008869] ksmbd_vfs_write+0x27e/0xb90 [ksmbd] [ 798.008881] ? ksmbd_vfs_read+0x830/0x830 [ksmbd] [ 798.008892] ? _raw_read_unlock+0x2a/0x50 [ 798.008896] smb2_write+0xb45/0x14e0 [ksmbd] [ 798.008909] ? __kasan_check_write+0x14/0x20 [ 798.008912] ? _raw_spin_lock_bh+0xd0/0xe0 [ 798.008916] ? smb2_read+0x15e0/0x15e0 [ksmbd] [ 798.008927] ? memcpy+0x4e/0x60 [ 798.008931] ? _raw_spin_unlock+0x19/0x30 [ 798.008934] ? ksmbd_smb2_check_message+0x16af/0x2350 [ksmbd] [ 798.008946] ? _raw_spin_lock_bh+0xe0/0xe0 [ 798.008950] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 798.008962] process_one_work+0x778/0x11c0 [ 798.008966] ? _raw_spin_lock_irq+0x8e/0xe0 [ 798.008970] worker_thread+0x544/0x1180 [ 798.008973] ? __cpuidle_text_end+0x4/0x4 [ 798.008977] kthread+0x282/0x320 [ 798.008982] ? process_one_work+0x11c0/0x11c0 [ 798.008985] ? kthread_complete_and_exit+0x30/0x30 [ 798.008989] ret_from_fork+0x1f/0x30 [ 798.008995] </TASK> Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17817 Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	cf6531d981	ksmbd: fix use-after-free bug in smb2_tree_disconect smb2_tree_disconnect() freed the struct ksmbd_tree_connect, but it left the dangling pointer. It can be accessed again under compound requests. This bug can lead an oops looking something link: [ 1685.468014 ] BUG: KASAN: use-after-free in ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468068 ] Read of size 4 at addr ffff888102172180 by task kworker/1:2/4807 ... [ 1685.468130 ] Call Trace: [ 1685.468132 ] <TASK> [ 1685.468135 ] dump_stack_lvl+0x49/0x5f [ 1685.468141 ] print_report.cold+0x5e/0x5cf [ 1685.468145 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468157 ] kasan_report+0xaa/0x120 [ 1685.468194 ] ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468206 ] __asan_report_load4_noabort+0x14/0x20 [ 1685.468210 ] ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd] [ 1685.468222 ] smb2_tree_disconnect+0x175/0x250 [ksmbd] [ 1685.468235 ] handle_ksmbd_work+0x30e/0x1020 [ksmbd] [ 1685.468247 ] process_one_work+0x778/0x11c0 [ 1685.468251 ] ? _raw_spin_lock_irq+0x8e/0xe0 [ 1685.468289 ] worker_thread+0x544/0x1180 [ 1685.468293 ] ? __cpuidle_text_end+0x4/0x4 [ 1685.468297 ] kthread+0x282/0x320 [ 1685.468301 ] ? process_one_work+0x11c0/0x11c0 [ 1685.468305 ] ? kthread_complete_and_exit+0x30/0x30 [ 1685.468309 ] ret_from_fork+0x1f/0x30 Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17816 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	aa7253c239	ksmbd: fix memory leak in smb2_handle_negotiate The allocated memory didn't free under an error path in smb2_handle_negotiate(). Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17815 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	af7c39d971	ksmbd: fix racy issue while destroying session on multichannel After multi-channel connection with windows, Several channels of session are connected. Among them, if there is a problem in one channel, Windows connects again after disconnecting the channel. In this process, the session is released and a kernel oop can occurs while processing requests to other channels. When the channel is disconnected, if other channels still exist in the session after deleting the channel from the channel list in the session, the session should not be released. Finally, the session will be released after all channels are disconnected. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Namjae Jeon	a14c573870	ksmbd: use wait_event instead of schedule_timeout() ksmbd threads eating masses of cputime when connection is disconnected. If connection is disconnected, ksmbd thread waits for pending requests to be processed using schedule_timeout. schedule_timeout() incorrectly is used, and it is more efficient to use wait_event/wake_up than to check r_count every time with timeout. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-31 23:14:32 -05:00
Takashi Iwai	512b74d17a	exfat: Drop superfluous new line for error messages exfat_err() adds the new line at the end of the message by itself, hence the passed string shouldn't contain a new line. Drop the superfluous newline letters in the error messages in a few places that have been put mistakenly. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	64fca6e621	exfat: Downgrade ENAMETOOLONG error message to debug messages The ENAMETOOLONG error message is printed at each time when user tries to operate with a too long name, and this can flood the kernel logs easily, as every user can trigger this. Let's downgrade this error message level to a debug message for suppressing the superfluous logs. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	6425baabda	exfat: Expand exfat_err() and co directly to pr_() macro Currently the error and info messages handled by exfat_err() and co are tossed to exfat_msg() function that does nothing but passes the strings with printk() invocation. Not only that this is more overhead by the indirect calls, but also this makes harder to extend for the debug print usage; because of the direct printk() call, you cannot make it for dynamic debug or without debug like the standard helpers such as pr_debug() or dev_dbg(). For addressing the problem, this patch replaces exfat_() macro to expand to pr_*() directly. Along with it, add the new exfat_debug() macro that is expanded to pr_debug() (which output can be gracefully suppressed via dyndbg). Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	1b1a9195ae	exfat: Define NLS_NAME_* as bit flags explicitly NLS_NAME_* are bit flags although they are currently defined as enum; it's casually working so far (from 0 to 2), but it's error-prone and may bring a problem when we want to add more flag. This patch changes the definitions of NLS_NAME_* explicitly being bit flags. Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Takashi Iwai	86da53e8ff	exfat: Return ENAMETOOLONG consistently for oversized paths LTP has a test for oversized file path renames and it expects the return value to be ENAMETOOLONG. However, exfat returns EINVAL unexpectedly in some cases, hence LTP test fails. The further investigation indicated that the problem happens only when iocharset isn't set to utf8. The difference comes from that, in the case of utf8, exfat_utf8_to_utf16() returns the error -ENAMETOOLONG directly and it's treated as the final error code. Meanwhile, on other iocharsets, exfat_nls_to_ucs2() returns the max path size but it sets NLS_NAME_OVERLEN to lossy flag instead; the caller side checks only whether lossy flag is set or not, resulting in always -EINVAL unconditionally. This patch aligns the return code for both cases by checking the lossy flag bit and returning ENAMETOOLONG when NLS_NAME_OVERLEN bit is set. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	be17b1ccd4	exfat: remove duplicate write inode for extending dir/file Since the timestamps need to be updated, the directory entries will be updated by mark_inode_dirty() whether or not a new cluster is allocated for the file or directory, so there is no need to use __exfat_write_inode() to update the directory entries when allocating a new cluster for a file or directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	4493895b2b	exfat: remove duplicate write inode for truncating file This commit moves updating file attributes and timestamps before calling __exfat_write_inode(), so that all updates of the inode had been written by __exfat_write_inode(), mark_inode_dirty() is unneeded. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	23e6e1c9b3	exfat: reuse __exfat_write_inode() to update directory entry __exfat_write_inode() is used to update file and stream directory entries, except for file->start_clu and stream->flags. This commit moves update file->start_clu and stream->flags to __exfat_write_inode() and reuse __exfat_write_inode() to update directory entries. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:05 +09:00
Xie Shaowen	5e9466a5d0	xfs: delete extra space and tab in blank line delete extra space and tab in blank line, there is no functional change. Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-07-31 09:21:27 -07:00
ChenXiaoSong	001c179c4e	xfs: fix NULL pointer dereference in xfs_getbmap() Reproducer: 1. fallocate -l 100M image 2. mkfs.xfs -f image 3. mount image /mnt 4. setxattr("/mnt", "trusted.overlay.upper", NULL, 0, XATTR_CREATE) 5. char arg[32] = "\x01\xff\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00" "\x00\x00\x00\x00\x00\x08\x00\x00\x00\xc6\x2a\xf7"; fd = open("/mnt", O_RDONLY\|O_DIRECTORY); ioctl(fd, _IOC(_IOC_READ\|_IOC_WRITE, 0x58, 0x2c, 0x20), arg); NULL pointer dereference will occur when race happens between xfs_getbmap() and xfs_bmap_set_attrforkoff(): ioctl \| setxattr ----------------------------\|--------------------------- xfs_getbmap \| xfs_ifork_ptr \| xfs_inode_has_attr_fork \| ip->i_forkoff == 0 \| return NULL \| ifp == NULL \| \| xfs_bmap_set_attrforkoff \| ip->i_forkoff > 0 xfs_inode_has_attr_fork \| ip->i_forkoff > 0 \| ifp == NULL \| ifp->if_format \| Fix this by locking i_lock before xfs_ifork_ptr(). Fixes: `abbf9e8a45` ("xfs: rewrite getbmap using the xfs_iext_* helpers") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: added fixes tag] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-07-31 09:21:27 -07:00
Hongnan Li	ecce9212d0	erofs: update ctx->pos for every emitted dirent erofs_readdir update ctx->pos after filling a batch of dentries and it may cause dir/files duplication for NFS readdirplus which depends on ctx->pos to fill dir correctly. So update ctx->pos for every emitted dirent in erofs_fill_dentries to fix it. Also fix the update of ctx->pos when the initial file position has exceeded nameoff. Fixes: `3e917cc305` ("erofs: make filesystem exportable") Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-07-31 22:26:29 +08:00
Chao Yu	09beadf289	f2fs: fix to do sanity check on segment type in build_sit_entries() As Wenqing Liu <wenqingliu0120@gmail.com> reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216285 RIP: 0010:memcpy_erms+0x6/0x10 f2fs_update_meta_page+0x84/0x570 [f2fs] change_curseg.constprop.0+0x159/0xbd0 [f2fs] f2fs_do_replace_block+0x5c7/0x18a0 [f2fs] f2fs_replace_block+0xeb/0x180 [f2fs] recover_data+0x1abd/0x6f50 [f2fs] f2fs_recover_fsync_data+0x12ce/0x3250 [f2fs] f2fs_fill_super+0x4459/0x6190 [f2fs] mount_bdev+0x2cf/0x3b0 legacy_get_tree+0xed/0x1d0 vfs_get_tree+0x81/0x2b0 path_mount+0x47e/0x19d0 do_mount+0xce/0xf0 __x64_sys_mount+0x12c/0x1a0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is segment type is invalid, so in f2fs_do_replace_block(), f2fs accesses f2fs_sm_info::curseg_array with out-of-range segment type, result in accessing invalid curseg->sum_blk during memcpy in f2fs_update_meta_page(). Fix this by adding sanity check on segment type in build_sit_entries(). Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:19:00 -07:00
Chao Yu	7b01ad7f33	f2fs: obsolete unused MAX_DISCARD_BLOCKS After commit `a7eeb82385` ("f2fs: use bitmap in discard_entry"), MAX_DISCARD_BLOCKS became obsolete, remove it. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:18:09 -07:00
Chao Yu	141170b759	f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() As Dipanjan Das <mail.dipanjan.das@gmail.com> reported, syzkaller found a f2fs bug as below: RIP: 0010:f2fs_new_node_page+0x19ac/0x1fc0 fs/f2fs/node.c:1295 Call Trace: write_all_xattrs fs/f2fs/xattr.c:487 [inline] __f2fs_setxattr+0xe76/0x2e10 fs/f2fs/xattr.c:743 f2fs_setxattr+0x233/0xab0 fs/f2fs/xattr.c:790 f2fs_xattr_generic_set+0x133/0x170 fs/f2fs/xattr.c:86 __vfs_setxattr+0x115/0x180 fs/xattr.c:182 __vfs_setxattr_noperm+0x125/0x5f0 fs/xattr.c:216 __vfs_setxattr_locked+0x1cf/0x260 fs/xattr.c:277 vfs_setxattr+0x13f/0x330 fs/xattr.c:303 setxattr+0x146/0x160 fs/xattr.c:611 path_setxattr+0x1a7/0x1d0 fs/xattr.c:630 __do_sys_lsetxattr fs/xattr.c:653 [inline] __se_sys_lsetxattr fs/xattr.c:649 [inline] __x64_sys_lsetxattr+0xbd/0x150 fs/xattr.c:649 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 NAT entry and nat bitmap can be inconsistent, e.g. one nid is free in nat bitmap, and blkaddr in its NAT entry is not NULL_ADDR, it may trigger BUG_ON() in f2fs_new_node_page(), fix it. Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:57 -07:00
Chao Liu	8ee236dcaa	f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time If the inode has the compress flag, it will fail to use 'chattr -c +m' to remove its compress flag and tag no compress flag. However, the same command will be successful when executed again, as shown below: $ touch foo.txt $ chattr +c foo.txt $ chattr -c +m foo.txt chattr: Invalid argument while setting flags on foo.txt $ chattr -c +m foo.txt $ f2fs_io getflags foo.txt get a flag on foo.txt ret=0, flags=nocompression,inline_data Fix this by removing some checks in f2fs_setflags_common() that do not affect the original logic. I go through all the possible scenarios, and the results are as follows. Bold is the only thing that has changed. +---------------+-----------+-----------+----------+ \| \| file flags \| + command +-----------+-----------+----------+ \| \| no flag \| compr \| nocompr \| +---------------+-----------+-----------+----------+ \| chattr +c \| compr \| compr \| -EINVAL \| \| chattr -c \| no flag \| no flag \| nocompr \| \| chattr +m \| nocompr \| -EINVAL \| nocompr \| \| chattr -m \| no flag \| compr \| no flag \| \| chattr +c +m \| -EINVAL \| -EINVAL \| -EINVAL \| \| chattr +c -m \| compr \| compr \| compr \| \| chattr -c +m \| nocompr \| nocompr \| nocompr \| \| chattr -c -m \| no flag \| no flag \| no flag \| +---------------+-----------+-----------+----------+ Link: https://lore.kernel.org/linux-f2fs-devel/20220621064833.1079383-1-chaoliu719@gmail.com/ Fixes: `4c8ff7095b` ("f2fs: support data compression") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chao Liu <liuchao@coolpad.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00
Daeho Jeong	f8e2f32bcd	f2fs: introduce sysfs atomic write statistics introduce the below 4 new sysfs node for atomic write statistics. - current_atomic_write: the total current atomic write block count, which is not committed yet. - peak_atomic_write: the peak value of total current atomic write block count after boot. - committed_atomic_block: the accumulated total committed atomic write block count after boot. - revoked_atomic_block: the accumulated total revoked atomic write block count after boot. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00
qixiaoyu1	1adaa71ea9	f2fs: don't bother wait_ms by foreground gc f2fs_gc returns -EINVAL via f2fs_balance_fs when there is enough free secs after write checkpoint, but with gc_merge enabled, it will cause the sleep time of gc thread to be set to no_gc_sleep_time even if there are many dirty segments can be selected. Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-07-30 20:17:07 -07:00

... 4 5 6 7 8 ...

78138 Commits