688 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
5d170fe435 |
f2fs-for-6.1-rc1
Merge tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This round looks fairly small compared to the previous updates and includes mostly minor bug fixes. Nevertheless, as we're still interested in improving stability, Chao added some debugging methods to diagnose subtle runtime inconsistency problems.
Enhancements:
- store all the corruption or failure reasons in superblock
- detect meta inode, summary info, and block address inconsistency
- increase the limit for reserve_root for low-end devices
- add the number of compressed IO in iostat
Bug fixes:
- DIO write fix for zoned devices
- do out-of-place writes for cold files
- fix some stat updates (FS_CP_DATA_IO, dirty page count)
- fix race condition on setting FI_NO_EXTENT flag
- fix data races when freezing super
- fix wrong continue condition check in GC
- do not allow ATGC for LFS mode
In addition, there are some code enhancements and clean-ups as usual"
* tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
  f2fs: change to use atomic_t type form sbi.atomic_files
  f2fs: account swapfile inodes
  f2fs: allow direct read for zoned device
  f2fs: support recording errors into superblock
  f2fs: support recording stop_checkpoint reason into super_block
  f2fs: remove the unnecessary check in f2fs_xattr_fiemap
  f2fs: introduce cp_status sysfs entry
  f2fs: fix to detect corrupted meta ino
  f2fs: fix to account FS_CP_DATA_IO correctly
  f2fs: code clean and fix a type error
  f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file
  f2fs: fix to do sanity check on summary info
  f2fs: port to vfs{g,u}id_t and associated helpers
  f2fs: fix to do sanity check on destination blkaddr during recovery
  f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT
  f2fs: fix race condition on setting FI_NO_EXTENT flag
  f2fs: remove redundant check in f2fs_sanity_check_cluster
  f2fs: add static init_idisk_time function to reduce the code
  f2fs: fix typo
  f2fs: fix wrong dirty page count when race between mmap and fallocate.
  ... |
||
Chao Yu
|
95fa90c9e5 |
f2fs: support recording errors into superblock
This patch adds support for recording the detailed reason for an FSCORRUPTED error in f2fs_super_block.s_errors[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Chao Yu
|
a9cfee0ef9 |
f2fs: support recording stop_checkpoint reason into super_block
This patch adds support for recording the stop_checkpoint reason in f2fs_super_block.s_stop_reason[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Christoph Hellwig
|
0e91fc1e0f |
fscrypt: work on block_devices instead of request_queues
request_queues are a block layer implementation detail that should not leak into file systems. Change the fscrypt inline crypto code to retrieve block devices instead of request_queues from the file system. As part of that, clean up the interaction with multi-device file systems by returning both the number of devices and the actual device array in a single method call. Signed-off-by: Christoph Hellwig <hch@lst.de> [ebiggers: bug fixes and minor tweaks] Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-4-ebiggers@kernel.org |
||
Jaegeuk Kim
|
da35fe96d1 |
f2fs: increase the limit for reserve_root
This patch increases the threshold that limits the reserved root space from 0.2% to 12.5% by using a simple shift operation. Typically Android sets 128MB, but if the storage capacity is 32GB, 0.2%, which is around 64MB, becomes too small. Let's relax it. Cc: stable@vger.kernel.org Reported-by: Aran Dalton <arda@allwinnertech.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
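The arithmetic in the reserve_root change above can be checked with a small userspace sketch. It is not the kernel code: the function names are hypothetical, the 4 KiB block size is an assumption, and the old 0.2% cap is modeled here as `(blocks * 2) / 1000`. The only detail taken from the commit message is that the new 12.5% cap is a simple shift (one eighth, so `>> 3`).

```c
#include <stdint.h>

/* Hypothetical helper: old cap of 0.2% of the user-visible blocks. */
static uint64_t reserve_limit_old(uint64_t user_blocks)
{
	return (user_blocks << 1) / 1000;
}

/* Hypothetical helper: new cap of 12.5%, i.e. one eighth, via a shift. */
static uint64_t reserve_limit_new(uint64_t user_blocks)
{
	return user_blocks >> 3;
}

/* Clamp a requested reservation (in blocks) to the given cap. */
static uint64_t clamp_reserve(uint64_t requested, uint64_t limit)
{
	return requested > limit ? limit : requested;
}
```

For a 32 GiB device with 4 KiB blocks (8388608 blocks, ignoring filesystem overhead), the old cap is 16777 blocks, roughly 65 MiB, so a 128 MiB request (32768 blocks) gets cut down. The new cap is 1048576 blocks (4 GiB), so the same request passes through unchanged.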
Jaegeuk Kim
|
4f99484d27 |
f2fs: complete checkpoints during remount
Otherwise, pending checkpoints can contribute a race condition to give a quota warning.
- Thread - checkpoint thread
add checkpoints to the list
do_remount()
down_write(&sb->s_umount);
f2fs_remount()
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));
Or,
do_remount()
down_write(&sb->s_umount);
f2fs_remount()
create a ckpt thread
f2fs_enable_checkpoint() adds checkpoints
wait for f2fs_sync_fs()
trigger another pending checkpoint
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));
Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
c7b5857637 |
f2fs: flush pending checkpoints when freezing super
This avoids -EINVAL when trying to freeze f2fs. Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Eric Biggers
|
b87846bd61 |
f2fs: use memcpy_{to,from}_page() where possible
This is simpler, and as a side effect it replaces several uses of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
80dc113aaa |
f2fs: LFS mode does not support ATGC
ATGC uses SSR, which violates the LFS mode used by zoned devices. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Linus Torvalds
|
1daf117f1d |
f2fs-for-6.0
Merge tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we mainly fixed some corner cases that manipulate a per-file compression flag inappropriately. We also found that f2fs counted valid blocks in a section incorrectly when zone capacity is set, and fixed it along with an additional sysfs entry to check it easily. Lastly, this series includes several patches for the new atomic write support, such as a couple of bug fixes and re-adding the atomic_write_abort support that we removed by mistake in the previous release.
Enhancements:
- add sysfs entries to understand atomic write operations and zone capacity
- introduce memory mode to get a hint for low-memory devices
- adjust the waiting time of foreground GC
- decompress clusters under softirq to avoid non-deterministic latency
- do not skip updating inode when retrying to flush node page
- enforce single zone capacity
Bug fixes:
- set the compression/no-compression flags correctly
- revive F2FS_IOC_ABORT_VOLATILE_WRITE
- check inline_data during compressed inode conversion
- understand zone capacity when calculating valid block count
As usual, the series includes several minor clean-ups and sanity checks"
* tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
  f2fs: use onstack pages instead of pvec
  f2fs: intorduce f2fs_all_cluster_page_ready
  f2fs: clean up f2fs_abort_atomic_write()
  f2fs: handle decompress only post processing in softirq
  f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED
  f2fs: do not set compression bit if kernel doesn't support
  f2fs: remove device type check for direct IO
  f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data
  f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE
  f2fs: fix to do sanity check on segment type in build_sit_entries()
  f2fs: obsolete unused MAX_DISCARD_BLOCKS
  f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()
  f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time
  f2fs: introduce sysfs atomic write statistics
  f2fs: don't bother wait_ms by foreground gc
  f2fs: invalidate meta pages only for post_read required inode
  f2fs: allow compression of files without blocks
  f2fs: fix to check inline_data during compressed inode conversion
  f2fs: Delete f2fs_copy_page() and replace with memcpy_page()
  f2fs: fix to invalidate META_MAPPING before DIO write
  ... |
||
Chao Yu
|
e53f864347 |
f2fs: clean up f2fs_abort_atomic_write()
f2fs_abort_atomic_write() already checks whether the current inode is an atomic_write one, so the same check in its caller is redundant. Remove it as a cleanup. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Daeho Jeong
|
f8e2f32bcd |
f2fs: introduce sysfs atomic write statistics
Introduce the following four sysfs nodes for atomic write statistics:
- current_atomic_write: the total current atomic write block count, which is not committed yet.
- peak_atomic_write: the peak value of total current atomic write block count after boot.
- committed_atomic_block: the accumulated total committed atomic write block count after boot.
- revoked_atomic_block: the accumulated total revoked atomic write block count after boot.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
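The four statistics nodes named above could be read from userspace along these lines. This is a minimal sketch, not part of the patch: the helper name `read_f2fs_counter` is hypothetical, and it assumes each node holds a single decimal integer and lives under a per-device directory (typically `/sys/fs/f2fs/<disk>/`, passed in here as `base` and `disk`).

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned counter from <base>/<disk>/<node>.
 * Returns 0 on success and stores the value in *out; returns -1 if the
 * path is too long, the file cannot be opened, or it holds no number.
 */
static int read_f2fs_counter(const char *base, const char *disk,
			     const char *node, unsigned long long *out)
{
	char path[256];
	FILE *fp;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s/%s", base, disk, node);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;

	ok = (fscanf(fp, "%llu", out) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}
```

A caller would pass something like `read_f2fs_counter("/sys/fs/f2fs", "sda1", "peak_atomic_write", &v)`, where `sda1` is only a placeholder device name.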
Jaegeuk Kim
|
b771aadc6e |
f2fs: enforce single zone capacity
In order to simplify the complicated per-zone capacity handling, let's support only one capacity for the entire zoned device. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Daeho Jeong
|
7a8fc58618 |
f2fs: introduce memory mode
Introduce a memory mode that supports "normal" and "low" memory modes. "low" mode is for low-memory devices. Because of the nature of such devices, in this mode f2fs will sometimes try to save memory by sacrificing performance. "normal" mode is the default and behaves the same as before. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Roman Gushchin
|
e33c267ab7 |
mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name.
In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided.
The expected format is:
    <subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
    $ cd /sys/kernel/debug/shrinker/
    $ ls
    dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
    mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
    mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
    rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
    sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
    sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
    sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
    sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
    sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
    sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
    sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
    sb-dax-11           sb-proc-45       sb-tmpfs-35
    sb-debugfs-7        sb-proc-46       sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
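The naming scheme described above, `<subsystem>-<shrinker_type>[:<instance>]-<id>`, can be illustrated with a plain userspace formatter. This is only a sketch of the string layout: `format_shrinker_name` is a hypothetical helper, not the kernel's register_shrinker() or shrinker_debugfs_rename(), and the numeric id is simply passed in rather than allocated.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<subsystem>-<type>[:<instance>]-<id>".
 * The ":<instance>" part is omitted when instance is NULL or empty.
 * Returns what snprintf() returns (the length the full name needs).
 */
static int format_shrinker_name(char *buf, size_t len,
				const char *subsystem, const char *type,
				const char *instance, int id)
{
	if (instance && instance[0] != '\0')
		return snprintf(buf, len, "%s-%s:%s-%d",
				subsystem, type, instance, id);

	return snprintf(buf, len, "%s-%s-%d", subsystem, type, id);
}
```

With the arguments `("sb", "xfs", "vda1", 36)` this yields `sb-xfs:vda1-36`, and with `("sb", "tmpfs", NULL, 42)` it yields `sb-tmpfs-42`, matching two of the entries in the directory listing from the commit message.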
Eric Biggers
|
c5bca38d2e |
f2fs: use the updated test_dummy_encryption helper functions
Switch f2fs over to the functions that are replacing fscrypt_set_test_dummy_encryption(). Since f2fs hasn't been converted to the new mount API yet, this doesn't really provide a benefit for f2fs. But it allows fscrypt_set_test_dummy_encryption() to be removed. Also take the opportunity to eliminate an #ifdef. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Linus Torvalds
|
1501f707d2 |
f2fs-for-5.19
Merge tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've refactored the existing atomic write support, which was implemented by in-memory operations, to store data on disk temporarily, which lets us accept more atomic writes. At the same time, we removed the existing volatile write support. We've also revisited the file pinning and GC flows and found some corner cases which contributed to abnormal system behaviours. As usual, there are several minor code refactorings for readability, sanity checks, and clean-ups.
Enhancements:
- allow compression for mmap files in compress_mode=user
- kill volatile write support
- change the current atomic write way
- give priority to select unpinned section for foreground GC
- introduce data read/write showing path info
- remove unnecessary f2fs_lock_op in f2fs_new_inode
Bug fixes:
- fix the file pinning flow during checkpoint=disable and GCs
- fix foreground and background GCs to select the right victims and get free sections on time
- fix GC flags on defragmenting pages
- avoid an infinite loop to flush node pages
- fix fallocate to use file_modified to update permissions consistently"
* tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to tag gcing flag on page during file defragment
  f2fs: replace F2FS_I(inode) and sbi by the local variable
  f2fs: add f2fs_init_write_merge_io function
  f2fs: avoid unneeded error handling for revoke_entry_slab allocation
  f2fs: allow compression for mmap files in compress_mode=user
  f2fs: fix typo in comment
  f2fs: make f2fs_read_inline_data() more readable
  f2fs: fix to do sanity check for inline inode
  f2fs: fix fallocate to use file_modified to update permissions consistently
  f2fs: don't use casefolded comparison for "." and ".."
  f2fs: do not stop GC when requiring a free section
  f2fs: keep wait_ms if EAGAIN happens
  f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
  f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
  f2fs: kill volatile write support
  f2fs: change the current atomic write way
  f2fs: don't need inode lock for system hidden quota
  f2fs: stop allocating pinned sections if EAGAIN happens
  f2fs: skip GC if possible when checkpoint disabling
  f2fs: give priority to select unpinned section for foreground GC
  ... |
||
Yufen Yu
|
908ea65416 |
f2fs: add f2fs_init_write_merge_io function
Almost all other variable initialization in f2fs_fill_super is extracted into a single function. Do the same for write_io[], which makes the code cleaner. This patch just refactors the code; there is no functional change. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
c81d5bae40 |
f2fs: do not stop GC when requiring a free section
The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling checkpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty segment as a victim all the time, resulting in checkpoint=disable failure, for example. Let's pick another one, if we fail to collect it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
d147ea4adb |
f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
No functional change. - remove checkpoint=disable check for f2fs_write_checkpoint - get sec_freed all the time Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Eric Biggers
|
64e3ed0b8e |
f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
There is no good reason to allow this mount option when the kernel isn't configured with encryption support. Since this option is only for testing, we can just fix this; we don't really need to worry about breaking anyone who might be counting on this option being ignored. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Daeho Jeong
|
3db1de0e58 |
f2fs: change the current atomic write way
Current atomic write has three major issues:
- it keeps the updates in non-reclaimable memory space, and they are even hard to migrate, which is not good for contiguous memory allocation.
- disk space used for atomic files cannot be garbage collected, which makes it difficult for the filesystem to be defragmented.
- if atomic write operations hit the threshold of either memory usage or garbage collection failure count, all the atomic write operations will fail immediately.
To resolve these issues, I will keep a COW inode internally for all the updates to be flushed from memory when we need to flush them out in a situation like high memory pressure. These COW inodes will be tagged as orphan inodes so that they are reclaimed in case of a sudden power-cut or system failure during atomic writes. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
6213f5d4d2 |
f2fs: don't need inode lock for system hidden quota
Let's avoid false-alarmed lockdep warning.
[ 58.914674] [T1501146] -> #2 (&sb->s_type->i_mutex_key#20){+.+.}-{3:3}:
[ 58.915975] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.916738] [T1501146] system_server: f2fs_quota_sync+0x60/0x1a8
[ 58.917563] [T1501146] system_server: block_operations+0x16c/0x43c
[ 58.918410] [T1501146] system_server: f2fs_write_checkpoint+0x114/0x318
[ 58.919312] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.920214] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.920999] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.921862] [T1501146] system_server: f2fs_sync_file+0x30/0x48
[ 58.922667] [T1501146] system_server: __arm64_sys_fsync+0x84/0xf8
[ 58.923506] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.924604] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.925366] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.926094] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.926920] [T1501146] system_server: el0_sync+0x1b4/0x1c0
[ 58.927681] [T1501146] -> #1 (&sbi->cp_global_sem){+.+.}-{3:3}:
[ 58.928889] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.929650] [T1501146] system_server: f2fs_write_checkpoint+0xbc/0x318
[ 58.930541] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.931443] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.932226] [T1501146] system_server: sync_filesystem+0xac/0x130
[ 58.933053] [T1501146] system_server: generic_shutdown_super+0x38/0x150
[ 58.933958] [T1501146] system_server: kill_block_super+0x24/0x58
[ 58.934791] [T1501146] system_server: kill_f2fs_super+0xcc/0x124
[ 58.935618] [T1501146] system_server: deactivate_locked_super+0x90/0x120
[ 58.936529] [T1501146] system_server: deactivate_super+0x74/0xac
[ 58.937356] [T1501146] system_server: cleanup_mnt+0x128/0x168
[ 58.938150] [T1501146] system_server: __cleanup_mnt+0x18/0x28
[ 58.938944] [T1501146] system_server: task_work_run+0xb8/0x14c
[ 58.939749] [T1501146] system_server: do_notify_resume+0x114/0x1e8
[ 58.940595] [T1501146] system_server: work_pending+0xc/0x5f0
[ 58.941375] [T1501146] -> #0 (&sbi->gc_lock){+.+.}-{3:3}:
[ 58.942519] [T1501146] system_server: __lock_acquire+0x1270/0x2868
[ 58.943366] [T1501146] system_server: lock_acquire+0x114/0x294
[ 58.944169] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.944930] [T1501146] system_server: f2fs_issue_checkpoint+0x13c/0x21c
[ 58.945831] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.946614] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.947472] [T1501146] system_server: f2fs_ioc_commit_atomic_write+0xc8/0x14c
[ 58.948439] [T1501146] system_server: __f2fs_ioctl+0x674/0x154c
[ 58.949253] [T1501146] system_server: f2fs_ioctl+0x54/0x88
[ 58.950018] [T1501146] system_server: __arm64_sys_ioctl+0xa8/0x110
[ 58.950865] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.951965] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.952727] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.953454] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.954279] [T1501146] system_server: el0_sync+0x1b4/0x1c0
Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Weichao Guo
|
2880f47b94 |
f2fs: skip GC if possible when checkpoint disabling
If the number of unusable blocks is not larger than unusable capacity, we can skip GC when checkpoint disabling. Signed-off-by: Weichao Guo <guoweichao@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: Fix missing gc_mode assignment] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Matthew Wilcox (Oracle)
|
9d6b0cd757 |
fs: Remove flags parameter from aops->write_begin
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
Luis Chamberlain
|
7f262f7375 |
f2fs: ensure only power of 2 zone sizes are allowed
F2FS zoned support assumes a power-of-2 zone size in many places, such as __f2fs_issue_discard_zone and init_blkz_info. As the power-of-2 requirement has been removed from the block layer, explicitly add a condition in f2fs to allow only devices with a power-of-2 zone size. This condition will be relaxed once the calculations based on power of 2 are made generic. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
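The power-of-2 condition described above is usually expressed with the standard bit trick below. This is a userspace sketch with hypothetical function names, not the check added by the patch. The second helper shows why code may rely on the assumption: with a power-of-2 zone size, mapping a sector to its zone reduces to a shift.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when zone_sectors is a non-zero power of two (exactly one bit set). */
static bool zone_size_is_pow2(uint64_t zone_sectors)
{
	return zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0;
}

/*
 * Map a sector to its zone index using a shift. Only valid when
 * zone_size_is_pow2(zone_sectors) holds; otherwise a division is needed.
 */
static uint64_t zone_index(uint64_t sector, uint64_t zone_sectors)
{
	unsigned int shift = 0;

	/* Compute log2(zone_sectors) by counting how far the set bit sits. */
	while ((zone_sectors >> shift) > 1)
		shift++;

	return sector >> shift;
}
```

For example, a 256 MiB zone is 524288 sectors of 512 bytes and passes the check, while a 96 MiB zone (196608 sectors) does not and would be rejected by a check of this kind.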
Luis Chamberlain
|
d46db4595b |
f2fs: call bdev_zone_sectors() only once on init_blkz_info()
Instead of calling bdev_zone_sectors() multiple times, call it once and cache the value locally. This will make the subsequent change easier to read. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Niels Dossche
|
4de851459e |
f2fs: extend stat_lock to avoid potential race in statfs
There are multiple calculations and reads of fields of sbi that should be protected by stat_lock. As stat_lock is not used to read these values in statfs, this can lead to inconsistent results. Extend the locking to prevent this issue. Commit c9c8ed50d94c ("f2fs: fix to avoid potential race on sbi->unusable_block_count access/update") already added the use of sbi->stat_lock in statfs in order to make the calculation of multiple, different fields atomic so that results are consistent. This is similar to that patch regarding the change in statfs. Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
930e260763 |
f2fs: remove obsolete whint_mode
This patch removes obsolete whint_mode. Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Linus Torvalds
|
3bf03b9a08 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ... |
||
Muchun Song
|
65d3af647b |
f2fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
NeilBrown
|
a64239d0ef |
f2fs: replace congestion_wait() calls with io_schedule_timeout()
As congestion is no longer tracked, congestion_wait() is effectively equivalent to io_schedule_timeout(). So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE and call that instead. Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
ef510682af |
f2fs-for-5.18
In this cycle, f2fs has some performance improvements for Android workloads such as using read-unfair rwsems and adding some sysfs entries to control GCs and discard commands in more details. In addtiion, it has some tunings to improve the recovery speed after sudden power-cut. Enhancement: - add reader-unfair rwsems with F2FS_UNFAIR_RWSEM : will replace with generic API support - adjust to make the readahead/recovery flow more efficiently - sysfs entries to control issue speeds of GCs and Discard commands - enable idmapped mounts Bug fix: - correct wrong error handling routines - fix missing conditions in quota - fix a potential deadlock between writeback and block plug routines - fix a deadlock btween freezefs and evict_inode We've added some boundary checks to avoid kernel panics on corrupted images, and several minor code clean-ups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmI44c4ACgkQQBSofoJI UNIqdBAAgBjV/76Gphbpg2lR5+13pWBV0jp66yYaaPiqmM6IsSPYKTlGMJpEBy41 x6M+MRc+NjtwSEHAWOptOIPbP9zwXYJn/KSDMCAP3+454YhBFDLqDAkAxBt1frYT 0EkwCIYw/LqmVnuIttQ01gnT8v5zH4d/x4+gsdM+b7flmpCP/AoZDvI19Zd66F0y RdOdQQWyhvmmetZbaeaPoxbjS8LJ9b0ZMcxidTv9a+5GylCAXNicBdM9x1iVVmJ1 dT1n2w7USKVdL4ydpwPUiec6RwACRk49CL3FgyyGNRlcpMmU9ArcY2l/Qr+At7ky tgPODXme/EvH12DsfoixjkNSLc4a7RHPfiJ3qy8XC6dshWYMKIegjateG8lVhf0P kdifMRCdOa+/l+RoyD1IjKTXPmVl9ihh6RBYDr6YrFclxg3uI4CvJCXht4dSXOCE 5vLIVZEf5yk+6Ee2ozcNTG2hZ8gd+aNy1WqBN3/5lFxhBYVNlTnUYd0URzenwIdW i2QP99mFrntCL25lhF7f7AeTHxSg/UVXnRA1oQZ+6qIPPLhNdApfd1lov/6+Hhe4 0zDbCbmIfVko/vZJeYOppaj+6jSZ3FafMfH5dDYyis4S4RbX2sjR9wGSd8PEdOTw /4dZXXfB2XslPb3KQsJSyGz75af3PxZ8PHLxj0HBSQXOA140htY= =t75l -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, f2fs has some performance improvements for Android workloads such as using read-unfair rwsems and adding some sysfs entries to control GCs and discard commands in more details. 
In addition, it has some tunings to improve the recovery speed after sudden power-cut. Enhancement: - add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with generic API support - adjust to make the readahead/recovery flow more efficient - sysfs entries to control issue speeds of GCs and Discard commands - enable idmapped mounts Bug fix: - correct wrong error handling routines - fix missing conditions in quota - fix a potential deadlock between writeback and block plug routines - fix a deadlock between freezefs and evict_inode We've added some boundary checks to avoid kernel panics on corrupted images, and several minor code clean-ups" * tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits) f2fs: fix to do sanity check on .cp_pack_total_block_count f2fs: make gc_urgent and gc_segment_mode sysfs node readable f2fs: use aggressive GC policy during f2fs_disable_checkpoint() f2fs: fix compressed file start atomic write may cause data corruption f2fs: initialize sbi->gc_mode explicitly f2fs: introduce gc_urgent_mid mode f2fs: compress: fix to print raw data size in error path of lz4 decompression f2fs: remove redundant parameter judgment f2fs: use spin_lock to avoid hang f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs f2fs: remove unnecessary read for F2FS_FITS_IN_INODE f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes f2fs: fix to do sanity check on curseg->alloc_type f2fs: fix to avoid potential deadlock f2fs: quota: fix loop condition at f2fs_quota_sync() f2fs: Restore rwsem lockdep support f2fs: fix missing free nid in f2fs_handle_failed_inode f2fs: support idmapped mounts f2fs: add a way to limit roll forward recovery time ... |
||
Chao Yu
|
98e92867b9 |
f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint() so that the SSR allocator is used to persist GCed data/node blocks. This improves performance because it avoids migrating data/node blocks located in the target segment selected by the SSR allocator. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Chao Yu
|
c86868bbc2 |
f2fs: initialize sbi->gc_mode explicitly
sbi->gc_mode needs to be initialized to GC_NORMAL explicitly. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
ba900534f8 |
f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
Let's purge inode cache in order to avoid the below deadlock. [freeze test] shrinker freeze_super - percpu_down_write(SB_FREEZE_FS) - super_cache_scan - down_read(&sb->s_umount) - prune_icache_sb - dispose_list - evict - f2fs_evict_inode thaw_super - down_write(&sb->s_umount); - __percpu_down_read(SB_FREEZE_FS) Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Juhyung Park
|
680af5b824 |
f2fs: quota: fix loop condition at f2fs_quota_sync()
cnt should be passed to sb_has_quota_active() instead of type to check active quota properly. Moreover, when the type is -1, the compiler with enough inline knowledge can discard sb_has_quota_active() check altogether, causing a NULL pointer dereference at the following inode_lock(dqopt->files[cnt]): [ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0 [ 2.796024] Mem abort info: [ 2.796025] ESR = 0x96000005 [ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.796029] SET = 0, FnV = 0 [ 2.796031] EA = 0, S1PTW = 0 [ 2.796032] Data abort info: [ 2.796034] ISV = 0, ISS = 0x00000005 [ 2.796035] CM = 0, WnR = 0 [ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000 [ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000 [ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1 [ 2.796057] Hardware name: Qualcomm Technologies, Inc. 
Lahaina MTP lemonadep (DT) [ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO) [ 2.796065] pc : down_write+0x28/0x70 [ 2.796070] lr : f2fs_quota_sync+0x100/0x294 [ 2.796071] sp : ffffffa3f48ffc30 [ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000 [ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80 [ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001 [ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000 [ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110 [ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038 [ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0 [ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0 [ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000 [ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948 [ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0 [ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a [ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d [ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001 [ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0 [ 2.796098] Call trace: [ 2.796100] down_write+0x28/0x70 [ 2.796102] f2fs_quota_sync+0x100/0x294 [ 2.796104] block_operations+0x120/0x204 [ 2.796106] f2fs_write_checkpoint+0x11c/0x520 [ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34 [ 2.796109] issue_checkpoint_thread+0x6c/0xb8 [ 2.796112] kthread+0x138/0x414 [ 2.796114] ret_from_fork+0x10/0x18 [ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02) [ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]--- [ 2.800116] Kernel panic - not syncing: Fatal exception [ 2.800120] SMP: stopping secondary CPUs Fixes: 9de71ede81e6 ("f2fs: quota: fix potential deadlock") Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
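The loop-condition fix described above can be illustrated with a small userspace model. This is a sketch, not the kernel code: `quota_active()`, `buggy_check()`, `sync_types_fixed()` and `sync_types_buggy()` are hypothetical names, `MAXQUOTAS` is assumed to be 3, and the "check dropped by the compiler" behaviour reported in the commit message is modelled as a check that always passes when the type is -1.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXQUOTAS 3 /* assumed: user, group, project */

/* Hypothetical stand-in for sb_has_quota_active(sb, type). */
static bool quota_active(const bool active[MAXQUOTAS], int type)
{
    return type >= 0 && type < MAXQUOTAS && active[type];
}

/* Model of the pre-fix check: it is handed `type`, not the loop counter.
 * For type == -1 the commit message reports the compiler discarding the
 * check, which is modelled here as "always passes". */
static bool buggy_check(const bool active[MAXQUOTAS], int type)
{
    if (type < 0)
        return true;
    return active[type];
}

/* Returns a bitmask of the quota types whose files would be locked and
 * synced. type == -1 means "all types". */
static unsigned sync_types_fixed(const bool active[MAXQUOTAS], int type)
{
    unsigned visited = 0;

    for (int cnt = 0; cnt < MAXQUOTAS; cnt++) {
        if (type != -1 && cnt != type)
            continue;
        if (!quota_active(active, cnt)) /* the fix: test cnt */
            continue;
        visited |= 1u << cnt;
    }
    return visited;
}

static unsigned sync_types_buggy(const bool active[MAXQUOTAS], int type)
{
    unsigned visited = 0;

    for (int cnt = 0; cnt < MAXQUOTAS; cnt++) {
        if (type != -1 && cnt != type)
            continue;
        if (!buggy_check(active, type)) /* the bug: test type */
            continue;
        visited |= 1u << cnt; /* may touch an inactive quota file */
    }
    return visited;
}
```

With only the first quota type active and `type == -1`, the fixed loop visits just that type, while the modelled buggy loop visits all three, which corresponds to locking a quota inode that does not exist.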
||
Chao Yu
|
984fc4e76d |
f2fs: support idmapped mounts
This patch enables idmapped mounts for f2fs. Since all dedicated helpers for this functionality already exist, this patch just passes down the user_namespace argument from the VFS methods to the relevant helpers. Simple idmap example on f2fs image: 1. truncate -s 128M f2fs.img 2. mkfs.f2fs f2fs.img 3. mount f2fs.img /mnt/f2fs/ 4. touch /mnt/f2fs/file 5. ls -ln /mnt/f2fs/ total 0 -rw-r--r-- 1 0 0 0 2月 4 13:17 file 6. ./mount-idmapped --map-mount b:0:1001:1 /mnt/f2fs/ /mnt/scratch_f2fs/ 7. ls -ln /mnt/scratch_f2fs/ total 0 -rw-r--r-- 1 1001 1001 0 2月 4 13:17 file Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
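The mapping `b:0:1001:1` in the example above can be read as "one ID, starting at 0, is shown as 1001". A minimal userspace sketch of that arithmetic follows; `struct id_extent`, `map_id()` and the `ID_UNMAPPED` sentinel are hypothetical names for illustration and are not the kernel's idmapping helpers.

```c
#include <assert.h>
#include <stdint.h>

/* One idmap extent: ids in [from, from + count) map to [to, to + count). */
struct id_extent {
    uint32_t from;
    uint32_t to;
    uint32_t count;
};

/* Hypothetical sentinel for "no mapping"; the kernel handles unmapped
 * ids differently. */
#define ID_UNMAPPED UINT32_MAX

static uint32_t map_id(const struct id_extent *e, uint32_t id)
{
    if (id >= e->from && id - e->from < e->count)
        return e->to + (id - e->from);
    return ID_UNMAPPED;
}
```

For the extent `{ 0, 1001, 1 }`, `map_id()` turns 0 into 1001, which matches the owner change between steps 5 and 7 of the listing; any other ID falls outside the one-element range.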
||
Jaegeuk Kim
|
47c8ebcce8 |
f2fs: add a way to limit roll forward recovery time
This adds a sysfs entry that triggers a checkpoint during fsync() in order to avoid a long roll-forward recovery when booting the device. The default value does not enforce the limit, which matches the previous behavior. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Chao Yu
|
1018a5463a |
f2fs: introduce F2FS_IPU_HONOR_OPU_WRITE ipu policy
The F2FS_IPU_FORCE policy can be enabled in two cases: a) f2fs forces F2FS_IPU_FORCE on a small-sized volume b) the user sets the F2FS_IPU_FORCE policy via sysfs. In either case, file defragmentation may fail because of the IPU policy check, which doesn't make sense. Let's introduce a new IPU policy that allows OPU during file defragmentation. On small-sized volumes, let's enable the F2FS_IPU_HONOR_OPU_WRITE policy by default. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
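The interaction between the two policies can be sketched as a bitmask check in which the "honor OPU" bit is tested before the "force IPU" bit. This is a simplified userspace model under stated assumptions: the bit positions and the `should_write_in_place()` helper are hypothetical, and only the policy names come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum {
    IPU_FORCE_BIT = 0,
    IPU_HONOR_OPU_WRITE_BIT = 1
};

/* opu_requested models a write that explicitly asks for out-of-place
 * update, such as one issued during file defragmentation. */
static bool should_write_in_place(unsigned policy, bool opu_requested)
{
    /* The new policy wins: an explicit OPU request is honored. */
    if ((policy & (1u << IPU_HONOR_OPU_WRITE_BIT)) && opu_requested)
        return false;

    /* Otherwise a forced in-place policy applies. */
    if (policy & (1u << IPU_FORCE_BIT))
        return true;

    return false;
}
```

With only the force bit set, a defragmentation write is still done in place, which is the failure the commit describes; setting both bits lets that write go out of place while ordinary writes stay in place.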
||
Linus Torvalds
|
630c12862c |
Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the
previous CONFIG_UNICODE. It is -rc material since we don't want to expose the former symbol on 5.17. This has been living on linux-next for the past week. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmH4lC0ACgkQQEuZqsMc ppRl1Q/+Lyba+DORs26C4p1GDS5ezHOCdbBUE8RFwWjIl+h5ckQ/8kndaXPRLorZ 1S9E6h5RfqhekGKOhMTXyfzqcW8qMzUy4i3J2lmJpDwATqLt+4Wu/M2BBH2CaIIL EhhW8D+WduAEM/TFYihH9LJ0RopvIsqcy8qdu+oSBGfPAdxJ0f2+Yx0pNTRfqVmi 8+Dry0nRhP12o9wXElpZ0/BYEZTlY+Zo6L/heT6/GKDLpz/YmZp18GAc/0TWb3LL ASujr+anU2LxSFskkyuMu+rbFE8eDshvHEuBZLxlD2o+tG6lAi4mNWZYc0/+jPMw 8TdJ5MEX3IlljXLRKuYctoCdsFQKLxH5IN5wLkiLvM5fBpeb/sWqNolx8f2s/f9R TaUdjwiqFnML4VnlEH3hd3/hUUVbnE+xJo6g1iRGgJY3eecimvwl8P5H7k9Sn3OS 4zh0bHT9pfg+vUR0BVnfdWi4OpPxSrdqCgFhHsmKaGMvTApm0qMKK1Cg4OPNtYwr d1RMqsqEBSJTHzr0nHoiWLhkIo8npRPy+LMK51D8j6wg0kOj4GGYerWm1MD9ZlbI rhPy7nDgdcH48Gk1m6o7dROZKCvkZK+/QDPelBgZHGcGB94lUugYVJQrlBjI+2+7 Wx5oQLgQgeabeMtDZ/YNy5Dsre20vas2oLj5cs6uuoWNOcBO6Ew= =YVNN -----END PGP SIGNATURE----- Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode cleanup from Gabriel Krisman Bertazi: "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the previous CONFIG_UNICODE. It is -rc material since we don't want to expose the former symbol on 5.17. This has been living on linux-next for the past week" * tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: clean up the Kconfig symbol confusion |
||
Tim Murray
|
e4544b63a7 |
f2fs: move f2fs to use reader-unfair rwsems
f2fs rw_semaphores work better if writers can starve readers, especially for the checkpoint thread, because writers are strictly more important than reader threads. This prevents significant priority inversion between low-priority readers that blocked while trying to acquire the read lock and a second acquisition of the write lock that might be blocking high priority work. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Christoph Hellwig
|
5298d4bfe8 |
unicode: clean up the Kconfig symbol confusion
Turn the CONFIG_UNICODE symbol into a tristate that generates some always built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol. Note that a lot of the IS_ENABLED() checks could be turned from cpp statements into normal ifs, but this change is intended to be fairly mechanical, so that should be cleaned up later. Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> |
||
Linus Torvalds
|
1d1df41c5a |
f2fs-for-5.17-rc1
In this round, we've tried to address some performance issues in f2fs_checkpoint and direct IO flows. Also, there was a work to enhance the page cache management used for compression. Other than them, we've done typical work including sysfs, code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner cases. Enhancement: - use iomap for direct IO - try to avoid lock contention to improve f2fs_ckpt speed - avoid unnecessary memory allocation in compression flow - POSIX_FADV_DONTNEED drops the page cache containing compression pages - add some sysfs entries (gc_urgent_high_remaining, pending_discard) Bug fix: - try not to expose unwritten blocks to user by DIO : this was added to avoid merge conflict; another patch is coming to address other missing case. - relax minor error condition for file pinning feature used in Android OTA - fix potential deadlock case in compression flow - should not truncate any block on pinned file In addition, we've done some code clean-ups and tracepoint/sanity check improvement. 
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmHnY0sACgkQQBSofoJI UNIOkg//UmjCSSG63/YZM/lQQQe4kK/tT6QTT8W/VQtzWL9vXcL7bcaxzwX3LQbR Gb47Zmsw9bzVJt6GQ2VRbODE1py/KPNMl5SDXJXHo6fOZ/dOnHve32gLwcLEzhPd casB0TbwQJ6bpEsJiZ5ho741mURxUrSCHAAX6QIQVXh8ofm9qAqlWu74OLI6UHiV MM84XmXcHtGUZG5SCTWfSCJhJM6Az/3A83ws9KVeu86dlE7IrigphU2nI2vdCKiO trR3CiLC/364fiM+9ssLS3X2wKFPD/unEU7ljBv5UaG36jsVfW+tisjTKldzpiKK 44cNgDv1FEDxC0g3FKUhEGezAhxT8AJZB0in0zn8+5scarKGJtFCy9XhCGMVaeP+ usxvHVy8Ga1I7sMV6oHEBcGiPJWkmurzq1XXobtj6oL/JxN4gqUJeHTcod89hQHA lx9kZs7MLKm2au+T3gZf5xyx35YCie8sY/N1qoPy8tU9Q7FJ54NdqqAc9JEZ6mSk k9ybMaa/srHG/EI/XYPw0DrobHg6P5+bYtmsRvw2vP/nsNsD3ZI/EwBBEll2ITxC V5Dn7MljYWI/5kB41Hl5xz6X65WeIN7koRyTXw5mp9tkNrLugqII5hzhwhSlcqJ1 3k9TAN3RbVpWHBcyryDyLbm/+dcbwIJ4v/eJEMIDk8F2SrBGOZs= =LCJH -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tried to address some performance issues in f2fs_checkpoint and direct IO flows. Also, there was a work to enhance the page cache management used for compression. Other than them, we've done typical work including sysfs, code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner cases. 
Enhancements: - use iomap for direct IO - try to avoid lock contention to improve f2fs_ckpt speed - avoid unnecessary memory allocation in compression flow - POSIX_FADV_DONTNEED drops the page cache containing compression pages - add some sysfs entries (gc_urgent_high_remaining, pending_discard) Bug fixes: - try not to expose unwritten blocks to user by DIO (this was added to avoid merge conflict; another patch is coming to address other missing case) - relax minor error condition for file pinning feature used in Android OTA - fix potential deadlock case in compression flow - should not truncate any block on pinned file In addition, we've done some code clean-ups and tracepoint/sanity check improvement" * tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: do not allow partial truncation on pinned file f2fs: remove redunant invalidate compress pages f2fs: Simplify bool conversion f2fs: don't drop compressed page cache in .{invalidate,release}page f2fs: fix to reserve space for IO align feature f2fs: fix to check available space of CP area correctly in update_ckpt_flags() f2fs: support fault injection to f2fs_trylock_op() f2fs: clean up __find_inline_xattr() with __find_xattr() f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() f2fs: do not bother checkpoint by f2fs_get_node_info f2fs: avoid down_write on nat_tree_lock during checkpoint f2fs: compress: fix potential deadlock of compress file f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file f2fs: add gc_urgent_high_remaining sysfs node f2fs: fix to do sanity check in is_alive() f2fs: fix to avoid panic in is_alive() if metadata is inconsistent f2fs: fix to do sanity check on inode type during garbage collection f2fs: avoid duplicate call of mark_inode_dirty f2fs: show number of pending discard commands f2fs: support POSIX_FADV_DONTNEED drop compressed page cache ... |
||
Linus Torvalds
|
6661224e66 |
unicode patches for 5.17
This includes patches from Christoph Hellwig to split the large data tables of the unicode subsystem into a loadable module, which allow users to not have them around if case-insensitive filesystems are not to be used. It also includes minor code fixes to unicode and its users, from the same author. There is a trivial conflict in the function encoding_show in fs/f2fs/sysfs.c reported by linux-next between commit 84eab2a899f2 ("f2fs: replace snprintf in show functions with sysfs_emit") and commit a440943e68cd ("unicode: remove the charset field from struct unicode_map"). from my tree. All the patches here have been on linux-next releases for the past months. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmHeLp0ACgkQQEuZqsMc ppRWdhAAstuibIlhUj1Vae070P92oaxM/Azz3IgyVFWensJyQV1PvbtFQDhyKM4w M3tQ45eK49vVHn+JpLHbiAdZV66rD/sMSsruCVIf/8KNVDisOBQtFar5yxVr0Ion AOMoG6/Xrk8BZlZH62fhtJGtu/EFmeFoGVdC81NdTSroe9G+26we3IULwHSE1lNH XMJFCgU6otuLDOna16U7kL77Tu7GXRJcQe1+2nRJ+u6Agxy2xTo/s4FHuxzRK0/e GsgO1scY6unWM23O6z+qJYazng2Zt3EOZtSGqU4TsvZwjUi2UtAYW1/vAQGc/q3Y hGxPYGgKC1VrXLfIcuyng7j0vFPtADbdHMbsJPoyy+Nz4znDJ81IAKAHMO1in3C8 CHKjW+6InmXNye/uwdRt8Tx49jxUHmWUbQRT5FwMDpzC7MAL+DVdPpVVQgpLVM/H gW3YpBEk5qQvVdh8DWZVW3rT3SnMX/v0+u+76FsMHKYNJMNrCnP6vXpCPQl/Gyut ycgK7qVF3o/bgNBf072H3ZBZajTv7ePvacP4Wth7m9I2ykk+p4IjQLpTC5rJK0By VC1xS4im2VqiIWE9eE5y9cXU1oa/AfOcOF+7FZcxT13IL6hKTtd4+H4yKgdcNsyk 7RjpGgjp+SU51/EilhEqMFgEe07CURxwGwhApizBSiTIOgZS96U= =4q9x -----END PGP SIGNATURE----- Merge tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode updates from Gabriel Krisman Bertazi: "This includes patches from Christoph Hellwig to split the large data tables of the unicode subsystem into a loadable module, which allow users to not have them around if case-insensitive filesystems are not to be used. 
It also includes minor code fixes to unicode and its users, from the same author. All the patches here have been on linux-next releases for the past months" * tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: only export internal symbols for the selftests unicode: Add utf8-data module unicode: cache the normalization tables in struct unicode_map unicode: move utf8cursor to utf8-selftest.c unicode: simplify utf8len unicode: remove the unused utf8{,n}age{min,max} functions unicode: pass a UNICODE_AGE() tripple to utf8_load unicode: mark the version field in struct unicode_map unsigned unicode: remove the charset field from struct unicode_map f2fs: simplify f2fs_sb_read_encoding ext4: simplify ext4_sb_read_encoding |
||
NeilBrown
|
4034247a0d |
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a memory allocation failure by looping around and re-trying. Some of these cannot conveniently use __GFP_NOFAIL, for reasons such as: - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on - a need to check for the process being signalled between failures - the possibility that other recovery actions could be performed - the allocation is quite deep in support code, and passing down an extra flag to say if __GFP_NOFAIL is wanted would be clumsy. Many of these currently use congestion_wait() which (in almost all cases) simply waits the given timeout - congestion isn't tracked for most devices. It isn't clear what the best delay is for loops, but it is clear that the various filesystems shouldn't be responsible for choosing a timeout. This patch introduces memalloc_retry_wait() which takes on that responsibility. Code that wants to retry a memory allocation can call this function passing the GFP flags that were used. It will wait for however long is appropriate. For now, it only considers __GFP_NORETRY and whatever gfpflags_allow_blocking() tests. If blocking is allowed without __GFP_NORETRY, then alloc_page() either made some reclaim progress, or waited for a while, before failing. So there is no need for much further waiting. memalloc_retry_wait() will wait until the current jiffy ends. If this condition is not met, then alloc_page() won't have waited much if at all. In that case memalloc_retry_wait() waits about 200ms. This is the delay that most current loops use. linux/sched/mm.h needs to be included in some files now, but linux/backing-dev.h does not. Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J.
Wong <djwong@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
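The delay selection described in this message can be modelled in userspace as a function from allocation flags to a wait time. This is a sketch of the described behaviour only: `MODEL_ALLOW_BLOCKING`, `MODEL_NORETRY` and `retry_wait_ms()` are hypothetical names, the flags are not the kernel's GFP bits, and the 200 ms figure is taken from the text above rather than from the kernel source.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for "gfpflags_allow_blocking() is
 * true" and "__GFP_NORETRY is set". */
#define MODEL_ALLOW_BLOCKING 0x1u
#define MODEL_NORETRY        0x2u

/* Returns an upper bound, in milliseconds, on how long the retry helper
 * would sleep. hz is the timer frequency and must be non-zero. */
static unsigned retry_wait_ms(unsigned flags, unsigned hz)
{
    if ((flags & MODEL_ALLOW_BLOCKING) && !(flags & MODEL_NORETRY)) {
        /* The allocator already reclaimed or waited: sleep only until
         * the current jiffy ends, i.e. at most one jiffy. */
        return 1000u / hz;
    }
    /* The allocator probably did not wait: use the longer delay. */
    return 200u;
}
```

A caller that loops on a failed allocation would pass the same flags it used for the allocation, so the choice of delay stays out of the filesystem code.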
||
Chao Yu
|
300a842937 |
f2fs: fix to reserve space for IO align feature
https://bugzilla.kernel.org/show_bug.cgi?id=204137 With the script below, we hit a panic during new segment allocation: DISK=bingo.img MOUNT_DIR=/mnt/f2fs dd if=/dev/zero of=$DISK bs=1M count=105 mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict" for (( i = 0; i < 4096; i++ )); do name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10` mkdir $MOUNT_DIR/$name done umount $MOUNT_DIR rm $DISK --- Core dump --- Call Trace: allocate_segment_by_default+0x9d/0x100 [f2fs] f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs] do_write_page+0x62/0x110 [f2fs] f2fs_outplace_write_data+0x43/0xc0 [f2fs] f2fs_do_write_data_page+0x386/0x560 [f2fs] __write_data_page+0x706/0x850 [f2fs] f2fs_write_cache_pages+0x267/0x6a0 [f2fs] f2fs_write_data_pages+0x19c/0x2e0 [f2fs] do_writepages+0x1c/0x70 __filemap_fdatawrite_range+0xaa/0xe0 filemap_fdatawrite+0x1f/0x30 f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs] block_operations+0xdc/0x350 [f2fs] f2fs_write_checkpoint+0x104/0x1150 [f2fs] f2fs_sync_fs+0xa2/0x120 [f2fs] f2fs_balance_fs_bg+0x33c/0x390 [f2fs] f2fs_write_node_pages+0x4c/0x1f0 [f2fs] do_writepages+0x1c/0x70 __writeback_single_inode+0x45/0x320 writeback_sb_inodes+0x273/0x5c0 wb_writeback+0xff/0x2e0 wb_workfn+0xa1/0x370 process_one_work+0x138/0x350 worker_thread+0x4d/0x3d0 kthread+0x109/0x140 ret_from_fork+0x25/0x30 The root cause is that, with the IO alignment feature enabled, in the worst case a single 4k write needs F2FS_IO_SIZE() blocks of free space, because the IO alignment feature fills in dummy pages to keep the IO aligned. So we can easily run out of free segments during writeback of a non-inline directory's data, even while foreground GC is in progress. To fix this issue, reserve additional free space for the IO alignment feature to handle the worst-case free space usage ratio during FGGC.
Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
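The worst case mentioned above can be worked out for the `io_bits=7` mount option used in the reproducer. The sketch below assumes that F2FS_IO_SIZE() equals 2^io_bits blocks and that a block is 4 KiB; the helper names are hypothetical.

```c
#include <assert.h>

#define BLOCK_SIZE_BYTES 4096u /* the "4k write" in the commit message */

/* Assumed definition: the aligned IO unit is 2^io_bits blocks. */
static unsigned io_size_blocks(unsigned io_bits)
{
    return 1u << io_bits;
}

/* Dummy blocks added when a single data block has to be padded out to
 * a full aligned IO unit. */
static unsigned worst_case_dummy_blocks(unsigned io_bits)
{
    return io_size_blocks(io_bits) - 1u;
}

/* Space consumed by one padded 4 KiB write, in bytes. */
static unsigned worst_case_bytes(unsigned io_bits)
{
    return io_size_blocks(io_bits) * BLOCK_SIZE_BYTES;
}
```

Under these assumptions, `io_bits=7` gives 128 blocks per IO unit, so one 4 KiB write can consume 512 KiB of free space, 127 blocks of which are padding. That ratio is why free segments run out so quickly on the 105 MiB test image.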
||
Chao Yu
|
3e0203893e |
f2fs: support fault injection to f2fs_trylock_op()
This patch adds support for injecting faults into f2fs_trylock_op(). Usage: a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or b) mount -o fault_type=65536 <dev> <mountpoint> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
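The value 65536 in the usage lines is a single-bit mask (1 << 16). The sketch below assumes that `inject_type` / `fault_type` is interpreted as a bitmask with one bit per fault type; it deliberately does not claim which fault type a given bit selects, since that ordering depends on the kernel version. `fault_enabled()` and `only_bit()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the given bit is set in the injection mask. */
static bool fault_enabled(unsigned inject_type, unsigned bit)
{
    return (inject_type >> bit) & 1u;
}

/* Returns the index of the set bit if exactly one bit is set,
 * otherwise -1. */
static int only_bit(unsigned mask)
{
    int bit = 0;

    if (mask == 0 || (mask & (mask - 1u)) != 0)
        return -1;
    while ((mask >> bit) != 1u)
        bit++;
    return bit;
}
```

Several fault types could be combined by OR-ing their masks before writing the value, assuming the bitmask interpretation holds.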
||
Daeho Jeong
|
325163e989 |
f2fs: add gc_urgent_high_remaining sysfs node
Added a new sysfs node called gc_urgent_high_remaining. The user can set the trial count limit for GC urgent high mode with this value. If GC thread gets to the limit, the mode will turn back to GC normal mode. By default, the value is zero, which means there is no limit like before. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
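The countdown behaviour described above can be sketched as a small state machine. This is a userspace model, not the kernel implementation: `struct gc_state`, `set_remaining()`, `gc_trial()` and the `limited` field are hypothetical, and the two-value enum only borrows the mode names from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

enum gc_mode { GC_NORMAL, GC_URGENT_HIGH };

struct gc_state {
    enum gc_mode mode;
    bool limited;       /* hypothetical: true once a non-zero limit is set */
    unsigned remaining; /* models gc_urgent_high_remaining */
};

/* Models writing a value to the sysfs node; zero means "no limit". */
static void set_remaining(struct gc_state *s, unsigned n)
{
    s->limited = (n != 0);
    s->remaining = n;
}

/* One iteration of the GC thread. Returns true if an urgent-high GC
 * trial runs in this iteration. */
static bool gc_trial(struct gc_state *s)
{
    if (s->mode != GC_URGENT_HIGH)
        return false;

    if (s->limited) {
        if (s->remaining == 0) {
            /* Limit reached: fall back to normal GC. */
            s->limited = false;
            s->mode = GC_NORMAL;
            return false;
        }
        s->remaining--;
    }
    return true;
}
```

With a limit of 2, two urgent-high trials run and the third iteration switches back to normal mode; with the default of 0, the mode stays urgent-high until something else changes it.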
||
Linus Torvalds
|
c8c109546a |
Update to zstd-1.4.10
This PR includes 5 commits that update the zstd library version: 1. Adds a new kernel-style wrapper around zstd. This wrapper API is functionally equivalent to the subset of the current zstd API that is currently used. The wrapper API changes to be kernel style so that the symbols don't collide with zstd's symbols. The update to zstd-1.4.10 maintains the same API and preserves the semantics, so that none of the callers need to be updated. All callers are updated in the commit, because there are zero functional changes. 2. Adds an indirection for `lib/decompress_unzstd.c` so it doesn't depend on the layout of `lib/zstd/` to include every source file. This allows the next patch to be automatically generated. 3. Imports the zstd-1.4.10 source code. This commit is automatically generated from upstream zstd (https://github.com/facebook/zstd). 4. Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`. 5. Fixes a newly added build warning for clang. The discussion around this patchset has been pretty long, so I've included a FAQ-style summary of the history of the patchset, and why we are taking this approach. Why do we need to update? ------------------------- The zstd version in the kernel is based off of zstd-1.3.1, which is was released August 20, 2017. Since then zstd has seen many bug fixes and performance improvements. And, importantly, upstream zstd is continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to older versions. So the only way to sanely get these fixes is to keep up to date with upstream zstd. There are no known security issues that affect the kernel, but we need to be able to update in case there are. And while there are no known security issues, there are relevant bug fixes. For example the problem with large kernel decompression has been fixed upstream for over 2 years https://lkml.org/lkml/2020/9/29/27. Additionally the performance improvements for kernel use cases are significant. 
Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz: - BtrFS zstd compression at levels 1 and 3 is 5% faster - BtrFS zstd decompression+read is 15% faster - SquashFS zstd decompression+read is 15% faster - F2FS zstd compression+write at level 3 is 8% faster - F2FS zstd decompression+read is 20% faster - ZRAM decompression+read is 30% faster - Kernel zstd decompression is 35% faster - Initramfs zstd decompression+build is 5% faster On top of this, there are significant performance improvements coming down the line in the next zstd release, and the new automated update patch generation will allow us to pull them easily. How is the update patch generated? ---------------------------------- The first two patches are preparation for updating the zstd version. Then the 3rd patch in the series imports upstream zstd into the kernel. This patch is automatically generated from upstream. A script makes the necessary changes and imports it into the kernel. The changes are: - Replace all libc dependencies with kernel replacements and rewrite includes. - Remove unncessary portability macros like: #if defined(_MSC_VER). - Use the kernel xxhash instead of bundling it. This automation gets tested every commit by upstream's continuous integration. When we cut a new zstd release, we will submit a patch to the kernel to update the zstd version in the kernel. The automated process makes it easy to keep the kernel version of zstd up to date. The current zstd in the kernel shares the guts of the code, but has a lot of API and minor changes to work in the kernel. This is because at the time upstream zstd was not ready to be used in the kernel envrionment as-is. But, since then upstream zstd has evolved to support being used in the kernel as-is. Why are we updating in one big patch? ------------------------------------- The 3rd patch in the series is very large. This is because it is restructuring the code, so it both deletes the existing zstd, and re-adds the new structure. 
Future updates will be directly proportional to the changes in upstream zstd since the last import. They will admittedly be large, as zstd is an actively developed project, and has hundreds of commits between every release. However, there is no other great alternative. One option ruled out is to replay every upstream zstd commit. This is not feasible for several reasons: - There are over 3500 upstream commits since the zstd version in the kernel. - The automation to automatically generate the kernel update was only added recently, so older commits cannot easily be imported. - Not every upstream zstd commit builds. - Only zstd releases are "supported", and individual commits may have bugs that were fixed before a release. Another option to reduce the patch size would be to first reorganize to the new file structure, and then apply the patch. However, the current kernel zstd is formatted with clang-format to be more "kernel-like". But, the new method imports zstd as-is, without additional formatting, to allow for closer correlation with upstream, and easier debugging. So the patch wouldn't be any smaller. It also doesn't make sense to import upstream zstd commit by commit going forward. Upstream zstd doesn't support production use cases running off the development branch. We have a lot of post-commit fuzzing that catches many bugs, so individual commits may be buggy, but fixed before a release. So going forward, I intend to import every (important) zstd release into the Kernel. So, while it isn't ideal, updating in one big patch is the only path I see forward. Who is responsible for this code? --------------------------------- I am. This patchset adds me as the maintainer for zstd. Previously, there was no tree for zstd patches. Because of that, there were several patches that either got ignored, or took a long time to merge, since it wasn't clear which tree should pick them up. 
I'm officially stepping up as maintainer, and setting up my tree as the path through which zstd patches get merged. I'll make sure that patches to the kernel zstd get ported upstream, so they aren't erased when the next version update happens. How is this code tested? ------------------------ I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS, Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and aarch64. I checked both performance and correctness. Also, thanks to many people in the community who have tested these patches locally. If you have tested the patches, please reply with a Tested-By so I can collect them for the PR I will send to Linus. Lastly, this code will bake in linux-next before being merged into v5.16. Why update to zstd-1.4.10 when zstd-1.5.0 has been released? ------------------------------------------------------------ This patchset has been outstanding since 2020, and zstd-1.4.10 was the latest release when it was created. Since the update patch is automatically generated from upstream, I could generate it from zstd-1.5.0. However, there were some large stack usage regressions in zstd-1.5.0, and are only fixed in the latest development branch. And the latest development branch contains some new code that needs to bake in the fuzzer before I would feel comfortable releasing to the kernel. Once this patchset has been merged, and we've released zstd-1.5.1, we can update the kernel to zstd-1.5.1, and exercise the update process. You may notice that zstd-1.4.10 doesn't exist upstream. This release is an artifical release based off of zstd-1.4.9, with some fixes for the kernel backported from the development branch. I will tag the zstd-1.4.10 release after this patchset is merged, so the Linux Kernel is running a known version of zstd that can be debugged upstream. Why was a wrapper API added? ---------------------------- The first versions of this patchset migrated the kernel to the upstream zstd API. 
It first added a shim API that supported the new upstream API with the old code, then updated callers to use the new shim API, then transitioned to the new code and deleted the shim API. However, Christoph Hellwig suggested that we transition to a kernel style API, and hide zstd's upstream API behind that. This is because zstd's upstream API supports many other use cases, and does not follow the kernel style guide, while the kernel API is focused on the kernel's use cases, and follows the kernel style guide. Where is the previous discussion? --------------------------------- Links for the discussions of the previous versions of the patch set. The largest changes in the design of the patchset are driven by the discussions in V11, V5, and V1. Sorry for the mix of links, I couldn't find most of the threads on lkml.org. V12: https://www.spinics.net/lists/linux-crypto/msg58189.html V11: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ V10: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ V9: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ V8: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ V7: https://lkml.org/lkml/2020/12/3/1195 V6: https://lkml.org/lkml/2020/12/2/1245 V5: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ V4: https://www.spinics.net/lists/linux-btrfs/msg105783.html V3: https://lkml.org/lkml/2020/9/23/1074 V2: https://www.spinics.net/lists/linux-btrfs/msg105505.html V1: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ Signed-off-by: Nick Terrell <terrelln@fb.com> Tested-by: Paul Jones <paul@pauljones.id.au> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64 Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> -----BEGIN PGP SIGNATURE----- 
iQIzBAABCAAdFiEEmIwAqlFIzbQodPwyuzRpqaNEqPUFAmGJyKIACgkQuzRpqaNE qPXnmw/+PKyCn6LvRQqNfdpF5f59j/B1Fab15tkpVyz3UWnCw+EKaPZOoTfIsjRf 7TMUVm4iGsm+6xBO/YrGdRl4IxocNgXzsgnJ1lTGDbvfRC1tG+YNwuv+EEXwKYq5 Yz3DRwDotgsrV0Kg05b+VIgkmAuY3ukmu2n09LnAdKkxoIgmHw3MIDCdVZW2Br4c sjJmYI+fiJd7nAlbDa42VOrdTiLzkl/2BsjWBqTv6zbiQ5uuJGsKb7P3kpcybWzD 5C118pyE3qlVyvFz+UFu8WbN0NSf47DP22KV/3IrhNX7CVQxYBe+9/oVuPWTgRx0 4Vl0G6u7rzh4wDZuGqTC3LYWwH9GfycI0fnVC0URP2XMOcGfPlGd3L0PEmmAeTmR fEbaGAN4dr0jNO3lmbyAGe/G8tvtXQx/4ZjS9Pa3TlQP24GARU/f78/blbKR87Vz BGMndmSi92AscgXb9buO3bCwAY1YtH5WiFaZT1XVk42cj4MiOLvPTvP4UMzDDxcZ 56ahmAP/84kd6H+cv9LmgEMqcIBmxdUcO1nuAItJ4wdrMUgw3+lrbxwFkH9xPV7I okC1K0TIVEobADbxbdMylxClAylbuW+37Pko97NmAlnzNCPNE38f3s3gtXRrUTaR IP8jv5UQ7q3dFiWnNLLodx5KM6s32GVBKRLRnn/6SJB7QzlyHXU= =Xb18 -----END PGP SIGNATURE----- Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux Pull zstd update from Nick Terrell: "Update to zstd-1.4.10. Add myself as the maintainer of zstd and update the zstd version in the kernel, which is now 4 years out of date, to a much more recent zstd release. This includes bug fixes, much more extensive fuzzing, and performance improvements. It also generates the kernel zstd automatically from upstream zstd, so it is easier to keep the zstd version up to date, and we don't fall so far out of date again. This includes 5 commits that update the zstd library version: - Adds a new kernel-style wrapper around zstd. This wrapper API is functionally equivalent to the subset of the current zstd API that is currently used. The wrapper API changes to be kernel style so that the symbols don't collide with zstd's symbols. The update to zstd-1.4.10 maintains the same API and preserves the semantics, so that none of the callers need to be updated. All callers are updated in the commit, because there are zero functional changes. - Adds an indirection for `lib/decompress_unzstd.c` so it doesn't depend on the layout of `lib/zstd/` to include every source file. 
This allows the next patch to be automatically generated. - Imports the zstd-1.4.10 source code. This commit is automatically generated from upstream zstd (https://github.com/facebook/zstd). - Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`. - Fixes a newly added build warning for clang. The discussion around this patchset has been pretty long, so I've included a FAQ-style summary of the history of the patchset, and why we are taking this approach. Why do we need to update? ------------------------- The zstd version in the kernel is based off of zstd-1.3.1, which was released August 20, 2017. Since then zstd has seen many bug fixes and performance improvements. And, importantly, upstream zstd is continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to older versions. So the only way to sanely get these fixes is to keep up to date with upstream zstd. There are no known security issues that affect the kernel, but we need to be able to update in case there are. And while there are no known security issues, there are relevant bug fixes. For example, the problem with large kernel decompression has been fixed upstream for over 2 years [1]. Additionally, the performance improvements for kernel use cases are significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz: - BtrFS zstd compression at levels 1 and 3 is 5% faster - BtrFS zstd decompression+read is 15% faster - SquashFS zstd decompression+read is 15% faster - F2FS zstd compression+write at level 3 is 8% faster - F2FS zstd decompression+read is 20% faster - ZRAM decompression+read is 30% faster - Kernel zstd decompression is 35% faster - Initramfs zstd decompression+build is 5% faster On top of this, there are significant performance improvements coming down the line in the next zstd release, and the new automated update patch generation will allow us to pull them easily. How is the update patch generated? 
---------------------------------- The first two patches are preparation for updating the zstd version. Then the 3rd patch in the series imports upstream zstd into the kernel. This patch is automatically generated from upstream. A script makes the necessary changes and imports it into the kernel. The changes are: - Replace all libc dependencies with kernel replacements and rewrite includes. - Remove unnecessary portability macros like: #if defined(_MSC_VER). - Use the kernel xxhash instead of bundling it. This automation is tested on every commit by upstream's continuous integration. When we cut a new zstd release, we will submit a patch to the kernel to update the zstd version in the kernel. The automated process makes it easy to keep the kernel version of zstd up to date. The current zstd in the kernel shares the guts of the code, but has a lot of API and minor changes to work in the kernel. This is because at the time upstream zstd was not ready to be used in the kernel environment as-is. But, since then upstream zstd has evolved to support being used in the kernel as-is. Why are we updating in one big patch? ------------------------------------- The 3rd patch in the series is very large. This is because it is restructuring the code, so it both deletes the existing zstd, and re-adds the new structure. Future updates will be directly proportional to the changes in upstream zstd since the last import. They will admittedly be large, as zstd is an actively developed project, and has hundreds of commits between every release. However, there is no other great alternative. One option ruled out is to replay every upstream zstd commit. This is not feasible for several reasons: - There are over 3500 upstream commits since the zstd version in the kernel. - The automation to automatically generate the kernel update was only added recently, so older commits cannot easily be imported. - Not every upstream zstd commit builds. 
- Only zstd releases are "supported", and individual commits may have bugs that were fixed before a release. Another option to reduce the patch size would be to first reorganize to the new file structure, and then apply the patch. However, the current kernel zstd is formatted with clang-format to be more "kernel-like". But, the new method imports zstd as-is, without additional formatting, to allow for closer correlation with upstream, and easier debugging. So the patch wouldn't be any smaller. It also doesn't make sense to import upstream zstd commit by commit going forward. Upstream zstd doesn't support production use cases running off the development branch. We have a lot of post-commit fuzzing that catches many bugs, so individual commits may be buggy, but fixed before a release. So going forward, I intend to import every (important) zstd release into the Kernel. So, while it isn't ideal, updating in one big patch is the only path I see forward. Who is responsible for this code? --------------------------------- I am. This patchset adds me as the maintainer for zstd. Previously, there was no tree for zstd patches. Because of that, there were several patches that either got ignored, or took a long time to merge, since it wasn't clear which tree should pick them up. I'm officially stepping up as maintainer, and setting up my tree as the path through which zstd patches get merged. I'll make sure that patches to the kernel zstd get ported upstream, so they aren't erased when the next version update happens. How is this code tested? ------------------------ I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS, Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and aarch64. I checked both performance and correctness. Also, thanks to many people in the community who have tested these patches locally. Lastly, this code will bake in linux-next before being merged into v5.16. Why update to zstd-1.4.10 when zstd-1.5.0 has been released? 
------------------------------------------------------------ This patchset has been outstanding since 2020, and zstd-1.4.10 was the latest release when it was created. Since the update patch is automatically generated from upstream, I could generate it from zstd-1.5.0. However, there were some large stack usage regressions in zstd-1.5.0, and they are only fixed in the latest development branch. And the latest development branch contains some new code that needs to bake in the fuzzer before I would feel comfortable releasing to the kernel. Once this patchset has been merged, and we've released zstd-1.5.1, we can update the kernel to zstd-1.5.1, and exercise the update process. You may notice that zstd-1.4.10 doesn't exist upstream. This release is an artificial release based off of zstd-1.4.9, with some fixes for the kernel backported from the development branch. I will tag the zstd-1.4.10 release after this patchset is merged, so the Linux Kernel is running a known version of zstd that can be debugged upstream. Why was a wrapper API added? ---------------------------- The first versions of this patchset migrated the kernel to the upstream zstd API. It first added a shim API that supported the new upstream API with the old code, then updated callers to use the new shim API, then transitioned to the new code and deleted the shim API. However, Christoph Hellwig suggested that we transition to a kernel style API, and hide zstd's upstream API behind that. This is because zstd's upstream API supports many other use cases, and does not follow the kernel style guide, while the kernel API is focused on the kernel's use cases, and follows the kernel style guide. Where is the previous discussion? --------------------------------- Links for the discussions of the previous versions of the patch set below. The largest changes in the design of the patchset are driven by the discussions in v11, v5, and v1. 
Sorry for the mix of links, I couldn't find most of the threads on lkml.org" Link: https://lkml.org/lkml/2020/9/29/27 [1] Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12] Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11] Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10] Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9] Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8] Link: https://lkml.org/lkml/2020/12/3/1195 [v7] Link: https://lkml.org/lkml/2020/12/2/1245 [v6] Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5] Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4] Link: https://lkml.org/lkml/2020/9/23/1074 [v3] Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2] Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1] Signed-off-by: Nick Terrell <terrelln@fb.com> Tested-by: Paul Jones <paul@pauljones.id.au> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64 Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> * tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux: lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical MAINTAINERS: Add maintainer entry for zstd lib: zstd: Upgrade to latest upstream zstd version 1.4.10 lib: zstd: Add decompress_sources.h for decompress_unzstd lib: zstd: Add kernel-specific API |