IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This pull request includes a number of patches
addressing improvements in the cache portions of the 9p
client.
The biggest improvements have to do with fixing handling
of inodes and eliminating duplicate structures and unnecessary
allocation/release of inode structures and many associated
unnecessary protocol traffic. This also dramatically
reduced code complexity across the code and sets us up to add
proper temporal cache capabilities.
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmX0XX0ACgkQiP/V+0pf
/5jdgg//UCY+RTB9UNr/B1I5sj1m36l0BfFB/jaAMApH5/bQPF2mgtjD1jQL2qk2
gbAvR25erDdKiggRAbNU6OGVBIKXHiUPnL5d6UNgeyQQu1wSzStCejTfoQ/mwgTl
1N2Uge63zz5L0XgJm31gPcKSlJAD1r70UgsPw0ldkimEPqiBQjQu5VQqUWKgrkBe
hcdXbuZbmXBLfZv9i0XI9BmwIeaSRwY2qx/JUr2eCoAXrDRNfwBRVMAfqpJ6L6I0
56C5Q2EDjipUnYp1Z3fqg80IUa+MnoF3rbMyMj54JBWxAPyMspamUBltYlZb5pRv
zvjSJk3Uof8Wzc+beVVbyV2NmC6gC/cm2MnSYZBTX1KWTMl46eiloSrFz+Pv1/O1
8rcZFQuw8uAox9/4qc1lPCAC0hZVUFelj8NpCPN8aHS3cF0eN/MCQ3qfBkFgo8um
jMYTUiai1tjdxWUWrUSpSKFHzBcL986+1bnE7qwI3qPXrxrnB9kOhzSilDJSuHnV
Ei6zinvMEUBmKYLdNaCQ9N1KTKPr0bqXIlEvk4LDaGLGkZbI7eKuN5mCxZB0DXCc
65WSc5G9NYux9QjobR+TUwwLdiQH84w/3zYS41r/as7VqABcj/X/2b2ZfQ6/WBuF
I0CmSWz2gfcpQQc642dzZJk7UP4wOVtsW3F2JAQ1GPq/D5UOmWU=
=1LU1
-----END PGP SIGNATURE-----
Merge tag '9p-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p updates from Eric Van Hensbergen:
"This includes a number of patches addressing improvements in the cache
portions of the 9p client.
The biggest improvements have to do with fixing handling of inodes and
eliminating duplicate structures and unnecessary allocation/release of
inode structures and many associated unnecessary protocol traffic.
This also dramatically reduced code complexity across the code and
sets us up to add proper temporal cache capabilities"
* tag '9p-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: fix dups even in uncached mode
fs/9p: simplify iget to remove unnecessary paths
fs/9p: rework qid2ino logic
fs/9p: Eliminate now unused v9fs_get_inode
fs/9p: Eliminate redundant non-cache path in mknod
fs/9p: remove walk and inode allocation from symlink
fs/9p: convert mkdir to use get_new_inode
fs/9p: switch vfsmount to use v9fs_get_new_inode
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZfLjeQAKCRDh3BK/laaZ
PBYQAQDqYZzq91Kn5jdvjaSd+6I/+x7MDLOIP5hPX0HJLuBxWAEAqENoo4Of0GTC
ltW7DKrQy9E3CMp6VKSLVJPN4BYP9gk=
=GvOE
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add passthrough mode for regular file I/O.
This allows performing read and write (also via memory maps) on a
backing file without incurring the overhead of roundtrips to
userspace. For now this is only allowed to privileged servers, but
this limitation will go away in the future (Amir Goldstein)
- Fix interaction of direct I/O mode with memory maps (Bernd Schubert)
- Export filesystem tags through sysfs for virtiofs (Stefan Hajnoczi)
- Allow resending queued requests for server crash recovery (Zhao Chen)
- Misc fixes and cleanups
* tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (38 commits)
fuse: get rid of ff->readdir.lock
fuse: remove unneeded lock which protecting update of congestion_threshold
fuse: Fix missing FOLL_PIN for direct-io
fuse: remove an unnecessary if statement
fuse: Track process write operations in both direct and writethrough modes
fuse: Use the high bit of request ID for indicating resend requests
fuse: Introduce a new notification type for resend pending requests
fuse: add support for explicit export disabling
fuse: __kuid_val/__kgid_val helpers in fuse_fill_attr_from_inode()
fuse: fix typo for fuse_permission comment
fuse: Convert fuse_writepage_locked to take a folio
fuse: Remove fuse_writepage
virtio_fs: remove duplicate check if queue is broken
fuse: use FUSE_ROOT_ID in fuse_get_root_inode()
fuse: don't unhash root
fuse: fix root lookup with nonzero generation
fuse: replace remaining make_bad_inode() with fuse_make_bad()
virtiofs: drop __exit from virtio_fs_sysfs_exit()
fuse: implement passthrough for mmap
fuse: implement splice read/write passthrough
...
tests.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmXydHkACgkQ8vlZVpUN
gaPFcQf/e1DcEw7dITXoOJ16Sz3pI3ykFEae3aIp1C0DoBL6ncnx4NrKJlbKVmfG
CvYwwaPIILps0W5gwRll0wG8G9wrx+QY+xx5elsFKlfLsiRmkvXEIFPELYvtblcG
u6fXumpArtH2dbjsmxw+gxEuborl3aeOIWW62dVvarEpfdvFlEwMAfBYlJ/E4HKM
z74CmR09sr51XuQZTKaUNioyS6qNR/HIBoelJ50Xt6qLZrpfyIxtU/wHbN1GAM5+
pBXCYxlBaiSJHb8p9R99DT5JqVD5zwrqWscbajEhOJo4QQQacJGGvIOHz6b6FMRV
+dPnTBh79t8DAktqT6LAf83bmiRCWQ==
=4/t9
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Ext4 bug fixes and cleanups, plus some additional kunit tests"
* tag 'ext4_for_linus-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: initialize sbi->s_freeclusters_counter and sbi->s_dirtyclusters_counter before use in kunit test
ext4: hold group lock in ext4 kunit test
ext4: alloc test super block from sget
ext4: kunit: use dynamic inode allocation
ext4: enable meta_bg only when new desc blocks are needed
ext4: remove unused parameter biop in ext4_issue_discard()
ext4: remove SLAB_MEM_SPREAD flag usage
ext4: verify s_clusters_per_group even without bigalloc
ext4: fix corruption during on-line resize
ext4: don't report EOPNOTSUPP errors from discard
ext4: drop duplicate ea_inode handling in ext4_xattr_block_set()
ext4: fold quota accounting into ext4_xattr_inode_lookup_create()
ext4: correct best extent lstart adjustment logic
ext4: forbid commit inconsistent quota data when errors=remount-ro
ext4: add a hint for block bitmap corrupt state in mb_groups
ext4: fix the comment of ext4_map_blocks()/ext4_ext_map_blocks()
ext4: improve error msg for ext4_mb_seq_groups_show
ext4: remove unused buddy_loaded in ext4_mb_seq_groups_show
ext4: Add unit test for ext4_mb_mark_diskspace_used
ext4: Add unit test for mb_free_blocks
...
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- new mm helper: memalloc_flags_{save|restore}
- mempool now does kvmalloc mempools
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXycEcACgkQE6szbY3K
bnYUTg/+K4Nv2EdAqOCyHRTKaF2OgJDUb25ZDmbGpfT1XyPrNB7/+CxHqSdEP7/e
FVuhtP61vnQAImDv82u9iZiab/TnuCZPUrjSobFEvrWYoGRtP9Bm9MyYB28NzmMa
AXGmS4yJGVwtxxrFNxZP98IbiHYiHSoYbkqxX2E5VgLag8Ru8peb7oD0Ro3zw0rb
z+6UM/seJ7on5i/9IJEMKKXFVEoZC2J5DAVoe1TghG2kgOw3cKu5OUdltLPOY5jL
jkm5J5wa6Ep46nufHat92yiMxXIQrf4U9LkXxzTi5ThoSmt+Af2qXcBjqTTVqd2D
1dGxj+UG8iu4DCCbQC6EA7J5EMvxfJM0+9lk1ULUgxUs3X69co6nlI6XH1fwEMqk
KpIqd35+Y/IYgogt9ioXI0dtXyL7dbaTVt6NZhc9SaPGPX+C2V0+l4bqToFdNaPH
0KATjjyQaJRE4ZFIjr6GliYOtKWDLi/HPEyoBivniUn7cF5vjSvti+cSQwNDSPpa
6jOd5Y923Iq9ZqDAPM3+mvTH8nNaaf2T2fmbPNrc5pdWbha9bGwOU71zvKHNFGm/
66ZsnwhKSk+uwglTMZHPKSkJJXUYAHESw3slQtEWHZVlliArc55+pBHwE00bvRt7
KHUUqkqXBUPzbp/kdZGylMAdH9+8j9TE5QJ2RaoryFm/eCfexmI=
=6xnj
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- new mm helper: memalloc_flags_{save|restore}
- mempool now does kvmalloc mempools
* tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits)
bcachefs: time_stats: shrink time_stat_buffer for better alignment
bcachefs: time_stats: split stats-with-quantiles into a separate structure
bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet
bcachefs: time_stats: add larger units
bcachefs: pull out time_stats.[ch]
bcachefs: reconstruct_alloc cleanup
bcachefs: fix bch_folio_sector padding
bcachefs: Fix btree key cache coherency during replay
bcachefs: Always flush write buffer in delete_dead_inodes()
bcachefs: Fix order of gc_done passes
bcachefs: fix deletion of indirect extents in btree_gc
bcachefs: Prefer struct_size over open coded arithmetic
bcachefs: Kill unused flags argument to btree_split()
bcachefs: Check for writing superblocks with nonsense member seq fields
bcachefs: fix bch2_journal_buf_to_text()
lib/generic-radix-tree.c: Make nodes more reasonably sized
bcachefs: copy_(to|from)_user_errcode()
bcachefs: Split out bkey_types.h
bcachefs: fix lost journal buf wakeup due to improved pipelining
bcachefs: intercept mountoption value for bool type
...
heap optimizations".
- Kuan-Wei Chiu has also sped up the library sorting code in the series
"lib/sort: Optimize the number of swaps and comparisons".
- Alexey Gladkov has added the ability for code running within an IPC
namespace to alter its IPC and MQ limits. The series is "Allow to
change ipc/mq sysctls inside ipc namespace".
- Geert Uytterhoeven has contributed some dhrystone maintenance work in
the series "lib: dhry: miscellaneous cleanups".
- Ryusuke Konishi continues nilfs2 maintenance work in the series
"nilfs2: eliminate kmap and kmap_atomic calls"
"nilfs2: fix kernel bug at submit_bh_wbc()"
- Nathan Chancellor has updated our build tools requirements in the
series "Bump the minimum supported version of LLVM to 13.0.1".
- Muhammad Usama Anjum continues with the selftests maintenance work in
the series "selftests/mm: Improve run_vmtests.sh".
- Oleg Nesterov has done some maintenance work against the signal code
in the series "get_signal: minor cleanups and fix".
Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA
jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh
tKVmvihFxfAhdDthseXcIf1nBjMALwY=
=8rVc
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
- Kuan-Wei Chiu has also sped up the library sorting code in the series
"lib/sort: Optimize the number of swaps and comparisons".
- Alexey Gladkov has added the ability for code running within an IPC
namespace to alter its IPC and MQ limits. The series is "Allow to
change ipc/mq sysctls inside ipc namespace".
- Geert Uytterhoeven has contributed some dhrystone maintenance work in
the series "lib: dhry: miscellaneous cleanups".
- Ryusuke Konishi continues nilfs2 maintenance work in the series
"nilfs2: eliminate kmap and kmap_atomic calls"
"nilfs2: fix kernel bug at submit_bh_wbc()"
- Nathan Chancellor has updated our build tools requirements in the
series "Bump the minimum supported version of LLVM to 13.0.1".
- Muhammad Usama Anjum continues with the selftests maintenance work in
the series "selftests/mm: Improve run_vmtests.sh".
- Oleg Nesterov has done some maintenance work against the signal code
in the series "get_signal: minor cleanups and fix".
Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.
* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
nilfs2: prevent kernel bug at submit_bh_wbc()
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
ocfs2: enable ocfs2_listxattr for special files
ocfs2: remove SLAB_MEM_SPREAD flag usage
assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
buildid: use kmap_local_page()
watchdog/core: remove sysctl handlers from public header
nilfs2: use div64_ul() instead of do_div()
mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
kexec: copy only happens before uchunk goes to zero
get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
get_signal: don't abuse ksig->info.si_signo and ksig->sig
const_structs.checkpatch: add device_type
Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
list: leverage list_is_head() for list_entry_is_head()
nilfs2: MAINTAINERS: drop unreachable project mirror site
smp: make __smp_processor_id() 0-argument macro
fat: fix uninitialized field in nostale filehandles
...
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is hotplugged
as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving policy
wherein we allocate memory across nodes in a weighted fashion rather
than uniformly. This is beneficial in heterogeneous memory environments
appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the process
has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown situations.
The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's series
"Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page faults.
He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction test",
Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
our handling of DAX on archiecctures which have virtually aliasing data
caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
improvements in worst-case mmap_lock hold times during certain
userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability improvements
in his series "Mitigate a vmap lock contention". It realizes a 12x
improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging of
large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages() to
an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which are
configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
=TG55
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
Fix a bug where nilfs_get_block() returns a successful status when
searching and inserting the specified block both fail inconsistently. If
this inconsistent behavior is not due to a previously fixed bug, then an
unexpected race is occurring, so return a temporary error -EAGAIN instead.
This prevents callers such as __block_write_begin_int() from requesting a
read into a buffer that is not mapped, which would cause the BUG_ON check
for the BH_Mapped flag in submit_bh_wbc() to fail.
Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7d ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()".
This resolves a kernel BUG reported by syzbot. Since there are two
flaws involved, I've made each one a separate patch.
The first patch alone resolves the syzbot-reported bug, but I think
both fixes should be sent to stable, so I've tagged them as such.
This patch (of 2):
Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data
to a nilfs2 file system whose metadata is corrupted.
There are two flaws involved in this issue.
The first flaw is that when nilfs_get_block() locates a data block using
btree or direct mapping, if the disk address translation routine
nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata
corruption, it can be passed back to nilfs_get_block(). This causes
nilfs_get_block() to misidentify an existing block as non-existent,
causing both data block lookup and insertion to fail inconsistently.
The second flaw is that nilfs_get_block() returns a successful status in
this inconsistent state. This causes the caller __block_write_begin_int()
or others to request a read even though the buffer is not mapped,
resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc()
failing.
This fixes the first issue by changing the return value to code -EINVAL
when a conversion using DAT fails with code -ENOENT, avoiding the
conflicting condition that leads to the kernel bug described above. Here,
code -EINVAL indicates that metadata corruption was detected during the
block lookup, which will be properly handled as a file system error and
converted to -EIO when passing through the nilfs2 bmap layer.
Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com
Fixes: c3a7abf06c ("nilfs2: support contiguous lookup of blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+cfed5b56649bddf80d6e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For special files in S_IFBLK/S_IFCHR/S_IFIFO type, we already have
ocfs2_setattr and ocfs2_getattr enabled. It's confusing for user space if
it can use setattr/getattr to control one attribute appointed but can not
list attributes using listxattr for above type files:
$ mknod /mnt/b b 0 0
$ setfattr -h -n trusted.name -v 0xbabe /mnt/b
$ getfattr -n trusted.name /mnt/b
getfattr: Removing leading '/' from absolute path names
trusted.name=0sur4=
$ getfattr -m trusted /mnt/b
$
Fix it by enabling ocfs2_listxattr for ocfs2_special_file_iops. After the
commit, fstests/generic/062 will pass.
Link: https://lkml.kernel.org/r/20240312042908.8889-1-l@damenly.org
Signed-off-by: Su Yue <glass.su@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove
its usage so we can delete it from slab. No functional change.
Link: https://lkml.kernel.org/r/20240224135008.829878-1-chengming.zhou@linux.dev
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shrink this percpu object by one array element so that the object size
becomes exactly 512 bytes. This will lead to more efficient memory use,
hopefully.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Currently, struct time_stats has the optional ability to quantize the
information that it collects. This is /probably/ useful for callers who
want to see quantized information, but it more than doubles the size of
the structure from 224 bytes to 464. For users who don't care about
that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per
counter, split the two into separate pieces.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The only caller of this code (time_stats) always knows the weights and
whether or not any information has been collected. Pass this
information into the mean and variance code so that it doesn't have to
store that information. This reduces the structure size from 24 to 16
bytes, which shrinks each time_stats counter to 192 bytes from 208.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Filesystems can stay mounted for a very long time, so add some larger
units.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.
Also - users no longer have to explicitly select fsck and fix_errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
gc_stripes_done() and gc_reflink_done() may do alloc btree updates (i.e.
when deleting an indirect extent) - we need bucket gens to be fixed by
then.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
we need to run the normal extent update path on deletion -
bch2_bkey_make_mut() is incorrect when key type is changing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].
As the "op" variable is a pointer to "struct promote_op" and this
structure ends in a flexible array:
struct promote_op {
[...]
struct bio_vec bi_inline_vecs[];
};
and the "t" variable is a pointer to "struct journal_seq_blacklist_table"
and this structure also ends in a flexible array:
struct journal_seq_blacklist_table {
[...]
struct journal_seq_blacklist_table_entry {
u64 start;
u64 end;
bool dirty;
} entries[];
};
the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the argument "size + size * count" in the
kzalloc() functions.
This way, the code is more readable and safer.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're seeing some unmountable filesystems due to split brain detection
going awry; it seems we somehow wrote out superblocks where we updated
the superblock seq without updating any member seq fields.
A given device's superblock should always have the main seq equal to
it's member seq field, so this is easy to check for.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
we've got some helpers that return errors sanely, move them to a more
common location for use in fs-ioctl.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The journal_write_done() handler was reworked into a loop in commit
746a33c96b7a ("bcachefs: better journal pipelining"). As part of this,
the journal buffer wake was factored into a post-loop branch that
executes if at least one journal buffer has completed.
The journal buffer processing loop iterates on the journal buffer
pointer, however. This means that w refers to the last buffer processed
by the loop, which may or may not be done. This also means that if
multiple buffers are processed by the loop, only the last is awoken.
This lost wakeup behavior has lead to stalling problems in various CI
and fstests, such as generic/703.
Lift the wake into the loop so each done buffer sees a wake call as
it is processed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For mount option with bool type, the value must be 0 or 1 (See
bch2_opt_parse). But this seems does not well intercepted cause
for other value(like 2...), it returns the unexpect return code
with error message printed.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Avoid the private error code return to caller. The error code
should be transformed into genernal error code.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Non append, non extending buffered writes can now avoid taking the inode
lock.
To ensure atomicity of writes w.r.t. other writes, we lock every folio
that we'll be writing to, and if this fails we fall back to taking the
inode lock.
Extensive comments are provided as to corner cases.
Link: https://lore.kernel.org/linux-fsdevel/Zdkxfspq3urnrM6I@bombadil.infradead.org/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rename and export __file_remove_privs(); for a buffered write path that
doesn't take the inode lock we need to be able to check if the operation
needs to do work first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Improved journal pipelining broke journal_noflush_seq(); it implicitly
assumed only the oldest outstanding journal buf could be in flight, but
that's no longer true.
Make this more straightforward by just setting buf->must_flush whenever
we know a journal buf is going to be flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When mount with incorrect options such as:
"mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/".
It rebacks the error "mount: /mnt/bcachefs: permission denied."
cause bch2_parse_mount_opts returns -1 and bch2_mount throws
it up. This is unreasonable.
The real error message should be like this:
"mount: /mnt/bcachefs: wrong fs type, bad option, bad
superblock on /dev/loop1, missing codepage or helper program,
or other error."
Adding three private error codes for mounting error. Here are:
- BCH_ERR_mount_option as the parent class for option error.
- BCH_ERR_option_name represents the invalid option name.
- BCH_ERR_option_value represents the invalid option value.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a tracepoint for downcasting private errors to standard errors, so
they can be recovered even when not logged; also, add some
documentation.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Variable ret is being assigned a value that is never read, it is
being re-assigned a couple of statements later on. The assignment
is redundant and can be removed.
Cleans up clang scan build warning:
fs/bcachefs/super-io.c:806:2: warning: Value stored to 'ret' is
never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
32-bit arm builds emit a lot of spam like this:
fs/bcachefs/backpointers.c: In function ‘extent_matches_bp’:
fs/bcachefs/backpointers.c:15:13: note: parameter passing for argument of type ‘struct bch_backpointer’ changed in GCC 9.1
Apply the change from commit ebcc5928c5 ("arm64: Silence gcc warnings
about arch ABI drift") to fs/bcachefs/ to silence them.
Signed-off-by: Calvin Owens <jcalvinowens@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
All jounal_buf bitfield updates must happen under the journal lock -
perhaps we should just switch these to atomic bit flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.
Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.
We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache eans those writes only cost bus bandwidth.
So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a path doesn't have any active references, we shouldn't downgrade it;
it'll either be reused, possibly with intent refs again, or dropped at
bch2_trans_begin() time.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that checking subvolume structure is a separate pass, the main
check_directory_connectivity() pass only needs to walk up to a given
inode's subvolume root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>