IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
broken in 6.5; we really can't lock two unrelated directories
without holding ->s_vfs_rename_mutex first and in case of
same-parent rename of a subdirectory 6.5 ends up doing just
that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+lyQAKCRBZ7Krx/gZQ
60MWAP94hTqeMIpjhsUIkrTnylrIFaiw4UCWFJzIRG1QQYKqCgD/XUaWI9np7dL6
0wR/j4CQSdJjiEFKUFE2pD3QoSuJYAQ=
=+x0+
-----END PGP SIGNATURE-----
Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull rename updates from Al Viro:
"Fix directory locking scheme on rename
This was broken in 6.5; we really can't lock two unrelated directories
without holding ->s_vfs_rename_mutex first and in case of same-parent
rename of a subdirectory 6.5 ends up doing just that"
* tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rename(): avoid a deadlock in the case of parents having no common ancestor
kill lock_two_inodes()
rename(): fix the locking of subdirectories
f2fs: Avoid reading renamed directory if parent does not change
ext4: don't access the source subdirectory content on same-directory rename
ext2: Avoid reading renamed directory if parent does not change
udf_rename(): only access the child content on cross-directory rename
ocfs2: Avoid touching renamed directory if parent does not change
reiserfs: Avoid touching renamed directory if parent does not change
- The minimum Sphinx requirement has been raised to 2.4.4, following a
warning that was added in 6.2.
- Some reworking of the Documentation/process front page to, hopefully,
make it more useful.
- Various kernel-doc tweaks to, for example, make it deal properly with
__counted_by annotations.
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the delightful
consequence of adding some 600 warnings to the docs build. A sustained
effort by Randy, Vegard, and myself has addressed almost all of those,
bringing the documentation back into sync with the code. The fixes are
going through the appropriate maintainer trees.
- Various improvements to the HTML rendered docs, including automatic links
to Git revisions and a nice new pulldown to make translations easy to
access.
- Speaking of translations, more of those for Spanish and Chinese.
...plus the usual stream of documentation updates and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef
CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP
xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk
CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt
qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi
9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk=
=fCtT
-----END PGP SIGNATURE-----
Merge tag 'docs-6.8' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following
a warning that was added in 6.2
- Some reworking of the Documentation/process front page to,
hopefully, make it more useful
- Various kernel-doc tweaks to, for example, make it deal properly
with __counted_by annotations
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the
delightful consequence of adding some 600 warnings to the docs
build. A sustained effort by Randy, Vegard, and myself has
addressed almost all of those, bringing the documentation back into
sync with the code. The fixes are going through the appropriate
maintainer trees
- Various improvements to the HTML rendered docs, including automatic
links to Git revisions and a nice new pulldown to make translations
easy to access
- Speaking of translations, more of those for Spanish and Chinese
... plus the usual stream of documentation updates and typo fixes"
* tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits)
MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL
A reworked process/index.rst
ring-buffer/Documentation: Add documentation on buffer_percent file
Translated the RISC-V architecture boot documentation.
Docs: remove mentions of fdformat from util-linux
Docs/zh_CN: Fix the meaning of DEBUG to pr_debug()
Documentation: move driver-api/dcdbas to userspace-api/
Documentation: move driver-api/isapnp to userspace-api/
Documentation/core-api : fix typo in workqueue
Documentation/trace: Fixed typos in the ftrace FLAGS section
kernel-doc: handle a void function without producing a warning
scripts/get_abi.pl: ignore some temp files
docs: kernel_abi.py: fix command injection
scripts/get_abi: fix source path leak
CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer
docs: translations: add translations links when they exist
kernel-doc: Align quick help and the code
MAINTAINERS: add reviewer for Spanish translations
docs: ignore __counted_by attribute in structure definitions
scripts: kernel-doc: Clarify missing struct member description
..
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWeozwACgkQEVvVyTe/
1WqnyA//U2Ka5ZIncs/hA5D03LMyuCh9qlH5qAGce5vrBTxogTlFuTGGKtsUuCB5
Y4GALO+fw8aWAowt5X1XfHD3TETLVbCshT7dYjKsKy/ojANCbgkCipXBudYx+l9m
fllwTZyueK0UY14kCU2DAV5PYsI/XVVykk71GSMOMLCUfRJfDI7R0vBD0NaUd7Kz
Wp/M6t0MnXX23nGUdgNoroZPPj3Ts/gK2MXID+QHXGaR2+M1B1lLKfSu6TcRDLtn
tbe/ivaw4y1jj3jfFwMC7sSSDyIJeZh9tBB4Rvv2vsMiYU8zAC6Eg35eIbPONu42
pUMd0QQa79H3cyYEDtUzyskzur0Jry5azzb8JdQWipgVKFh5g3CHce2XAFlVjw2a
9RyCKg41A9LvdB5l/PvBtsxig2PzaYqE09rXAfUM7eLNFlOLbL99uc1WJbIFfG43
Czh9vPxsuJ5RkdwS7R0m4GYDw8+BKW6WjpaC+Eje4I8X1rAQK0H/BLTCxe2dLRB7
0neAg8e3g6NdisRSLOP74xoEn/dhijNP7ENOFF1EdP/BFPHL7+sRsV6XYwwBeUAc
c6YsxeAPylm6gvIq/ESoRiY+e5QWvImHIWP+zB/cySYdT0fQHL9WjO6/uZW0ALuv
oZugICSmZ15pYlACIU8iYztRkS19CJZrUV7Gbq4+AurUKP8kCEI=
=2Ohx
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
"This is a very small update with no bug fixes and no new features.
The larger update of overlayfs for this cycle, the re-factoring of
overlayfs code into generic backing_file helpers, was already merged
via Christian.
Summary:
- Simplify/clarify some code
No bug fixes here, just some changes following questions from Al
about overlayfs code that could be a little more simple to follow.
- Overlayfs documentation style fixes
Mainly fixes for ReST formatting suggested by documentation
developers"
* tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
overlayfs.rst: fix ReST formatting
overlayfs.rst: use consistent feature names
ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up()
ovl: remove redundant ofs->indexdir member
Adjust the timing of the fscrypt keyring destruction, to prepare for
btrfs's fscrypt support. Also document that CephFS supports fscrypt now.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZZx4UBQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK85+AQCBHoG6R5UuPqafoDtabcCpxRW/ZHdo
WzOwjvHz1/tq5AEApogvjPI/3v2gelLnG9ZrXUBZMWZN6W0LQbH/k1VHjQ8=
=nvWY
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Adjust the timing of the fscrypt keyring destruction, to prepare for
btrfs's fscrypt support.
Also document that CephFS supports fscrypt now"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fs: move fscrypt keyring destruction to after ->put_super
f2fs: move release of block devices to after kill_block_super()
fscrypt: document that CephFS supports fscrypt now
fscrypt: update comment for do_remove_key()
fscrypt.rst: update definition of struct fscrypt_context_v2
* New features/functionality
* Online repair
* Reserve disk space for online repairs.
* Fix misinteraction between the AIL and btree bulkloader because of
which the bulk load fails to queue a buffer for writeback if it
happens to be on the AIL list.
* Prevent transaction reservation overflows when reaping blocks during
online repair.
* Whenever possible, bulkloader now copies multiple records into a
block.
* Support repairing of
1. Per-AG free space, inode and refcount btrees.
2. Ondisk inodes.
3. File data and attribute fork mappings.
* Verify the contents of
1. Inode and data fork of realtime bitmap file.
2. Quota files.
* Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about
a pmem device being removed.
* Bug fixes
* Fix memory leak of recovered attri intent items.
* Fix UAF during log intent recovery.
* Fix realtime geometry integer overflows.
* Prevent scrub from live locking in xchk_iget.
* Prevent fs shutdown when removing files during low free disk space.
* Prevent transaction reservation overflow when extending an RT device.
* Prevent incorrect warning from being printed when extending a
filesystem.
* Fix an off-by-one error in xreap_agextent_binval.
* Serialize access to perag radix tree during deletion operation.
* Fix perag memory leak during growfs.
* Allow allocation of minlen realtime extent when the maximum sized
realtime free extent is minlen in size.
* Cleanups
* Remove duplicate boilerplate code spread across functionality associated
with different log items.
* Cleanup resblks interfaces.
* Pass defer ops pointer to defer helpers instead of an enum.
* Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings.
* Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of
structures and structure member offsets. This is done in order to be
able to share the code with userspace.
* Move XFS documentation under a new directory specific to XFS.
* Do not invoke deferred ops' ->create_done callback if the deferred
operation does not have an intent item associated with it.
* Remove duplicate inclusion of header files from scrub/health.c.
* Refactor Realtime code.
* Cleanup attr code.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZZJQbwAKCRAH7y4RirJu
9JjkAP9Zg0QZNmAMsZwvgEBbuF/OnHKl4GmPA5uq0jPmSWCOqAEA0HjlOmuNfQWn
93fIw6CPbt+9QCluTYBwUisKLIJ/wgA=
=qmO0
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"New features/functionality:
- Online repair:
- Reserve disk space for online repairs
- Fix misinteraction between the AIL and btree bulkloader because
of which the bulk load fails to queue a buffer for writeback if
it happens to be on the AIL list
- Prevent transaction reservation overflows when reaping blocks
during online repair
- Whenever possible, bulkloader now copies multiple records into
a block
- Support repairing of
1. Per-AG free space, inode and refcount btrees
2. Ondisk inodes
3. File data and attribute fork mappings
- Verify the contents of
1. Inode and data fork of realtime bitmap file
2. Quota files
- Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks
about a pmem device being removed
Bug fixes:
- Fix memory leak of recovered attri intent items
- Fix UAF during log intent recovery
- Fix realtime geometry integer overflows
- Prevent scrub from live locking in xchk_iget
- Prevent fs shutdown when removing files during low free disk space
- Prevent transaction reservation overflow when extending an RT
device
- Prevent incorrect warning from being printed when extending a
filesystem
- Fix an off-by-one error in xreap_agextent_binval
- Serialize access to perag radix tree during deletion operation
- Fix perag memory leak during growfs
- Allow allocation of minlen realtime extent when the maximum sized
realtime free extent is minlen in size
Cleanups:
- Remove duplicate boilerplate code spread across functionality
associated with different log items
- Cleanup resblks interfaces
- Pass defer ops pointer to defer helpers instead of an enum
- Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings
- Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size
of structures and structure member offsets. This is done in order
to be able to share the code with userspace
- Move XFS documentation under a new directory specific to XFS
- Do not invoke deferred ops' ->create_done callback if the deferred
operation does not have an intent item associated with it
- Remove duplicate inclusion of header files from scrub/health.c
- Refactor Realtime code
- Cleanup attr code"
* tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits)
xfs: use the op name in trace_xlog_intent_recovery_failed
xfs: fix a use after free in xfs_defer_finish_recovery
xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts
xfs: remove xfs_attr_sf_hdr_t
xfs: remove struct xfs_attr_shortform
xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue
xfs: remove xfs_attr_shortform_lookup
xfs: simplify xfs_attr_sf_findname
xfs: move the xfs_attr_sf_lookup tracepoint
xfs: return if_data from xfs_idata_realloc
xfs: make if_data a void pointer
xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc
xfs: simplify and optimize the RT allocation fallback cascade
xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc
xfs: remove XFS_RTMIN/XFS_RTMAX
xfs: remove rt-wrappers from xfs_format.h
xfs: factor out a xfs_rtalloc_sumlevel helper
xfs: tidy up xfs_rtallocate_extent_exact
xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block
xfs: reflow the tail end of xfs_rtallocate_extent_block
...
many places. The notable patch series are:
- nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio
conversions for file paths".
- Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2:
Folio conversions for directory paths".
- IA64 remnant removal in Heiko Carstens's "Remove unused code after
IA-64 removal".
- Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere
in "Treewide: enable -Wmissing-prototypes". This had some followup
fixes:
- Nathan Chancellor has cleaned up the hexagon build in the series
"hexagon: Fix up instances of -Wmissing-prototypes".
- Nathan also addressed some s390 warnings in "s390: A couple of
fixes for -Wmissing-prototypes".
- Arnd Bergmann addresses the same warnings for MIPS in his series
"mips: address -Wmissing-prototypes warnings".
- Baoquan He has made kexec_file operate in a top-down-fitting manner
similar to kexec_load in the series "kexec_file: Load kernel at top of
system RAM if required"
- Baoquan He has also added the self-explanatory "kexec_file: print out
debugging message if required".
- Some checkstack maintenance work from Tiezhu Yang in the series
"Modify some code about checkstack".
- Douglas Anderson has disentangled the watchdog code's logging when
multiple reports are occurring simultaneously. The series is "watchdog:
Better handling of concurrent lockups".
- Yuntao Wang has contributed some maintenance work on the crash code in
"crash: Some cleanups and fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA
juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV
vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk=
=kQw+
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Quite a lot of kexec work this time around. Many singleton patches in
many places. The notable patch series are:
- nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
conversions for file paths'.
- Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
Folio conversions for directory paths'.
- IA64 remnant removal in Heiko Carstens's 'Remove unused code after
IA-64 removal'.
- Arnd Bergmann has enabled the -Wmissing-prototypes warning
everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
some followup fixes:
- Nathan Chancellor has cleaned up the hexagon build in the series
'hexagon: Fix up instances of -Wmissing-prototypes'.
- Nathan also addressed some s390 warnings in 's390: A couple of
fixes for -Wmissing-prototypes'.
- Arnd Bergmann addresses the same warnings for MIPS in his series
'mips: address -Wmissing-prototypes warnings'.
- Baoquan He has made kexec_file operate in a top-down-fitting manner
similar to kexec_load in the series 'kexec_file: Load kernel at top
of system RAM if required'
- Baoquan He has also added the self-explanatory 'kexec_file: print
out debugging message if required'.
- Some checkstack maintenance work from Tiezhu Yang in the series
'Modify some code about checkstack'.
- Douglas Anderson has disentangled the watchdog code's logging when
multiple reports are occurring simultaneously. The series is
'watchdog: Better handling of concurrent lockups'.
- Yuntao Wang has contributed some maintenance work on the crash code
in 'crash: Some cleanups and fixes'"
* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
crash_core: fix and simplify the logic of crash_exclude_mem_range()
x86/crash: use SZ_1M macro instead of hardcoded value
x86/crash: remove the unused image parameter from prepare_elf_headers()
kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
scripts/decode_stacktrace.sh: strip unexpected CR from lines
watchdog: if panicking and we dumped everything, don't re-enable dumping
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
kexec_core: fix the assignment to kimage->control_page
x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
lib/trace_readwrite.c:: replace asm-generic/io with linux/io
nilfs2: cpfile: fix some kernel-doc warnings
stacktrace: fix kernel-doc typo
scripts/checkstack.pl: fix no space expression between sp and offset
x86/kexec: fix incorrect argument passed to kexec_dprintk()
x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
nilfs2: add missing set_freezable() for freezable kthread
kernel: relay: remove relay_file_splice_read dead code, doesn't work
docs: submit-checklist: remove all of "make namespacecheck"
...
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc
osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1
/wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc=
=iFAQ
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs super updates from Christian Brauner:
"This contains the super work for this cycle including the long-awaited
series by Jan to make it possible to prevent writing to mounted block
devices:
- Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more
and more involved examples how to corrupt block device under a
mounted filesystem leading to kernel crashes and reports we can do
nothing about. Add tracking of writers to each block device and a
kernel cmdline argument which controls whether other writeable
opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
allowed.
Note that this effectively only prevents modification of the
particular block device's page cache by other writers. The actual
device content can still be modified by other means - e.g. by
issuing direct scsi commands, by doing writes through devices lower
in the storage stack (e.g. in case loop devices, DM, or MD are
involved) etc. But blocking direct modifications of the block
device page cache is enough to give filesystems a chance to perform
data validation when loading data from the underlying storage and
thus prevent kernel crashes.
Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening. We expect
that this will be interesting to quite a few workloads.
Btrfs is currently opted out of this because they still haven't
merged patches we require for this to work from three kernel
releases ago.
- Reimplement block device freezing and thawing as holder operations
on the block device.
This allows us to extend block device freezing to all devices
associated with a superblock and not just the main device. It also
allows us to remove get_active_super() and thus another function
that scans the global list of superblocks.
Freezing via additional block devices only works if the filesystem
chooses to use @fs_holder_ops for these additional devices as well.
That currently only includes ext4 and xfs.
Earlier releases switched get_tree_bdev() and mount_bdev() to use
@fs_holder_ops. The remaining nilfs2 open-coded version of
mount_bdev() has been converted to rely on @fs_holder_ops as well.
So block device freezing for the main block device will continue to
work as before.
There should be no regressions in functionality. The only special
case is btrfs where block device freezing for the main block device
never worked because sb->s_bdev isn't set. Block device freezing
for btrfs can be fixed once they can switch to @fs_holder_ops but
that can happen whenever they're ready"
* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
block: Fix a memory leak in bdev_open_by_dev()
super: don't bother with WARN_ON_ONCE()
super: massage wait event mechanism
ext4: Block writes to journal device
xfs: Block writes to log device
fs: Block writes to mounted block devices
btrfs: Do not restrict writes to btrfs devices
block: Add config option to not allow writing to mounted devices
block: Remove blkdev_get_by_*() functions
bcachefs: Convert to bdev_open_by_path()
fs: handle freezing from multiple devices
fs: remove dead check
nilfs2: simplify device handling
fs: streamline thaw_super_locked
ext4: simplify device handling
xfs: simplify device handling
fs: simplify setup_bdev_super() calls
blkdev: comment fs_holder_ops
porting: document block device freeze and thaw changes
fs: remove unused helper
...
The help text for CONFIG_FS_ENCRYPTION and the fscrypt.rst documentation
file both list the filesystems that support fscrypt. CephFS added
support for fscrypt in v6.6, so add CephFS to the list.
Link: https://lore.kernel.org/r/20231227045158.87276-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
In preparation for adding support for anonymous multi-size THP, introduce
new sysfs structure that will be used to control the new behaviours. A
new directory is added under transparent_hugepage for each supported THP
size, and contains an `enabled` file, which can be set to "inherit" (to
inherit the global setting), "always", "madvise" or "never". For now, the
kernel still only supports PMD-sized anonymous THP, so only 1 directory is
populated.
The first half of the change converts transhuge_vma_suitable() and
hugepage_vma_check() so that they take a bitfield of orders for which the
user wants to determine support, and the functions filter out all the
orders that can't be supported, given the current sysfs configuration and
the VMA dimensions. The resulting functions are renamed to
thp_vma_suitable_orders() and thp_vma_allowable_orders() respectively.
Convenience functions that take a single, unencoded order and return a
boolean are also defined as thp_vma_suitable_order() and
thp_vma_allowable_order().
The second half of the change implements the new sysfs interface. It has
been done so that each supported THP size has a `struct thpsize`, which
describes the relevant metadata and is itself a kobject. This is pretty
minimal for now, but should make it easy to add new per-thpsize files to
the interface if needed in future (e.g. per-size defrag). Rather than
keep the `enabled` state directly in the struct thpsize, I've elected to
directly encode it into huge_anon_orders_[always|madvise|inherit]
bitfields since this reduces the amount of work required in
thp_vma_allowable_orders() which is called for every page fault.
See Documentation/admin-guide/mm/transhuge.rst, as modified by this
commit, for details of how the new sysfs interface works.
[ryan.roberts@arm.com: fix build warning when CONFIG_SYSFS is disabled]
Link: https://lkml.kernel.org/r/20231211125320.3997543-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20231207161211.2374093-4-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix some indentation issues and fix missing newlines in quoted text
by converting quoted text to code blocks.
Reported-by: Christian Brauner <brauner@kernel.org>
Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Use the feature names "metacopy" and "index" consistently throughout
the document.
Covert the numbered list of features "redirect_dir", "index", "xino"
to section headings, so that those features could be referenced in the
document by their name.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set, the "threads" mount option
can be used to specify the decompression mode: single-threaded,
multi-threaded, percpu or the number of threads used for decompression.
When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is not set, SQUASHFS_DECOMP_MULTI and
SQUASHFS_MOUNT_DECOMP_THREADS are both set, the "threads" option can also
be used to specify the number of threads used for decompression. This
mount option is only mentioned in fs/squashfs/Kconfig, which makes it
difficult to find.
Another mount option available is "errors", which can be configured to
panic the kernel when squashfs errors are encountered.
Add both these options to the squashfs documentation, making them more
noticeable.
Link: https://lkml.kernel.org/r/20231117161215.140282-1-amiculas@cisco.com
Signed-off-by: Ariel Miculas <amiculas@cisco.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There were already assertions that we were not passing a tail page to
error_remove_page(), so make the compiler enforce that by converting
everything to pass and use a folio.
Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Get the copy of the fscrypt_context_v2 definition in the documentation
in sync with the actual definition, which was changed recently by
commit 5b1188847180 ("fscrypt: support crypto data unit size less than
filesystem block size").
Link: https://lore.kernel.org/r/20231206001901.14371-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
XFS docs are currently in upper-level Documentation/filesystems.
Although these are currently 4 docs, they are already outstanding as
a group and can be moved to its own subdirectory.
Consolidate them into Documentation/filesystems/xfs/.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
By default, shared mmap is disabled in FUSE DIRECT_IO mode. However,
when the DIRECT_IO_ALLOW_MMAP flag is enabled in the FUSE_INIT reply,
shared mmap is allowed.
Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
... and fix the directory locking documentation and proof of correctness.
Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the
case where we really don't want it is splicing the root of disconnected
tree to somewhere.
In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an
ancestor of Y" only if X and Y are already in the same tree. Otherwise
it can go from false to true, and one can construct a deadlock on that.
Make lock_two_directories() report an error in such case and update the
callers of lock_rename()/lock_rename_child() to handle such errors.
And yes, such conditions are not impossible to create ;-/
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We should never lock two subdirectories without having taken
->s_vfs_rename_mutex; inode pointer order or not, the "order" proposed
in 28eceeda130f "fs: Lock moved directories" is not transitive, with
the usual consequences.
The rationale for locking renamed subdirectory in all cases was
the possibility of race between rename modifying .. in a subdirectory to
reflect the new parent and another thread modifying the same subdirectory.
For a lot of filesystems that's not a problem, but for some it can lead
to trouble (e.g. the case when short directory contents is kept in the
inode, but creating a file in it might push it across the size limit
and copy its contents into separate data block(s)).
However, we need that only in case when the parent does change -
otherwise ->rename() doesn't need to do anything with .. entry in the
first place. Some instances are lazy and do a tautological update anyway,
but it's really not hard to avoid.
Amended locking rules for rename():
find the parent(s) of source and target
if source and target have the same parent
lock the common parent
else
lock ->s_vfs_rename_mutex
lock both parents, in ancestor-first order; if neither
is an ancestor of another, lock the parent of source
first.
find the source and target.
if source and target have the same parent
if operation is an overwriting rename of a subdirectory
lock the target subdirectory
else
if source is a subdirectory
lock the source
if target is a subdirectory
lock the target
lock non-directories involved, in inode pointer order if both
source and target are such.
That way we are guaranteed that parents are locked (for obvious reasons),
that any renamed non-directory is locked (nfsd relies upon that),
that any victim is locked (emptiness check needs that, among other things)
and subdirectory that changes parent is locked (needed to protect the update
of .. entries). We are also guaranteed that any operation locking more
than one directory either takes ->s_vfs_rename_mutex or locks a parent
followed by its child.
Cc: stable@vger.kernel.org
Fixes: 28eceeda130f "fs: Lock moved directories"
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The structure is called struct xattr_handler, singular, not plural.
Fixing the typo also makes it greppable with the whole word matching
flag.
Signed-off-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20231027152101.226296-1-amiculas@cisco.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZUpEaAAKCRCRxhvAZXjc
ounBAQCAoS66gnOZ+k4kOWwB2zZ1Ueh3dPFC7IcEZ+pwFS8hpAEAxUQxV0TSWf5l
W/1oKRtAJyuSYvehHeMUSJmHVBiM8w4=
=bNm0
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fanotify fsid updates from Christian Brauner:
"This work is part of the plan to enable fanotify to serve as a drop-in
replacement for inotify. While inotify is availabe on all filesystems,
fanotify currently isn't.
In order to support fanotify on all filesystems two things are needed:
(1) all filesystems need to support AT_HANDLE_FID
(2) all filesystems need to report a non-zero f_fsid
This contains (1) and allows filesystems to encode non-decodable file
handlers for fanotify without implementing any exportfs operations by
encoding a file id of type FILEID_INO64_GEN from i_ino and
i_generation.
Filesystems that want to opt out of encoding non-decodable file ids
for fanotify that don't support NFS export can do so by providing an
empty export_operations struct.
This also partially addresses (2) by generating f_fsid for simple
filesystems as well as freevxfs. Remaining filesystems will be dealt
with by separate patches.
Finally, this contains the patch from the current exportfs maintainers
which moves exportfs under vfs with Chuck, Jeff, and Amir as
maintainers and vfs.git as tree"
* tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: create an entry for exportfs
fs: fix build error with CONFIG_EXPORTFS=m or not defined
freevxfs: derive f_fsid from bdev->bd_dev
fs: report f_fsid from s_dev for "simple" filesystems
exportfs: support encoding non-decodeable file handles by default
exportfs: define FILEID_INO64_GEN* file handle types
exportfs: make ->encode_fh() a mandatory method for NFS export
exportfs: add helpers to check if filesystem can encode/decode file handles
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmVIg7wACgkQEVvVyTe/
1WpPtw//WiQc+gwGvLEyi9YFZWTuAOO8ZGSxF7t+CU2SYy9d91s3K8dNRx2kWBh0
ycsFIYhTyq6BrHqlg1JpI4WW8S9SW7BkXbE4f5Lm/kiiGqlJn+eCA+aR8AqPMpVx
KemCHJQj2WwcjlHRonAYIyQOApNePwy7EPPDlk+TlVEgMRtDHW+CY1ftChqi8bVf
/aYoFdJIbliIUWNKzBeje/Hypz1a3aUrMitquoK3A91RgIgQQmnWN6yirT4Z4gKB
Vd9vnnFPu9LWjCv+RUuP3C0G4zHkP92sfbEIulKqml0Vx68JZyTy5jsUzxaGmXaQ
im4neCEUV88PZoAJgGFcBHbi4y0bt5WDIf7kdZGheZJv8H/X5TQTxlXle3h0qHEp
Rx65OjPGzZjfia1nFzNjUCd+jCtdp02H1WxchfsRpwmXqGPYBLsNPWN+BoSaLjtL
gbEMRMAs4mAJObnhAIOZclzh0boLi4IWz0yNkBoZRxkUOAUDV7UaICQoSneWj36j
OH75cNrRV7vZeMQy2aLWkxGm2LabhjzDZXwO9zwHZFMKIpGg7zko10Fko9qS+20p
BHe+aqWgr0qxcY41fpKzu0ftSuNs9Jsn1/I/jqmgwFw+ogRwjuxgl2LkCHOPh3cS
AMwd1ZahQDQKq6Wgh1WCOi31n1d8+/Gy74IojwSC++XaPMqYB0I=
=MPCb
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- Overlayfs aio cleanups and fixes
Cleanups and minor fixes in preparation for factoring out of
read/write passthrough code.
- Overlayfs lock ordering changes
Hold mnt_writers only throughout copy up instead of a long lived
elevated refcount.
- Add support for nesting overlayfs private xattrs
There are cases where you want to use an overlayfs mount as a
lowerdir for another overlayfs mount. For example, if the system
rootfs is on overlayfs due to composefs, or to make it volatile (via
tmpfs), then you cannot currently store a lowerdir on the rootfs,
because the inner overlayfs will eat all the whiteouts and overlay
xattrs. This means you can't e.g. store on the rootfs a prepared
container image for use with overlayfs.
This adds support for nesting of overlayfs mounts by escaping the
problematic features and unescaping them when exposing to the
overlayfs user.
- Add new mount options for appending lowerdirs
* tag 'ovl-update-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: add support for appending lowerdirs one by one
ovl: refactor layer parsing helpers
ovl: store and show the user provided lowerdir mount option
ovl: remove unused code in lowerdir param parsing
ovl: Add documentation on nesting of overlayfs mounts
ovl: Add an alternative type of whiteout
ovl: Support escaped overlay.* xattrs
ovl: Add OVL_XATTR_TRUSTED/USER_PREFIX_LEN macros
ovl: Move xattr support to new xattrs.c file
ovl: do not encode lower fh with upper sb_writers held
ovl: do not open/llseek lower file with upper sb_writers held
ovl: reorder ovl_want_write() after ovl_inode_lock()
ovl: split ovl_want_write() into two helpers
ovl: add helper ovl_file_modified()
ovl: protect copying of realinode attributes to ovl inode
ovl: punt write aio completion to workqueue
ovl: propagate IOCB_APPEND flag on writes to realfile
ovl: use simpler function to convert iocb to rw flags
- Fix inode metadata space layout documentation;
- Avoid warning MicroLZMA format anymore;
- Fix erofs_insert_workgroup() lockref usage;
- Some cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmVBMVoRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qq8iA/+NdvaKPFvexKuJkW2n9t4UaDEPqS6qobk
B1uIaXxDhz6RdhDyhLA7ysQQi1Z8g0t/MXgCWZGM1NXJ995Y0VE+dFlT+a5QPLiu
4rz8bnZpdUHVF0jIyoru/nPZQWXsYBh2AeYLKPi+nvQoCRtoRsvdKXxv76iEF45o
zE2548TXqogL3Q04oH040fRzVwYROqnF85q9uCPNFS/Sh/swxyob9waBijlzbECS
GLjyJ+ZsQBBG95SzwNEPSFek5+od9ty2UKQwDJfqABOhbQT2Li6r2t0afyrlqDQY
m7VvvmzkFe5uYwAkuMkAOxhnW+eBzxz/whk+8W3dw/lm4dDbe0LuCrGXVdivdf3F
dO5ywSEGJnOH1FqNF1ubAWdzJ+qoaTupuOpwrGM9/lHGgxCIVjW06XzKfOv/AfaQ
VR2SPTrcA6eulXsLGopVho9dtHBUQZIAmJjQcWIyCpYm15NmElDXsybN/Os2Hgo0
d2Tsx9fgsN25xRsIBPxiV6qsEXqx56Nce1U4k0VV4ali81Y22nqq6rV4AFWz8KNJ
LhYFcL5YgO5RxYEpoQGkt4hyeDkxt4lc/cYqCr8E8kpNEuz0P3O27NcLnSP7CJVQ
pBYfSoV5tm+SyK4toeacG+HJTnohNyLE4vIXN96awoCBfJiickY7ZxJNuf6wRbOA
4Y+aMq/lQXQ=
=Yum+
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"Nothing exciting lands for this cycle, since we're still busying in
developing support for sub-page blocks and large-folios of compressed
data for new scenarios on Android.
In this cycle, MicroLZMA format is marked as stable, and there are
minor cleanups around documentation and codebase. In addition, it also
fixes incorrect lockref usage in erofs_insert_workgroup().
Summary:
- Fix inode metadata space layout documentation
- Avoid warning for MicroLZMA format anymore
- Fix erofs_insert_workgroup() lockref usage
- Some cleanups"
* tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix erofs_insert_workgroup() lockref usage
erofs: tidy up redundant includes
erofs: get rid of ROOT_NID()
erofs: simplify compression configuration parser
erofs: don't warn MicroLZMA format anymore
erofs: fix inode metadata space layout description in documentation
there are some significant changes nonetheless:
- Some more Spanish-language and Chinese translations.
- The much-discussed documentation of the confidential-computing threat
model.
- Powerpc and RISCV documentation move under Documentation/arch - these
complete this particular bit of documentation churn.
- A large traditional-Chinese documentation update.
- A new document on backporting and conflict resolution.
- Some kernel-doc and Sphinx fixes.
Plus the usual smattering of smaller updates and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmVBNv8PHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y0JkH/36MOpkaDnsY69/dMRKSuD4mAAP2H6LS8V63
SsMgH5VCj8lcy/Tz1+J89t14pbcX8l0viKxSo4UxvzoJ5snrz8A8gZ9oqY7NCcNs
nMtolnN5IwdbgGnEGqASSLsl07lnabhRK0VYv9ZO7lHjYQp97VsJ/qrjJn385HFE
vYW8iRcxcKdwtuuwOtbPcdAMjP54saJdNC5wMLsfMR0csKcGbzaSNpqpiGovzT7l
phG2DSxrJH0gUZyeGPryroNppaf+mVKSDSiwRdI8mzm0J67p6dZYYwBS1Iw6Awbf
8iYoj6W63/FVQbXffPx5d6ffOSQh4JkAskxgBUOzluSGusSDc+4=
=9HU5
-----END PGP SIGNATURE-----
Merge tag 'docs-6.7' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"The number of commits for documentation is not huge this time around,
but there are some significant changes nonetheless:
- Some more Spanish-language and Chinese translations
- The much-discussed documentation of the confidential-computing
threat model
- Powerpc and RISCV documentation move under Documentation/arch -
these complete this particular bit of documentation churn
- A large traditional-Chinese documentation update
- A new document on backporting and conflict resolution
- Some kernel-doc and Sphinx fixes
Plus the usual smattering of smaller updates and typo fixes"
* tag 'docs-6.7' of git://git.lwn.net/linux: (40 commits)
scripts/kernel-doc: Fix the regex for matching -Werror flag
docs: backporting: address feedback
Documentation: driver-api: pps: Update PPS generator documentation
speakup: Document USB support
doc: blk-ioprio: Bring the doc in line with the implementation
docs: usb: fix reference to nonexistent file in UVC Gadget
docs: doc-guide: mention 'make refcheckdocs'
Documentation: fix typo in dynamic-debug howto
scripts/kernel-doc: match -Werror flag strictly
Documentation/sphinx: Remove the repeated word "the" in comments.
docs: sparse: add SPDX-License-Identifier
docs/zh_CN: Add subsystem-apis Chinese translation
docs/zh_TW: update contents for zh_TW
docs: submitting-patches: encourage direct notifications to commenters
docs: add backporting and conflict resolution document
docs: move riscv under arch
docs: update link to powerpc/vmemmap_dedup.rst
mm/memory-hotplug: fix typo in documentation
docs: move powerpc under arch
PCI: Update the devres documentation regarding to pcim_*()
...
- Documentation update for /proc/cmdline, which includes both the
parameters from bootloader and the embedded parameters in the kernel.
- fs/proc: Add bootloader argument as a comment line to /proc/bootconfig
so that the user can distinguish what parameters were passed from
bootloader even if bootconfig modified that.
- Documentation fix to add /proc/bootconfig to proc.rst.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmVAzokbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bJxMH+QHgxumk/zhQ6TG1szG3
TWrkhg5oP2hGRTbJc6lK/BJM5E/xz6SNoGP6nEEW5JdkzKwNyNSLab+J0as9Ua3V
KtJGiAtp4FZpVl6B2y5UBqBerRl8U4NjW8TOsEv8Dy4gVlgjdbyS7fzy3T86DEZ5
QGQ004pYVciSzmRvEmk3f0GUqsgJUYOjZWIGp2+fukpTnJhD4o3K3UwUs4Rz1tF7
cnVm34af67648/mGSxB+qHBHq4dznrWopv/SgH6OcEzOOpvEGAwPuteAms3S8g3j
WOZNHB7QlNkVlASlAcC+yPHjJBtWMfRTsWQIxW5F0wSqq0SRCzTMLHU1wltJagyq
lBQ=
=J8P0
-----END PGP SIGNATURE-----
Merge tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- Documentation update for /proc/cmdline, which includes both the
parameters from bootloader and the embedded parameters in the kernel
- fs/proc: Add bootloader argument as a comment line to
/proc/bootconfig so that the user can distinguish what parameters
were passed from bootloader even if bootconfig modified that
- Documentation fix to add /proc/bootconfig to proc.rst
* tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
doc: Add /proc/bootconfig to proc.rst
fs/proc: Add boot loader arguments as comment to /proc/bootconfig
doc: Update /proc/cmdline documentation to include boot config
Add new mount options lowerdir+ and datadir+ that can be used to add
layers to lower layers stack one by one.
Unlike the legacy lowerdir mount option, special characters (i.e. colons
and cammas) are not unescaped with these new mount options.
The new mount options can be repeated to compose a large stack of lower
layers, but they may not be mixed with the lagacy lowerdir mount option,
because for displaying lower layers in mountinfo, we do not want to mix
escaped with unescaped lower layers path syntax.
Similar to data-only layer rules with the lowerdir mount option, the
datadir+ option must follow at least one lowerdir+ option and the
lowerdir+ option must not follow the datadir+ option.
If the legacy lowerdir mount option follows lowerdir+ and datadir+
mount options, it overrides them. Sepcifically, calling:
fsconfig(FSCONFIG_SET_STRING, "lowerdir", "", 0);
can be used to reset previously setup lower layers.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegt7VC94KkRtb1dfHG8+4OzwPBLYqhtc8=QFUxpFJE+=RQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
This update adds support for configuring the crypto data unit size (i.e.
the granularity of file contents encryption) to be less than the
filesystem block size. This can allow users to use inline encryption
hardware in some cases when it wouldn't otherwise be possible.
In addition, there are two commits that are prerequisites for the
extent-based encryption support that the btrfs folks are working on.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZT8acBQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+czAQDkStgX1ICJQANnxwbrg/SUVdZjPuFH
sJw3sUVpBR81TwEA/SyWh3YzVNZdpE7PWNrCknrC+qnO8hd9QBEjnQfwIQc=
=t44a
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"This update adds support for configuring the crypto data unit size
(i.e. the granularity of file contents encryption) to be less than the
filesystem block size. This can allow users to use inline encryption
hardware in some cases when it wouldn't otherwise be possible.
In addition, there are two commits that are prerequisites for the
extent-based encryption support that the btrfs folks are working on"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: track master key presence separately from secret
fscrypt: rename fscrypt_info => fscrypt_inode_info
fscrypt: support crypto data unit size less than filesystem block size
fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes
fscrypt: compute max_lblk_bits from s_maxbytes and block size
fscrypt: make the bounce page pool opt-in instead of opt-out
fscrypt: make it clearer that key_prefix is deprecated
This release completes the SunRPC thread scheduler work that was
begun in v6.6. The scheduler can now find an svc thread to wake in
constant time and without a list walk. Thanks again to Neil Brown
for this overhaul.
Lorenzo Bianconi contributed infrastructure for a netlink-based
NFSD control plane. The long-term plan is to provide the same
functionality as found in /proc/fs/nfsd, plus some interesting
additions, and then migrate the NFSD user space utilities to
netlink.
A long series to overhaul NFSD's NFSv4 operation encoding was
applied in this release. The goals are to bring this family of
encoding functions in line with the matching NFSv4 decoding
functions and with the NFSv2 and NFSv3 XDR functions, preparing
the way for better memory safety and maintainability.
A further improvement to NFSD's write delegation support was
contributed by Dai Ngo. This adds a CB_GETATTR callback,
enabling the server to retrieve cached size and mtime data from
clients holding write delegations. If the server can retrieve
this information, it does not have to recall the delegation in
some cases.
The usual panoply of bug fixes and minor improvements round out
this release. As always I am grateful to all contributors,
reviewers, and testers.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmU5IuoACgkQM2qzM29m
f5eVsg//bVp8S93ci/oDlKfzOwH2fO5e5rna91wrDpJxkd51h6KTx55dSRG5sjAZ
EywIVOann6xCtsixAPyff5Cweg2dWvzQRsy1ZnvWQ1qZBzD5KAJY5LPkeSFUCKBo
Zani/qTOYbxzgFMjZx+yDSXDPKG68WYZBQK59SI7mURu4SYdk8aRyNY8mjHfr0Vh
Aqrcny4oVtXV4sL5P5G/2FUW7WKT3olA3jSYlRRNMhbs2qpEemRCCrspOEMMad+b
t1+ZCg+U27PMranvOJnof4RU7peZbaxDWA0gyiUbivVXVtZn9uOs0ffhktkvechL
ePc33dqdp2ITdKIPA6JlaRv5WflKXQw0YYM9Kv5mcR4A2el7owL4f/pMlPhtbYwJ
IOJv15KdKVN979G2e6WMYiKK+iHfaUUguhMEXnfnGoAajHOZNQiUEo3iFQAD7LDc
DvMF8d9QqYmB9IW8FOYaRRfZGJOQHf3TL79Nd08z/bn5swvlvfj77leux9Sb+0/m
Luk2Xvz2AJVSXE31wzabaGHkizN+BtH+e4MMbXUHBPW5jE9v7XOnEUFr4UdZyr9P
Gl87A7NcrzNjJWT5TrnzM4sOslNsx46Aeg+VuNt2fSRn2dm6iBu2B8s0N4imx6dV
PX1y9VSLq5WRhjrFZ1qeiZdsuTaQtrEiNDoRIQR6nCJPAV80iFk=
=B4wJ
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This release completes the SunRPC thread scheduler work that was begun
in v6.6. The scheduler can now find an svc thread to wake in constant
time and without a list walk. Thanks again to Neil Brown for this
overhaul.
Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD
control plane. The long-term plan is to provide the same functionality
as found in /proc/fs/nfsd, plus some interesting additions, and then
migrate the NFSD user space utilities to netlink.
A long series to overhaul NFSD's NFSv4 operation encoding was applied
in this release. The goals are to bring this family of encoding
functions in line with the matching NFSv4 decoding functions and with
the NFSv2 and NFSv3 XDR functions, preparing the way for better memory
safety and maintainability.
A further improvement to NFSD's write delegation support was
contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the
server to retrieve cached size and mtime data from clients holding
write delegations. If the server can retrieve this information, it
does not have to recall the delegation in some cases.
The usual panoply of bug fixes and minor improvements round out this
release. As always I am grateful to all contributors, reviewers, and
testers"
* tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits)
svcrdma: Fix tracepoint printk format
svcrdma: Drop connection after an RDMA Read error
NFSD: clean up alloc_init_deleg()
NFSD: Fix frame size warning in svc_export_parse()
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
nfsd: Clean up errors in nfs3proc.c
nfsd: Clean up errors in nfs4state.c
NFSD: Clean up errors in stats.c
NFSD: simplify error paths in nfsd_svc()
NFSD: Clean up nfsd4_encode_seek()
NFSD: Clean up nfsd4_encode_offset_status()
NFSD: Clean up nfsd4_encode_copy_notify()
NFSD: Clean up nfsd4_encode_copy()
NFSD: Clean up nfsd4_encode_test_stateid()
NFSD: Clean up nfsd4_encode_exchange_id()
NFSD: Clean up nfsd4_do_encode_secinfo()
NFSD: Clean up nfsd4_encode_access()
NFSD: Clean up nfsd4_encode_readdir()
NFSD: Clean up nfsd4_encode_entry4()
NFSD: Add an nfsd4_encode_nfs_cookie4() helper
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc
ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS
P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4=
=7yD1
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Rename and export helpers that get write access to a mount. They
are used in overlayfs to get write access to the upper mount.
- Print the pretty name of the root device on boot failure. This
helps in scenarios where we would usually only print
"unknown-block(1,2)".
- Add an internal SB_I_NOUMASK flag. This is another part in the
endless POSIX ACL saga in a way.
When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip
the umask because if the relevant inode has POSIX ACLs set it might
take the umask from there. But if the inode doesn't have any POSIX
ACLs set then we apply the umask in the filesytem itself. So we end
up with:
(1) no SB_POSIXACL -> strip umask in vfs
(2) SB_POSIXACL -> strip umask in filesystem
The umask semantics associated with SB_POSIXACL allowed filesystems
that don't even support POSIX ACLs at all to raise SB_POSIXACL
purely to avoid umask stripping. That specifically means NFS v4 and
Overlayfs. NFS v4 does it because it delegates this to the server
and Overlayfs because it needs to delegate umask stripping to the
upper filesystem, i.e., the filesystem used as the writable layer.
This went so far that SB_POSIXACL is raised eve on kernels that
don't even have POSIX ACL support at all.
Stop this blatant abuse and add SB_I_NOUMASK which is an internal
superblock flag that filesystems can raise to opt out of umask
handling. That should really only be the two mentioned above. It's
not that we want any filesystems to do this. Ideally we have all
umask handling always in the vfs.
- Make overlayfs use SB_I_NOUMASK too.
- Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in
IS_POSIXACL() if the kernel doesn't have support for it. This is a
very old patch but it's only possible to do this now with the wider
cleanup that was done.
- Follow-up work on fake path handling from last cycle. Citing mostly
from Amir:
When overlayfs was first merged, overlayfs files of regular files
and directories, the ones that are installed in file table, had a
"fake" path, namely, f_path is the overlayfs path and f_inode is
the "real" inode on the underlying filesystem.
In v6.5, we took another small step by introducing of the
backing_file container and the file_real_path() helper. This change
allowed vfs and filesystem code to get the "real" path of an
overlayfs backing file. With this change, we were able to make
fsnotify work correctly and report events on the "real" filesystem
objects that were accessed via overlayfs.
This method works fine, but it still leaves the vfs vulnerable to
new code that is not aware of files with fake path. A recent
example is commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get
the i_version"). This commit uses direct referencing to f_path in
IMA code that otherwise uses file_inode() and file_dentry() to
reference the filesystem objects that it is measuring.
This contains work to switch things around: instead of having
filesystem code opt-in to get the "real" path, have generic code
opt-in for the "fake" path in the few places that it is needed.
Is it far more likely that new filesystems code that does not use
the file_dentry() and file_real_path() helpers will end up causing
crashes or averting LSM/audit rules if we keep the "fake" path
exposed by default.
This change already makes file_dentry() moot, but for now we did
not change this helper just added a WARN_ON() in ovl_d_real() to
catch if we have made any wrong assumptions.
After the dust settles on this change, we can make file_dentry() a
plain accessor and we can drop the inode argument to ->d_real().
- Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small
change but it really isn't and I would like to see everyone on
their tippie toes for any possible bugs from this work.
Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU for
files since a very long time because of the nasty interactions
between the SCM_RIGHTS file descriptor garbage collection. So
extending it makes a lot of sense but it is a subtle change. There
are almost no places that fiddle with file rcu semantics directly
and the ones that did mess around with struct file internal under
rcu have been made to stop doing that because it really was always
dodgy.
I forgot to put in the link tag for this change and the discussion
in the commit so adding it into the merge message:
https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com
Cleanups:
- Various smaller pipe cleanups including the removal of a spin lock
that was only used to protect against writes without pipe_lock()
from O_NOTIFICATION_PIPE aka watch queues. As that was never
implemented remove the additional locking from pipe_write().
- Annotate struct watch_filter with the new __counted_by attribute.
- Clarify do_unlinkat() cleanup so that it doesn't look like an extra
iput() is done that would cause issues.
- Simplify file cleanup when the file has never been opened.
- Use module helper instead of open-coding it.
- Predict error unlikely for stale retry.
- Use WRITE_ONCE() for mount expiry field instead of just commenting
that one hopes the compiler doesn't get smart.
Fixes:
- Fix readahead on block devices.
- Fix writeback when layztime is enabled and inodes whose timestamp
is the only thing that changed reside on wb->b_dirty_time. This
caused excessively large zombie memory cgroup when lazytime was
enabled as such inodes weren't handled fast enough.
- Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()"
* tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
file, i915: fix file reference for mmap_singleton()
vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups
writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
chardev: Simplify usage of try_module_get()
ovl: rely on SB_I_NOUMASK
fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n
fs: store real path instead of fake path in backing file f_path
fs: create helper file_user_path() for user displayed mapped file path
fs: get mnt_writers count for an open backing file's real path
vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked
vfs: predict the error in retry_estale as unlikely
backing file: free directly
vfs: fix readahead(2) on block devices
io_uring: use files_lookup_fd_locked()
file: convert to SLAB_TYPESAFE_BY_RCU
vfs: shave work on failed file open
fs: simplify misleading code to remove ambiguity regarding ihold()/iput()
watch_queue: Annotate struct watch_filter with __counted_by
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
fs/pipe: remove unnecessary spinlock from pipe_write()
...
Rename the default helper for encoding FILEID_INO32_GEN* file handles to
generic_encode_ino32_fh() and convert the filesystems that used the
default implementation to use the generic helper explicitly.
After this change, exportfs_encode_inode_fh() no longer has a default
implementation to encode FILEID_INO32_GEN* file handles.
This is a step towards allowing filesystems to encode non-decodeable
file handles for fanotify without having to implement any
export_operations.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.
Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Master keys can be in one of three states: present, incompletely
removed, and absent (as per FSCRYPT_KEY_STATUS_* used in the UAPI).
Currently, the way that "present" is distinguished from "incompletely
removed" internally is by whether ->mk_secret exists or not.
With extent-based encryption, it will be necessary to allow per-extent
keys to be derived while the master key is incompletely removed, so that
I/O on open files will reliably continue working after removal of the
key has been initiated. (We could allow I/O to sometimes fail in that
case, but that seems problematic for reasons such as writes getting
silently thrown away and diverging from the existing fscrypt semantics.)
Therefore, when the filesystem is using extent-based encryption,
->mk_secret can't be wiped when the key becomes incompletely removed.
As a prerequisite for doing that, this patch makes the "present" state
be tracked using a new field, ->mk_present. No behavior is changed yet.
The basic idea here is borrowed from Josef Bacik's patch
"fscrypt: use a flag to indicate that the master key is being evicted"
(https://lore.kernel.org/r/e86c16dddc049ff065f877d793ad773e4c6bfad9.1696970227.git.josef@toxicpanda.com).
I reimplemented it using a "present" bool instead of an "evicted" flag,
fixed a couple bugs, and tried to update everything to be consistent.
Note: I considered adding a ->mk_status field instead, holding one of
FSCRYPT_KEY_STATUS_*. At first that seemed nice, but it ended up being
more complex (despite simplifying FS_IOC_GET_ENCRYPTION_KEY_STATUS),
since it would have introduced redundancy and had weird locking rules.
Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20231015061055.62673-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
This patch reverts mostly commit 40595cdc93ed ("nfs: block notification
on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
export flag to signal that the "own ->lock" implementation supports
async lock requests. The only main user is DLM that is used by GFS2 and
OCFS2 filesystem. Those implement their own lock() implementation and
return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93ed
("nfs: block notification on fs with its own ->lock") the DLM
implementation were never updated. This patch should prepare for DLM
to set the EXPORT_OP_ASYNC_LOCK export flag and update the DLM
plock implementation regarding to it.
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Before commit b36a5780cb44 ("ovl: modify layer parameter parsing"),
spaces and commas in lowerdir mount option value used to be escaped using
seq_show_option().
In current upstream, when lowerdir value has a space, it is not escaped
in /proc/mounts, e.g.:
none /mnt overlay rw,relatime,lowerdir=l l,upperdir=u,workdir=w 0 0
which results in broken output of the mount utility:
none on /mnt type overlay (rw,relatime,lowerdir=l)
Store the original lowerdir mount options before unescaping and show
them using the same escaping used for seq_show_option() in addition to
escaping the colon separator character.
Fixes: b36a5780cb44 ("ovl: modify layer parameter parsing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Update the /proc/cmdline documentation to explicitly state that this
file provides kernel boot parameters obtained via boot config from the
kernel image as well as those supplied by the boot loader.
Link: https://lore.kernel.org/all/20231005171747.541123-1-paulmck@kernel.org/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
- Fix a memory leak issue when using LZMA global compressed
deduplication;
- Fix empty device tags in flatdev mode;
- Update documentation for recent new features.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZR7+cREceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBEHRAQCYKy+4zs4J2AavzVaxRPsbgun3VfVb4pxe
yqxR+vDOLgD/RAQoadAMcgKYnnDZrziLtb0myOO2SFGEIMTkI7FzEQE=
=+SF9
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix a memory leak issue when using LZMA global compressed
deduplication
- Fix empty device tags in flatdev mode
- Update documentation for recent new features
* tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: allow empty device tags in flatdev mode
erofs: fix memory leak of LZMA global compressed deduplication
- update new features like bloom filter and DEFLATE.
- add documentation for the long xattr name prefixes, which was
landed upstream since v6.4.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230928134852.31118-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc
ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj
atVOWLdugDJquiHextqTEMgIecJpNw4=
=uINF
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
Until now, fscrypt has always used the filesystem block size as the
granularity of file contents encryption. Two scenarios have come up
where a sub-block granularity of contents encryption would be useful:
1. Inline crypto hardware that only supports a crypto data unit size
that is less than the filesystem block size.
2. Support for direct I/O at a granularity less than the filesystem
block size, for example at the block device's logical block size in
order to match the traditional direct I/O alignment requirement.
(1) first came up with older eMMC inline crypto hardware that only
supports a crypto data unit size of 512 bytes. That specific case
ultimately went away because all systems with that hardware continued
using out of tree code and never actually upgraded to the upstream
inline crypto framework. But, now it's coming back in a new way: some
current UFS controllers only support a data unit size of 4096 bytes, and
there is a proposal to increase the filesystem block size to 16K.
(2) was discussed as a "nice to have" feature, though not essential,
when support for direct I/O on encrypted files was being upstreamed.
Still, the fact that this feature has come up several times does suggest
it would be wise to have available. Therefore, this patch implements it
by using one of the reserved bytes in fscrypt_policy_v2 to allow users
to select a sub-block data unit size. Supported data unit sizes are
powers of 2 between 512 and the filesystem block size, inclusively.
Support is implemented for both the FS-layer and inline crypto cases.
This patch focuses on the basic support for sub-block data units. Some
things are out of scope for this patch but may be addressed later:
- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this
combination usually causes data unit indices to exceed 32 bits, and
thus fscrypt_supported_policy() correctly disallows it. The users who
potentially need this combination are using f2fs. To support it, f2fs
would need to provide an option to slightly reduce its max file size.
- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem
described above, but also it will need special code to make DUN
wraparound still happen on a FS block boundary.
- Supporting use case (2) mentioned above. The encrypted direct I/O
code will need to stop requiring and assuming FS block alignment.
This won't be hard, but it belongs in a separate patch.
- Supporting this feature on filesystems other than ext4 and f2fs.
(Filesystems declare support for it via their fscrypt_operations.)
On UBIFS, sub-block data units don't make sense because UBIFS encrypts
variable-length blocks as a result of compression. CephFS could
support it, but a bit more work would be needed to make the
fscrypt_*_block_inplace functions play nicely with sub-block data
units. I don't think there's a use case for this on CephFS anyway.
Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Remove the repeated word "the" in comments.
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed: Dave Chinner <dchinner@redhat.com>
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
We've changed the holder of the block device which has consequences.
Document this clearly and in detail so filesystem and vfs developers
have a proper digital paper trail.
Signed-off-by: Christian Brauner <brauner@kernel.org>
We've changed the order of opening block devices and superblock
handling. Let's document this so filesystem and vfs developers have
a proper digital paper trail.
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmT/hwAACgkQxWXV+ddt
WDsn7hAAngwEMKEAH9Jvu/BtHgRYcAdsGh5Mxw34aQf1+DAaH03GGsZjN6hfHYo4
FMsnnvoZD5VPfuaFaQVd+mS9mRzikm503W7KfZFAPAQTOjz50RZbohLnZWa3eFbI
46OcpoHusxwoYosEmIAt+dcw/gDlT9fpj+W11dKYtwOEjCqGA/OeKoVenfk38hVJ
r+XhLwZFf4dPIqE3Ht26UtJk87Xs2X0/LQxOX3vM1MZ+l38N4dyo7TQnwfTHlQNw
AK9sK6vp3rpRR96rvTV1dWr9lnmE7wky+Vh36DN/jxpzbW7Wx8IVoobBpcsO4Tyk
Vw/rdjB7g7LfBmjLFhWvvQ73jv0WjIUUzXH17RuxOeyAQJ9tXFztVMh+QoVVC/Ka
NxwA5uqyJKR7DIA+kLL06abUnASUVgP6Krdv9Fk7rYCKWluWk1k9ls9XaFFhytvg
eeno/UB0px1rwps5P5zfaSXLIXEl53Luy5rFhTMCCNQfXyo+Qe6PJyTafR3E0uP8
aXJV1lPG+o7qi9Vwg+20yy//1sE5gR0dLrcTaup3/20RK6eljZ/bNSkl3GJR9mlS
YF+J/Ccia06y8Qo0xaeCofxkoI3J/PK6KPOTt8yZDgYoetYgHhrfBRO0I7ZU4Edq
10512hAeskzPt6+5348+/jOEENASffXKP3FJSdDEzWd33vtlaHE=
=mHTa
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- several fixes for handling directory item (inserting, removing,
iteration, error handling)
- fix transaction commit stalls when auto relocation is running and
blocks other tasks that want to commit
- fix a build error when DEBUG is enabled
- fix lockdep warning in inode number lookup ioctl
- fix race when finishing block group creation
- remove link to obsolete wiki in several files
* tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
btrfs: assert delayed node locked when removing delayed item
btrfs: remove BUG() after failure to insert delayed dir index item
btrfs: improve error message after failure to add delayed dir index item
btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio
btrfs: check for BTRFS_FS_ERROR in pending ordered assert
btrfs: fix lockdep splat and potential deadlock after failure running delayed items
btrfs: do not block starts waiting on previous transaction commit
btrfs: release path before inode lookup during the ino lookup ioctl
btrfs: fix race between finishing block group creation and its item update
The wiki has been archived and is not updated anymore. Remove or replace
the links in files that contain it (MAINTAINERS, Kconfig, docs).
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>