90480 Commits

Author SHA1 Message Date
Thomas Bertschinger
b07ce72626 bcachefs: omit alignment attribute on big endian struct bkey
This is needed for building Rust bindings on big endian architectures
like s390x. Currently this is only done in userspace, but it might
happen in-kernel in the future. When creating a Rust binding for struct
bkey, the "packed" attribute is needed to get a type with the correct
member offsets in the big endian case. However, rustc does not allow
types to have both a "packed" and "align" attribute. Thus, in order to
get a Rust type compatible with the C type, we must omit the "aligned"
attribute in C.

This does not affect the struct's size or member offsets, only its
toplevel alignment, which should be an acceptable impact.

The little endian version can have the "align" attribute because the
"packed" attr is redundant, and rust-bindgen will omit the "packed" attr
when an "align" attr is present and it can do so without changing a
type's layout

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
6e9d0558b1 bcachefs: bch2_trigger_alloc() handles state changes better
bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.

We've observed the discard worker running too often - most runs it
doesn't do any work, according to the tracepoint - so clearly, we're
kicking it off too often.

This adds an explicit statechange() macro to make these checks more
precise.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b63570f747 bcachefs: bch2_print_opts()
Make sure early error messages get redirected, for
kernel-fsck-from-userland.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
130d229ff5 bcachefs: Improve error messages in device remove path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
5ca8ff157d bcachefs: Use kvzalloc() when dynamically allocating btree paths
THis silences a mm/page_alloc.c warning about allocating more than a
page with GFP_NOFAIL - and there's no reason for this to not have a
vmalloc fallback anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
83bd5985fa bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
3254c1b0e5 bcachefs: Save key_cache_path in peek_slot()
When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return - we also want to save the
iter->key_cache_path as well, for the update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
91dcad18d3 bcachefs: Pin btree cache in ram for random access in fsck
Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.

This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion, and this is particularly painful on spinning
rust, where we'd like to keep the pipeline fairly full of btree node
reads so that the elevator can reduce seeking.

This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.

This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system ram that fsck is allowed to pin.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
835cd3e147 bcachefs: Check for subvolume children when deleting subvolumes
Recursively destroying subvolumes isn't allowed yet.

Fixes: https://github.com/koverstreet/bcachefs/issues/634
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b26d79147f bcachefs: BTREE_ID_subvolume_children
Add a btree to record a parent -> child subvolume relationships,
according to the filesystem heirarchy.

The subvolume_children btree is a bitset btree: if a bit is set at pos
p, that means p.offset is a child of subvolume p.inode.

This will be used for efficiently listing subvolumes, as well as
recursive deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b8628a2529 bcachefs: bch_subvolume::fs_path_parent
Record the filesystem path heirarchy for subvolumes in bch_subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
e07c28ab92 bcachefs: bch2_btree_bit_mod()
Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for
the subvolume children btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
506b187603 bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
56e230473d bcachefs: Correctly reattach subvolumes
Subvolumes need special handling to reattach - we always reattach them
in the root subvolume's lost+found, and they need a slightly different
kind of dirent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
3a136177f3 bcachefs: check_path() now prints full inode when reattaching
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
688a769409 bcachefs: Pass inode bkey to check_path()
prep work for improving logging/error messages

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
f5d58d0c72 bcachefs: Fix path where dirent -> subvol missing and we don't fix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
64304aaf4e bcachefs: bch_subvolume::parent -> creation_parent
bit of renaming, prep for adding a fs path parent

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
45b4ed525e bcachefs: Repair subvol dirents that point to non subvols
when repair switches d_type to or from DT_SUBVOL, we need to update the
target accordingly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
c60b7f803c bcachefs: check dirent->d_parent_subvol
Check that d_parent_subvol makes sense - the dirent's snapshot must be
visible in d_parent_subvol (i.e. an ancestor of d_parent_subvol's
snapshot) in order to be visible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
f4e68c859f bcachefs: check inode->bi_parent_subvol against dirent
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
ea27001e14 bcachefs: delete duplicated checks in check_dirent_to_subvol()
these were already checked in check_subvol()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
e539ebb867 bcachefs: simplify check_dirent_inode_dirent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
0b498a5a39 bcachefs: check bi_parent_subvol in check_inode()
check for inodes with a nonzero bi_parent_subvol field that aren't
actually subvolume roots

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
971a1503a2 bcachefs: better log message in lookup_inode_for_snapshot()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
0b17618fdc bcachefs: check_inode_dirent_inode()
check that if an inode has a backpointer, the dirent it points to points
back to it.

We do this in check_dirent_inode_dirent(), but only for inodes that have
dirents that point to them - we also have to do the check starting from
the inode to catch inodes that don't have dirents that point to them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
f2b02d099c bcachefs: Check subvol <-> inode pointers in check_inode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
4c20278eb1 bcachefs: Check subvol <-> inode pointers in check_subvol()
Subvolumes and subvolume root inodes point to each other: this verifies
the subvolume -> inode -> subvolme path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
da23795e4c bcachefs: thread_with_file: add f_ops.flush
Add a flush op, to return the exit code via close().

Also update bcachefs usage to use this to return fsck exit codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:20 -04:00
Kent Overstreet
6b33312925 bcachefs: thread_with_file: Fix missing va_end()
Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u
Reported-by: coverity scan
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:18 -04:00
Darrick J. Wong
658a1e42ce bcachefs: thread_with_file: allow ioctls against these files
Make it so that a thread_with_stdio user can handle ioctls against the
file descriptor.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:15 -04:00
Darrick J. Wong
ab6752e24e bcachefs: thread_with_file: create ops structure for thread_with_stdio
Create an ops structure so we can add more file-based functionality in
the next few patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:13 -04:00
Darrick J. Wong
1cbae651e5 bcachefs: thread_with_file: fix various printf problems
Experimentally fix some problems with stdio_redirect_vprintf by creating
a MOO variant with which we can experiment.  We can't do a GFP_KERNEL
allocation while holding the spinlock, and I don't like how the printf
function can silently truncate the output if memory allocation fails.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:10 -04:00
Darrick J. Wong
fcb1620edd bcachefs: thread_with_file: allow creation of readonly files
Create a new run_thread_with_stdout function that opens a file in
O_RDONLY mode so that the kernel can write things to userspace but
userspace cannot write to the kernel.  This will be used to convey xfs
health event information to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:08 -04:00
Kent Overstreet
a5a650d647 bcachefs: thread_with_stdio: suppress hung task warning
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:04 -04:00
Kent Overstreet
8f9320d3a3 bcachefs: thread_with_stdio: Mark completed in ->release()
This fixes stdio_redirect_read() getting stuck, not noticing that the
pipe has been closed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:00 -04:00
Kent Overstreet
032b3fd057 bcachefs: Thread with file documentation
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
f704f108af bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline()
This fixes a bug where we'd return data without waiting for a newline,
if data was present but a newline was not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
a6777ca4ff bcachefs: thread_with_stdio: kill thread_with_stdio_done()
Move the cleanup code to a wrapper function, where we can call it after
the thread_with_stdio fn exits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
60e1baa872 bcachefs: thread_with_stdio: convert to darray
- eliminate the dependency on printbufs, so that we can lift
   thread_with_file for use in xfs
 - add a nonblocking parameter to stdio_redirect_printf(), and either
   block if the buffer is full or drop it on the floor - don't buffer
   infinitely

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
e017047fdb bcachefs: thread_with_stdio: eliminate double buffering
The output buffer lock has to be a spinlock so that we can write to it
from interrupt context, so we can't use a direct copy_to_user; this
switches thread_with_file_read() to use fault_in_writeable() and
copy_to_user_nofault(), similar to how thread_with_file_write() works.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Linus Torvalds
e5e038b7ae \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmXx5kwACgkQnJ2qBz9k
 QNmZowf/UlGJ1rmQFFhoodn3SyK48tQjOZ23Ygx6v9FZiLMuQ3b1k0kWKmwM4lZb
 mtRriCm+lPO9Yp/Sflz+jn8S51b/2bcTXiPV4w2Y4ZIun41wwggV7rWPnTCHhu94
 rGEPu/SNSBdpxWGv43BKHSDl4XolsGbyusQKBbKZtftnrpIf0y2OnyEXSV91Vnlh
 KM/XxzacBD4/3r4KCljyEkORWlIIn2+gdZf58sKtxLKvnfCIxjB+BF1e0gOWgmNQ
 e/pVnzbAHO3wuavRlwnrtA+ekBYQiJq7T61yyYI8zpeSoLHmwvPoKSsZP+q4BTvV
 yrcVCbGp3uZlXHD93U3BOfdqS0xBmg==
 =84Q4
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2, isofs, udf, and quota updates from Jan Kara:
 "A lot of material this time:

   - removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
     was legacy or replaced with scoped memalloc_nofs_*() API)

   - removal of BUG_ONs in quota code

   - conversion of UDF to the new mount API

   - tightening quota on disk format verification

   - fix some potentially unsafe use of RCU pointers in quota code and
     annotate everything properly to make sparse happy

   - a few other small quota, ext2, udf, and isofs fixes"

* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
  udf: remove SLAB_MEM_SPREAD flag usage
  quota: remove SLAB_MEM_SPREAD flag usage
  isofs: remove SLAB_MEM_SPREAD flag usage
  ext2: remove SLAB_MEM_SPREAD flag usage
  ext2: mark as deprecated
  udf: convert to new mount API
  udf: convert novrs to an option flag
  MAINTAINERS: add missing git address for ext2 entry
  quota: Detect loops in quota tree
  quota: Properly annotate i_dquot arrays with __rcu
  quota: Fix rcu annotations of inode dquot pointers
  isofs: handle CDs with bad root inode but good Joliet root directory
  udf: Avoid invalid LVID used on mount
  quota: Fix potential NULL pointer dereference
  quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
  quota: Set nofs allocation context when acquiring dqio_sem
  ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
  ext2: Drop GFP_NOFS use in ext2_get_blocks()
  ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
  udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
  ...
2024-03-13 14:30:58 -07:00
Linus Torvalds
1715f710e7 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmXx5UkACgkQnJ2qBz9k
 QNmq5gf+Nm5PB2EQKt3xDqfdK8huTyTPH418tHHMcUjVeWIeFviFMXb2FeuJArr4
 TWYjrRzs8aC75SYpPk1LZ6+6OymqYqV+0fxI91BkNnvNpwCInG6h8x6AlG28RLi+
 /vmat7qHTPhJ+iTWGU4W3aDXINdXUq1KcN7+8aNDeKy80eI+UhJaWePNe+IFsovX
 hSDzl6P8FbGqX8s/v52FsUJCXqHHcJYkiyQyUninY0yA/WNPVnzyK+RngP5p216d
 /Kdh11jbduu+xRObn+CTgsASRANqazQi7rddSVTFefUie2s7vUD7wcyzEHTPY5QS
 BEQypvCmOFNPFKmMy+e8iLXtYRgTeg==
 =kQX5
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:

 - fsnotify optimizations to reduce cost of fsnotify when nobody is
   watching

 - fix longstanding wart that system could not be suspended when some
   process was waiting for response to fanotify permission event

 - some spelling fixes

* tag 'fsnotify_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: allow freeze when waiting response for permission events
  fanotify: Fix misspelling of "writable"
  fsnotify: Fix misspelling of "writable"
  inotify: Fix misspelling of "writable"
  fsnotify: Add fsnotify_sb_has_watchers() helper
  fsnotify: optimize the case of no parent watcher
2024-03-13 14:27:24 -07:00
Linus Torvalds
babbcc0232 New code for 6.9:
* Online Repair;
   ** New ondisk structures being repaired.
      - Inode's mode field by trying to obtain file type value from the a
        directory entry.
      - Quota counters.
      - Link counts of inodes.
      - FS summary counters.
      - rmap btrees.
        Support for in-memory btrees has been added to support repair of rmap
        btrees.
   ** Misc changes
      - Report corruption of metadata to the health tracking subsystem.
      - Enable indirect health reporting when resources are scarce.
      - Reduce memory usage while reparing refcount btree.
      - Extend "Bmap update" intent item to support atomic extent swapping on
        the realtime device.
      - Extend "Bmap update" intent item to support extended attribute fork and
        unwritten extents.
   ** Code cleanups
      - Bmap log intent.
      - Btree block pointer checking.
      - Btree readahead.
      - Buffer target.
      - Symbolic link code.
   * Remove mrlock wrapper around the rwsem.
   * Convert all the GFP_NOFS flag usages to use the scoped
     memalloc_nofs_save() API instead of direct calls with the GFP_NOFS.
   * Refactor and simplify xfile abstraction. Lower level APIs in
     shmem.c are required to be exported in order to achieve this.
   * Skip checking alignment constraints for inode chunk allocations when block
     size is larger than inode chunk size.
   * Do not submit delwri buffers collected during log recovery when an error
     has been encountered.
   * Fix SEEK_HOLE/DATA for file regions which have active COW extents.
   * Fix lock order inversion when executing error handling path during
     shrinking a filesystem.
   * Remove duplicate ifdefs.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZemMkgAKCRAH7y4RirJu
 9ON5AP0Vda6sMn/ZUYoLo9ZUrUvlUb8L0dhEN5JL0XfyWW5ogAD/bH4G6pKSNyTw
 cSEjryuDakirdHLt5g0c+QHd2a/fzw0=
 =ymKk
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:

 - Online repair updates:
    - More ondisk structures being repaired:
        - Inode's mode field by trying to obtain file type value from
          the a directory entry
        - Quota counters
        - Link counts of inodes
        - FS summary counters
        - Support for in-memory btrees has been added to support repair
          of rmap btrees
    - Misc changes:
        - Report corruption of metadata to the health tracking subsystem
        - Enable indirect health reporting when resources are scarce
        - Reduce memory usage while repairing refcount btree
        - Extend "Bmap update" intent item to support atomic extent
          swapping on the realtime device
        - Extend "Bmap update" intent item to support extended attribute
          fork and unwritten extents
    - Code cleanups:
        - Bmap log intent
        - Btree block pointer checking
        - Btree readahead
        - Buffer target
        - Symbolic link code

 - Remove mrlock wrapper around the rwsem

 - Convert all the GFP_NOFS flag usages to use the scoped
   memalloc_nofs_save() API instead of direct calls with the GFP_NOFS

 - Refactor and simplify xfile abstraction. Lower level APIs in shmem.c
   are required to be exported in order to achieve this

 - Skip checking alignment constraints for inode chunk allocations when
   block size is larger than inode chunk size

 - Do not submit delwri buffers collected during log recovery when an
   error has been encountered

 - Fix SEEK_HOLE/DATA for file regions which have active COW extents

 - Fix lock order inversion when executing error handling path during
   shrinking a filesystem

 - Remove duplicate ifdefs

* tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (183 commits)
  xfs: shrink failure needs to hold AGI buffer
  mm/shmem.c: Use new form of *@param in kernel-doc
  kernel-doc: Add unary operator * to $type_param_ref
  xfs: use kvfree() in xlog_cil_free_logvec()
  xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL
  xfs: fix scrub stats file permissions
  xfs: fix log recovery erroring out on refcount recovery failure
  xfs: move symlink target write function to libxfs
  xfs: move remote symlink target read function to libxfs
  xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h
  xfs: xfs_bmap_finish_one should map unwritten extents properly
  xfs: support deferred bmap updates on the attr fork
  xfs: support recovering bmap intent items targetting realtime extents
  xfs: add a realtime flag to the bmap update log redo items
  xfs: add a xattr_entry helper
  xfs: fix xfs_bunmapi to allow unmapping of partial rt extents
  xfs: move xfs_bmap_defer_add to xfs_bmap_item.c
  xfs: reuse xfs_bmap_update_cancel_item
  xfs: add a bi_entry helper
  xfs: remove xfs_trans_set_bmap_flags
  ...
2024-03-13 13:52:24 -07:00
Linus Torvalds
279d44ceb8 24 cifs.ko changesets
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmXwkE4ACgkQiiy9cAdy
 T1HyZQwAhHXADDJ0v4/bEpgdFmfGl1b8vcQU2+zlakv3cpyo0hyHjuhG3YLdQj5k
 NSOZAnkgCghVJ3VpcyR/7xzUJyk3RGdvTeGvjNaqX1aMl1oPpGNZ2A2Z+BlL5GTd
 MP/jXs/se5E4w89rEf1MgBcjDueElZ6A4weZpFRmgEmwgsJl2/RoF690pcUrhNzG
 a7FHRA94qnDfJD81A+5PMW/TmUhS+ks42+1w7AcivC0Mr/mzYO+HfLGpg4zFeWtW
 SaA7BGKRKzUDvbmxLHbeVsSKLwMG8PTdstMiECz1wjsJKux2enPUPAEau1Hmggy4
 XnVDiswzg5a1j9OnYtZIbwOWR8KQepCSgwTw/UInjfB9dI0+FVs0D+cWlJIQGBrg
 0np9coYc4KnUxgQIhB1OtaX773uyGuXa7w7iHF0iAXaQWrZi8Xefq8wsIm8e/XXc
 /gT3oICteRaj51pxwIlW3JPTKM+diDb5NcFo8+JiXbtAdv1ub084ToBJBNFNWfeu
 JVExfGHX
 =+IA7
 -----END PGP SIGNATURE-----

Merge tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - fix for folios/netfs data corruption in cifs_extend_writeback

 - additional tracepoint added

 - updates for special files and symlinks: improvements to allow
   selecting use of either WSL or NFS reparse point format on creating
   special files

 - allocation size improvement for cached files

 - minor cleanup patches

 - fix to allow changing the password on remount when password for the
   session is expired.

 - lease key related fixes: caching hardlinked files, deletes of
   deferred close files, and an important fix to better reuse lease keys
   for compound operations, which also can avoid lease break timeouts
   when low on credits

 - fix potential data corruption with write/readdir races

 - compression cleanups and a fix for compression headers

* tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  cifs: update internal module version number for cifs.ko
  smb: common: simplify compression headers
  smb: common: fix fields sizes in compression_pattern_payload_v1
  smb: client: negotiate compression algorithms
  smb3: add dynamic trace point for ioctls
  cifs: Fix writeback data corruption
  smb: client: return reparse type in /proc/mounts
  smb: client: set correct d_type for reparse DFS/DFSR and mount point
  smb: client: parse uid, gid, mode and dev from WSL reparse points
  smb: client: introduce SMB2_OP_QUERY_WSL_EA
  smb: client: Fix a NULL vs IS_ERR() check in wsl_set_xattrs()
  smb: client: add support for WSL reparse points
  smb: client: reduce number of parameters in smb2_compound_op()
  smb: client: fix potential broken compound request
  smb: client: move most of reparse point handling code to common file
  smb: client: introduce reparse mount option
  smb: client: retry compound request without reusing lease
  smb: client: do not defer close open handles to deleted files
  smb: client: reuse file lease key in compound operations
  smb3: update allocation size more accurately on write completion
  ...
2024-03-13 13:15:24 -07:00
Christian Brauner
9d9539db86 pidfs: remove config option
As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.

For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.

If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/<pid>/fd/<nr>. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.

Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
->evict_inode() otherwise.

Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.

Tested on both 32 and 64bit including error injection.

Link: https://github.com/systemd/systemd/pull/31713 [1]
Link: https://lore.kernel.org/r/20240312-dingo-sehnlich-b3ecc35c6de7@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-13 12:53:53 -07:00
Linus Torvalds
f88c3fb81c mm, slab: remove last vestiges of SLAB_MEM_SPREAD
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own.  But let's just rip off
the band-aid and get it over and done with.  I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-12 20:32:19 -07:00
Linus Torvalds
cc4a875cf3 lsm/stable-6.9 PR 20240312
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmXwt3cUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXObOhAAqldn1nbYS/t1D/k/9ZN/PtSQetK4
 S58D8+gB59Sg0daWFaRhCwwShIbXS/6XzhqaVb3iAPptJs0YDFMbWLAW2d+dd69K
 /7C8diguHbuJdEnCJtFYQIVinavaYVRlyoQcO8uwTz8uvTgXPOhr2P9NcOApJXcR
 xqttuADVo/9Zn0O9/+GUPCH0ROL0SMnuUjwdVP3bpPHj9zEk8F1/A6chzTeSLJru
 Y4+cRrN/r0JTkvRqPdnF9LSvxK7mtAEaHkKGeLQbw0O5pv3r3w0EWMJvq+uonGU2
 WX0eR5VMfevkFMUdw8FKOTa+OZ0HJ2KKIb4sB4wDMgeGyov7Z6SxgvFeQiSyD3aB
 QnyfLDzeEuPfousxUd45dUDnsWNnSgFF+JAdi0LSzm5hMuLeQDozTsFmh0orQcX1
 L5A6VtAbSPP0ffl+tuPi48q3P3LlSjMP0B8W20NXFYhXukKXCgXVMr/dEvpwpu1m
 o1glviGIXeLQQSnX3lMWb7Ds2igmCtXPrqkdu2vpRhMp0od6n4R4jH73Aj5MeSQn
 n3sP73dg5sAaMjtI2NOisMeFUp09MMlOumCCM+AIplPXremm1kwgKRTIp0rKsLW9
 VoQPXa43LQc3hAgPrpGuE+4yBfaBUq7Z8I37IFER/2y4K8b9YkduW4kDh7OdRz+d
 iQ4Nnu2lR/+CCH0=
 =0mTM
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Promote IMA/EVM to a proper LSM

   This is the bulk of the diffstat, and the source of all the changes
   in the VFS code. Prior to the start of the LSM stacking work it was
   important that IMA/EVM were separate from the rest of the LSMs,
   complete with their own hooks, infrastructure, etc. as it was the
   only way to enable IMA/EVM at the same time as a LSM.

   However, now that the bulk of the LSM infrastructure supports
   multiple simultaneous LSMs, we can simplify things greatly by
   bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
   something I've wanted to see happen for quite some time and Roberto
   was kind enough to put in the work to make it happen.

 - Use the LSM hook default values to simplify the call_int_hook() macro

   Previously the call_int_hook() macro required callers to supply a
   default return value, despite a default value being specified when
   the LSM hook was defined.

   This simplifies the macro by using the defined default return value
   which makes life easier for callers and should also reduce the number
   of return value bugs in the future (we've had a few pop up recently,
   hence this work).

 - Use the KMEM_CACHE() macro instead of kmem_cache_create()

   The guidance appears to be to use the KMEM_CACHE() macro when
   possible and there is no reason why we can't use the macro, so let's
   use it.

 - Fix a number of comment typos in the LSM hook comment blocks

   Not much to say here, we fixed some questionable grammar decisions in
   the LSM hook comment blocks.

* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
  cred: Use KMEM_CACHE() instead of kmem_cache_create()
  lsm: use default hook return value in call_int_hook()
  lsm: fix typos in security/security.c comment headers
  integrity: Remove LSM
  ima: Make it independent from 'integrity' LSM
  evm: Make it independent from 'integrity' LSM
  evm: Move to LSM infrastructure
  ima: Move IMA-Appraisal to LSM infrastructure
  ima: Move to LSM infrastructure
  integrity: Move integrity_kernel_module_request() to IMA
  security: Introduce key_post_create_or_update hook
  security: Introduce inode_post_remove_acl hook
  security: Introduce inode_post_set_acl hook
  security: Introduce inode_post_create_tmpfile hook
  security: Introduce path_post_mknod hook
  security: Introduce file_release hook
  security: Introduce file_post_open hook
  security: Introduce inode_post_removexattr hook
  security: Introduce inode_post_setattr hook
  security: Align inode_setattr hook definition with EVM
  ...
2024-03-12 20:03:34 -07:00