linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	a8958a1a95	bcachefs: bkey_copy() is no longer a macro Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:18 -05:00
Kent Overstreet	103ffe9aaf	bcachefs: x-macro-ify inode flags enum This lets us use bch2_prt_bitflags to print them out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:18 -05:00
Kent Overstreet	d4c8bb69d0	bcachefs: Convert bch2_fs_open() to darray Open coded dynamic arrays are deprecated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:17 -05:00
Kent Overstreet	0f0fc31238	bcachefs: Move __bch2_members_v2_get_mut to sb-members.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:17 -05:00
Kent Overstreet	59154f2c66	bcachefs: bch2_prt_datetime() Improved, better named version of pr_time(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:08 -05:00
Kent Overstreet	bf61dcdfc1	bcachefs: CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y BCACHEFS_DEBUG_TRANSACTIONS is useful, but it's too expensive to have on by default - and it hasn't been coming up in bug reports. Turn it off by default until we figure out a way to make it cheaper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	9fcdd23b6e	bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization related to inode updates and fsync - document it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	d3c7727bb9	bcachefs: rebalance_work btree is not a snapshots btree rebalance_work entries may refer to entries in the extents btree, which is a snapshots btree, or they may also refer to entries in the reflink btree, which is not. Hence rebalance_work keys may use the snapshot field but it's not required to be nonzero - add a new btree flag to reflect this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	01ccee225a	bcachefs: Add missing printk newlines This was causing error messages in -tools to not get printed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	5a53f851e6	bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry When we didn't find anything in the journal that we'd like to use, and we're forced to use whatever we can find - that entry will have been a JSET_NO_FLUSH entry with a garbage last_seq value, since it's not normally used. Initialize it to something sane, for bch2_fs_journal_start(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	ce3e9a8a10	bcachefs: .get_parent() should return an error pointer Delete the useless check for inum == 0; we'll return -ENOENT without it, which is what we want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	4bd156c4b4	bcachefs: Fix bch2_delete_dead_inodes() - the fsck_err() check for the filesystem being clean was incorrect, causing us to always fail to delete unlinked inodes - if a snapshot had been taken, the unlinked inode needs to be propagated to snapshot leaves so the unlink can happen there - fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	7cb2a7895d	bcachefs: use swab40 for bch_backpointer.bucket_offset bitfield The bucket_offset field of bch_backpointer is a 40-bit bitfield, but the bch2_backpointer_swab() helper uses swab32. This leads to inconsistency when an on-disk fs is accessed from an opposite endian machine. As it turns out, we already have an internal swab40() helper that is used from the bch_alloc_v4 swab callback. Lift it into the backpointers header file and use it consistently in both places. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	0996c72a0f	bcachefs: byte order swap bch_alloc_v4.fragmentation_lru field A simple test to populate a filesystem on one CPU architecture and fsck on an arch of the opposite byte order produces errors related to the fragmentation LRU. This occurs because the 64-bit fragmentation_lru field is not byte-order swapped when reads detect that the on-disk/bset key values were written in opposite byte-order of the current CPU. Update the bch2_alloc_v4 swab callback to handle fragmentation_lru as is done for other multi-byte fields. This doesn't affect existing filesystems when accessed by CPUs of the same endianness because the ->swab() callback is only called when the bset flags indicate an endianness mismatch between the CPU and on-disk data. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	2a4e749760	bcachefs: allow writeback to fill bio completely The bcachefs folio writeback code includes a bio full check as well as a fixed size check to determine when to split off and submit writeback I/O. The inclusive check of the latter against the limit means that writeback can submit slightly prematurely. This is not a functional problem, but results in unnecessarily split I/Os and extent merging. This can be observed with a buffered write sized exactly to the current maximum value (1MB) and with key_merging_disabled=1. The latter prevents the merge from the second write such that a subsequent check of the extent list shows a 1020k extent followed by a contiguous 4k extent. The purpose for the fixed size check is also undocumented and somewhat obscure. Lift this check into a new helper that wraps the bio check, fix the comparison logic, and add a comment to document the purpose and how we might improve on this in the future. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	0e91d3a6d5	bcachefs: fix odebug warn and lockdep splat due to on-stack rhashtable Guenter Roeck reports a lockdep splat and DEBUG_OBJECTS_WORK related warning when bch2_copygc_thread() initializes its rhashtable. The lockdep splat relates to a warning print caused by the fact that the rhashtable exists on the stack but is not annotated as so. This is something that could be addressed by INIT_WORK_ONSTACK(), but rhashtable doesn't expose that control and probably isnt worth the churn for just one user. Instead, dynamically allocate the buckets_in_flight structure and avoid the splat that way. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	e0fb0dccfd	bcachefs: update alloc cursor in early bucket allocator A recent bug report uncovered a scenario where a filesystem never runs with freespace_initialized, and therefore the user observes significantly degraded write performance by virtue of running the early bucket allocator. The associated bug aside, the primary cause of the performance drop in this particular instance is that the early bucket allocator does not update the allocation cursor. This means that every allocation walks the alloc btree from the first bucket of the associated device looking for a bucket marked as free space. Update the early allocator code to set the alloc cursor to the last processed position in the tree, similar to how the freelist allocator behaves. With the alloc_cursor being updated, the retry logic also needs to be updated to restart from the beginning of the device when a free bucket is not available between the cursor and the end of the device. Track the restart position in a first_bucket variable to make the code a bit more easily readable and consistent with the freelist allocator. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	385a82f62a	bcachefs: serialize on cached key in early bucket allocator bcachefs had a transient bug where freespace_initialized was not properly being set, which lead to unexpected use of the early bucket allocator at runtime. This issue has been fixed, but the existence of it uncovered a coherency issue in the early bucket allocation code that is somewhat related to how uncached iterators deal with the key cache. The problem itself manifests as occasional failure of generic/113 due to corruption, often seen as a duplicate backpointer or multiple data types per-bucket error. The immediate cause of the error is a racing bucket allocation along the lines of the following sequence: - Task 1 selects key A in bch2_bucket_alloc_early() and schedules. - Task 2 selects the same key A, but proceeds to complete the allocation and associated I/O, after which it releases the open_bucket. - Task 1 resumes with key A, but does not recognize the bucket is now allocated because the open_bucket has been removed from the hash when it was released in the previous step. This generally shouldn't happen because the allocating task updates the alloc btree key before releasing the bucket. This is not sufficient in this particular instance, however, because an uncached iterator for a cached btree doesn't actually lock the key cache slot when no key exists for a given slot in the cache. Thus the fact that the allocation side updates the cached key means that multiple uncached iters can stumble across the same alloc key and duplicate the bucket allocation as described above. This is something that probably needs a longer term fix in the iterator code. As a short term fix, close the race through explicit use of a cached iterator for likely allocation candidates. We don't want to scan the btree with a cached iterator because that would unnecessarily pollute the cache. This mitigates cache pollution by primarily scanning the tree with an uncached iterator, but closes the race by creating a key cache entry for any prospective slot prior to the bucket allocation attempt (also similar to how _alloc_freelist() works via try_alloc_bucket()). This survives many iterations of generic/113 on a kernel hacked to always use the early bucket allocator. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	f82755e4e8	bcachefs: Data move path now uses bch2_trans_unlock_long() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:11 -04:00
Linus Torvalds	aea6bf908d	f2fs update for 6.7-rc1 In this cycle, we introduce a bigger page size support by changing the internal f2fs's block size aligned to the page size. We also continue to improve zoned block device support regarding the power off recovery. As usual, there are some bug fixes regarding the error handling routines in compression and ioctl. Enhancement: - Support Block Size == Page Size - let f2fs_precache_extents() traverses in file range - stop iterating f2fs_map_block if hole exists - preload extent_cache for POSIX_FADV_WILLNEED - compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() Bug fix: - do not return EFSCORRUPTED, but try to run online repair - finish previous checkpoints before returning from remount - fix error handling of __get_node_page and __f2fs_build_free_nids - clean up zones when not successfully unmounted - fix to initialize map.m_pblk in f2fs_precache_extents() - fix to drop meta_inode's page cache in f2fs_put_super() - set the default compress_level on ioctl - fix to avoid use-after-free on dic - fix to avoid redundant compress extension - do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on - fix deadloop in f2fs_write_cache_pages() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmVGlP4ACgkQQBSofoJI UNKahg/7BuiBi2+NZ93WeUoRURBNa4H1cK93GtyE4rPMdEjVPujxonnqzC/N2nx5 1ZjBLEjV0mCXfqlm+D+ORkZDIeCO+PcPI+vLjjHG+qhphWsf9wUJnWIUf+LhmXDP p0ertHlZrSONG+HsbXl6BOEq/LWYTe6WDOwY0MMO4U7IxHVyKUuFJSL6DQRRL2r2 Op4/NOes/cvj8dXvUZtyF3tQHyhkTbdPl8pfFtmagLlujC2WOuySKlfHUF8pxdv4 RT3TER66Rs8IStrFtyuv6yHHtvpfl8jTuJ1DNaNphBCi2RlERvx7+zY3iyLIZgIZ zKxDkGUIb/UnKmoipiu/ZkUvJ6wi2IfP4C5hffAHBsxwyVlrSldCXioE4j5Ysbcm pOWjgB7ASyGVU6yxyUpoQUlW5oztPwqkjnhfeXu6cgHOLRWdw224hJ6bA4XU3E61 R2rdaNrpGICwl3juFPxH7uHxNjnPTJB1G38LJApAypkoHoa/et5foZtK0pwSxMD+ j8ZkczJvdAgkk6fbFzPA1Urg+6Qhx8SpDeNxCXIS/F7izPZhkvyZ45SBuWboJXld 8rzel9JzswdB2dpeXm0VVofwBg9l8CG5b7mwzmgHpmfD2MjdstLKp9aO5G2SUSJt BKArSUfipzQhljv0xhous0OQitKLhg83o6+KMAXw3OpUlmWGLV4= =U4Ke -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we introduce a bigger page size support by changing the internal f2fs's block size aligned to the page size. We also continue to improve zoned block device support regarding the power off recovery. As usual, there are some bug fixes regarding the error handling routines in compression and ioctl. Enhancements: - Support Block Size == Page Size - let f2fs_precache_extents() traverses in file range - stop iterating f2fs_map_block if hole exists - preload extent_cache for POSIX_FADV_WILLNEED - compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() Bug fixes: - do not return EFSCORRUPTED, but try to run online repair - finish previous checkpoints before returning from remount - fix error handling of __get_node_page and __f2fs_build_free_nids - clean up zones when not successfully unmounted - fix to initialize map.m_pblk in f2fs_precache_extents() - fix to drop meta_inode's page cache in f2fs_put_super() - set the default compress_level on ioctl - fix to avoid use-after-free on dic - fix to avoid redundant compress extension - do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on - fix deadloop in f2fs_write_cache_pages()" * tag 'f2fs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: finish previous checkpoints before returning from remount f2fs: fix error handling of __get_node_page f2fs: do not return EFSCORRUPTED, but try to run online repair f2fs: fix error path of __f2fs_build_free_nids f2fs: Clean up errors in segment.h f2fs: clean up zones when not successfully unmounted f2fs: let f2fs_precache_extents() traverses in file range f2fs: avoid format-overflow warning f2fs: fix to initialize map.m_pblk in f2fs_precache_extents() f2fs: Support Block Size == Page Size f2fs: stop iterating f2fs_map_block if hole exists f2fs: preload extent_cache for POSIX_FADV_WILLNEED f2fs: set the default compress_level on ioctl f2fs: compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() f2fs: fix to drop meta_inode's page cache in f2fs_put_super() f2fs: split initial and dynamic conditions for extent_cache f2fs: compress: fix to avoid redundant compress extension f2fs: compress: do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on f2fs: compress: fix to avoid use-after-free on dic f2fs: compress: fix deadloop in f2fs_write_cache_pages()	2023-11-04 09:26:23 -10:00
Linus Torvalds	c9b93cafb6	Bunch of small fixes: - three W=1 warning fixes: the NULL -> "" replacement isn't trivial but is serialized identically by the protocol layer and has been tested - one syzbot/KCSAN datarace annotation where we don't care about users messing with the fd they passed to mount -t 9p - removing a declaration without implementation - yet another race fix for trans_fd around connection close: the 'err' field is also used in potentially racy calls and this isn't complete, but it's better than what we have. - and finally a theorical memory leak fix on serialization failure -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmVFmAcACgkQq06b7GqY 5nASEg/9Fdtd9B//G+t/gQm0pdZIGoV+KbovWRrO4w3IIfENe8OP3A+8uZhmdij4 p1tYUfyOPjMTxvZlmEArYjyiIkB5SsdYVMiOY6WfFRo3XEyMirdLDXa7beSnFLAs wuk5rAlndGIg+HwzXFD4WpKU5By5r6hnMgYeQ98bwB1bsF5ENZL+0VctFOd1H1zQ zwFChfsv4do3LfA2BssseTaH3PAniqs/X8gvDfslCL45YhitunzrB5C8pCOLpiA6 j6+7iDBeh7H/S+xTG2h+NBdPh+emalex1GHckZHNaTyEWNGcnJnOG8qPy+ufNgyX G1Sxr80dX7oAXD74LqPfoG4IsgXsB/0sLnjFNxbfpcmGoka/GTWzV0O+9LauhN9X Nu3OnbOBM+VZR3VA7EpqCEbf5CAduQklWameZHGqqH8vr3wGJSPXs3eEpTnXnXmK yIUdFWvTL61Il/RSk1CFjlsxXh42tjNqlJobiOru3+zhmwFw3yffFnq3Mxn1Y+Mi 4jet/E1VD7NB6nlTqRuaMWbWBC037pMgXfFe4BGbAynKLPbnBD8fjDKkbxoU66Qw ud5UabfuikR1ah5te8l1F956ij88HgKFuufQ6vqc98be3EqcIkXSe0/+fgXMcU55 hPMmfLAP4g5p5VEi6kbOZdVAWqaE6EO1x6EzrQ+VCEg5hHJx0Ew= =ZD9A -----END PGP SIGNATURE----- Merge tag '9p-for-6.7-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: A bunch of small fixes: - three W=1 warning fixes: the NULL -> "" replacement isn't trivial but is serialized identically by the protocol layer and has been tested - one syzbot/KCSAN datarace annotation where we don't care about users messing with the fd they passed to mount -t 9p - removing a declaration without implementation - yet another race fix for trans_fd around connection close: the 'err' field is also used in potentially racy calls and this isn't complete, but it's better than what we had - and finally a theorical memory leak fix on serialization failure" * tag '9p-for-6.7-rc1' of https://github.com/martinetd/linux: 9p/net: fix possible memory leak in p9_check_errors() 9p/fs: add MODULE_DESCRIPTION 9p/net: xen: fix false positive printf format overflow warning 9p: v9fs_listxattr: fix %s null argument warning 9p/trans_fd: Annotate data-racy writes to file::f_flags fs/9p: Remove unused function declaration v9fs_inode2stat() 9p/trans_fd: avoid sending req to a cancelled conn	2023-11-04 09:20:04 -10:00
Linus Torvalds	766e9cf3bd	15 cifs client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVFoW8ACgkQiiy9cAdy T1HFtQwAvvJH7wsCuN/QYCMPbXJ150fgmgMs+FJeq/gMRJaRjeFLpR9NfO6DAYbA NraRyfoinxvBkftXVQkDUKn0xSdxz1JEZodq08+kvEZyIQp5z9Q5NTsSF4loyMlg jtRM6UtkiFmsaRrbLL+EguZiQkhV8/SyGtojxn5jozubXUZS1+llMZk56MbOxIdY RRGjndlxRxtJWP/xCHLwnGF7wRWqeiQhNPml1DoEVo2LOZK7OTqZP5cckwusj0YR nAaSffn3U2v5tVPOpHMZIZbAcRIawRTjv6CT+hMH5/9k1JOy4vc8Y92uiO7DLx2O uoI4fOs9hX3yWvJCe9QExvOjvmtWaAY8uhI8dGEo2mWvRQqdPBGcrVXpUqbVY1r0 ffb6N+CJ5VZmbgIi57+3FUU2YuJnIxyWj0uQCJr7Q6E9sZiHx2IZ2Y3Yck5yy6Bk ZsZLQcB5xQiosCjt2xR9UmXv/bUCH3TwCY/TUPARkf+QQCo58DQokczy0UtEffzB rLxedNQz =zu9T -----END PGP SIGNATURE----- Merge tag '6.7-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - use after free fixes and deadlock fix - symlink timestamp fix - hashing perf improvement - multichannel fixes - minor debugging improvements - fix creating fifos when using "sfu" mounts - NTLMSSP authentication improvement - minor fixes to include some missing create flags and structures from recently updated protocol documentation * tag '6.7-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: force interface update before a fresh session setup cifs: do not reset chan_max if multichannel is not supported at mount cifs: reconnect helper should set reconnect for the right channel smb: client: fix use-after-free in smb2_query_info_compound() smb: client: remove extra @chan_count check in __cifs_put_smb_ses() cifs: add xid to query server interface call cifs: print server capabilities in DebugData smb: use crypto_shash_digest() in symlink_hash() smb: client: fix use-after-free bug in cifs_debug_data_proc_show() smb: client: fix potential deadlock when releasing mids smb3: fix creating FIFOs when mounting with "sfu" mount option Add definition for new smb3.1.1 command type SMB3: clarify some of the unused CreateOption flags cifs: Add client version details to NTLM authenticate message smb3: fix touch -h of symlink	2023-11-04 09:13:50 -10:00
Linus Torvalds	4c975a43fa	EFI update for v6.7 - implement uid/gid mount options for efivarfs -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZUV51gAKCRAwbglWLn0t XBQVAP9a2PAeevQ9gA29rI+2caC9tpgcNPoiAsFiod8jrIymcwEAtdZAp98T8Wsc egjnvwNjzd2nTvrL1aZXKl4Id8jn2Qo= =VM67 -----END PGP SIGNATURE----- Merge tag 'efi-next-for-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI update from Ard Biesheuvel: "This is the only remaining EFI change, as everything else was taken via -tip this cycle: - implement uid/gid mount options for efivarfs" * tag 'efi-next-for-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efivarfs: Add uid/gid mount options	2023-11-04 08:54:20 -10:00
Kent Overstreet	c4accde498	bcachefs: Ensure srcu lock is not held too long The SRCU read lock that btree_trans takes exists to make it safe for bch2_trans_relock() to deref pointers to btree nodes/key cache items we don't have locked, but as a side effect it blocks reclaim from freeing those items. Thus, it's important to not hold it for too long: we need to differentiate between bch2_trans_unlock() calls that will be only for a short duration, and ones that will be for an unbounded duration. This introduces bch2_trans_unlock_long(), to be used mainly by the data move paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	6dfa10ab22	bcachefs: Fix build errors with gcc 10 gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	4db8ac8629	bcachefs: Fix MEAN_AND_VARIANCE kconfig options Fixes: https://lore.kernel.org/linux-bcachefs/CAMuHMdXpwMdLuoWsNGa8qacT_5Wv-vSTz0xoBR5n_fnD9cNOuQ@mail.gmail.com/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	1f7056b735	bcachefs: Ensure copygc does not spin If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Linus Torvalds	b06f58ad8e	Driver core changes for 6.7-rc1 Here is the set of driver core updates for 6.7-rc1. Nothing major in here at all, just a small number of changes including: - minor cleanups and updates from Andy Shevchenko - __counted_by addition - firmware_loader update for aborting loads cleaner - other minor changes, details in the shortlog - documentation update All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTe8A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynP0QCfT2jQx3OcL22MoqCvdTuZJKPiHSIAoMxrliJF d4cUeICW17ywlTFzsKg8 =nTeu -----END PGP SIGNATURE----- Merge tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core updates for 6.7-rc1. Nothing major in here at all, just a small number of changes including: - minor cleanups and updates from Andy Shevchenko - __counted_by addition - firmware_loader update for aborting loads cleaner - other minor changes, details in the shortlog - documentation update All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits) firmware_loader: Abort all upcoming firmware load request once reboot triggered firmware_loader: Refactor kill_pending_fw_fallback_reqs() Documentation: security-bugs.rst: linux-distros relaxed their rules driver core: Release all resources during unbind before updating device links driver core: class: remove boilerplate code driver core: platform: Annotate struct irq_affinity_devres with __counted_by resource: Constify resource crosscheck APIs resource: Unify next_resource() and next_resource_skip_children() resource: Reuse for_each_resource() macro PCI: Implement custom llseek for sysfs resource entries kernfs: sysfs: support custom llseek method for sysfs entries debugfs: Fix __rcu type comparison warning device property: Replace custom implementation of COUNT_ARGS() drivers: base: test: Make property entry API test modular driver core: Add missing parameter description to __fwnode_link_add() device property: Clarify usage scope of some struct fwnode_handle members devres: rename the first parameter of devm_add_action(_or_reset) driver core: platform: Unify the firmware node type check driver core: platform: Use temporary variable in platform_device_add() driver core: platform: Refactor error path in a couple places ...	2023-11-03 15:15:47 -10:00
Christian Brauner	56d2e2cfa2	ceph: allow idmapped mounts Now that we converted cephfs internally to account for idmapped mounts allow the creation of idmapped mounts on by setting the FS_ALLOW_IDMAP flag. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	8a051b40ab	ceph: allow idmapped atomic_open inode op Enable ceph_atomic_open() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. [ aleksandr.mikhalitsyn: adapted to `5fadbd9929` ("ceph: rely on vfs for setgid stripping") ] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	2cce72dda2	ceph: allow idmapped set_acl inode op Enable ceph_set_acl() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	a04aff2588	ceph: allow idmapped setattr inode op Enable __ceph_setattr() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. [ aleksandr.mikhalitsyn: adapted to `b27c82e129` ("attr: port attribute changes to new types") ] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Alexander Mikhalitsyn	79c66a0c8c	ceph: pass idmap to __ceph_setattr Just pass down the mount's idmapping to __ceph_setattr, because we will need it later. Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	8995375fae	ceph: allow idmapped permission inode op Enable ceph_permission() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	0513043ec4	ceph: allow idmapped getattr inode op Enable ceph_getattr() to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Christian Brauner	09838f1bfd	ceph: pass an idmapping to mknod/symlink/mkdir Enable mknod/symlink/mkdir iops to handle idmapped mounts. This is just a matter of passing down the mount's idmapping. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:34 +01:00
Alexander Mikhalitsyn	673478b6e5	ceph: add enable_unsafe_idmap module parameter This parameter is used to decide if we allow to perform IO on idmapped mount in case when MDS lacks support of CEPHFS_FEATURE_HAS_OWNER_UIDGID feature. In this case we can't properly handle MDS permission checks and if UID/GID-based restrictions are enabled on the MDS side then IO requests which go through an idmapped mount may fail with -EACCESS/-EPERM. Fortunately, for most of users it's not a case and everything should work fine. But we put work "unsafe" in the module parameter name to warn users about possible problems with this feature and encourage update of cephfs MDS. Suggested-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Christian Brauner	5ccd8530dd	ceph: handle idmapped mounts in create_request_message() Inode operations that create a new filesystem object such as ->mknod, ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. Instead the caller's fs{g,u}id is used for the {g,u}id of the new filesystem object. In order to ensure that the correct {g,u}id is used map the caller's fs{g,u}id for creation requests. This doesn't require complex changes. It suffices to pass in the relevant idmapping recorded in the request message. If this request message was triggered from an inode operation that creates filesystem objects it will have passed down the relevant idmaping. If this is a request message that was triggered from an inode operation that doens't need to take idmappings into account the initial idmapping is passed down which is an identity mapping. This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID which adds two new fields (owner_{u,g}id) to the request head structure. So, we need to ensure that MDS supports it otherwise we need to fail any IO that comes through an idmapped mount because we can't process it in a proper way. MDS server without such an extension will use caller_{u,g}id fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id values are unmapped. At the same time we can't map these fields with an idmapping as it can break UID/GID-based permission checks logic on the MDS side. This problem was described with a lot of details at [1], [2]. [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ Link: https://github.com/ceph/ceph/pull/52575 Link: https://tracker.ceph.com/issues/62217 Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Christian Brauner	9c2df2271c	ceph: stash idmapping in mdsc request When sending a mds request cephfs will send relevant data for the requested operation. For creation requests the caller's fs{g,u}id is used to set the ownership of the newly created filesystem object. For setattr requests the caller can pass in arbitrary {g,u}id values to which the relevant filesystem object is supposed to be changed. If the caller is performing the relevant operation via an idmapped mount cephfs simply needs to take the idmapping into account when it sends the relevant mds request. In order to support idmapped mounts for cephfs we stash the idmapping whenever they are relevant for the operation for the duration of the request. Since mds requests can be queued and performed asynchronously we make sure to keep the idmapping around and release it once the request has finished. In follow-up patches we will use this to send correct ownership information over the wire. This patch just adds the basic infrastructure to keep the idmapping around. The actual conversion patches are all fairly minimal. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Alexander Mikhalitsyn	1b90344614	fs: export mnt_idmap_get/mnt_idmap_put These helpers are required to support idmapped mounts in CephFS. Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Xiubo Li	522dc5108f	libceph, ceph: move mdsmap.h to fs/ceph The mdsmap.h is only used by CephFS, so move it to fs/ceph. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Xiubo Li	38d46409c4	ceph: print cluster fsid and client global_id in all debug logs Multiple CephFS mounts on a host is increasingly common so disambiguating messages like this is necessary and will make it easier to debug issues. At the same this will improve the debug logs to make them easier to troubleshooting issues, such as print the ino# instead only printing the memory addresses of the corresponding inodes and print the dentry names instead of the corresponding memory addresses for the dentry,etc. Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Xiubo Li	5995d90d2d	ceph: rename _to_client() to _to_fs_client() We need to covert the inode to ceph_client in the following commit, and will add one new helper for that, here we rename the old helper to _fs_client(). Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Xiubo Li	197b7d792d	ceph: pass the mdsc to several helpers We will use the 'mdsc' to get the global_id in the following commits. Link: https://tracker.ceph.com/issues/61590 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-11-03 23:28:33 +01:00
Linus Torvalds	31e5f934ff	Tracing updates for v6.7: - Remove eventfs_file descriptor This is the biggest change, and the second part of making eventfs create its files dynamically. In 6.6 the first part was added, and that maintained a one to one mapping between eventfs meta descriptors and the directories and file inodes and dentries that were dynamically created. The directories were represented by a eventfs_inode and the files were represented by a eventfs_file. In v6.7 the eventfs_file is removed. As all events have the same directory make up (sched_switch has an "enable", "id", "format", etc files), the handing of what files are underneath each leaf eventfs directory is moved back to the tracing subsystem via a callback. When a event is added to the eventfs, it registers an array of evenfs_entry's. These hold the names of the files and the callbacks to call when the file is referenced. The callback gets the name so that the same callback may be used by multiple files. The callback then supplies the filesystem_operations structure needed to create this file. This has brought the memory footprint of creating multiple eventfs instances down by 2 megs each! - User events now has persistent events that are not associated to a single processes. These are privileged events that hang around even if no process is attached to them. - Clean up of seq_buf. There's talk about using seq_buf more to replace strscpy() and friends. But this also requires some minor modifications of seq_buf to be able to do this. - Expand instance ring buffers individually Currently if boot up creates an instance, and a trace event is enabled on that instance, the ring buffer for that instance and the top level ring buffer are expanded (1.4 MB per CPU). This wastes memory as this happens when nothing is using the top level instance. - Other minor clean ups and fixes. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZUMrBBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6quzVAQCed/kPM7X9j2QZamJVDruMf2CmVxpu /TOvKvSKV584GgEAxLntf5VKx1Q98bc68y3Zkg+OCi8jSgORos1ROmURhws= =iIgb -----END PGP SIGNATURE----- Merge tag 'trace-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Remove eventfs_file descriptor This is the biggest change, and the second part of making eventfs create its files dynamically. In 6.6 the first part was added, and that maintained a one to one mapping between eventfs meta descriptors and the directories and file inodes and dentries that were dynamically created. The directories were represented by a eventfs_inode and the files were represented by a eventfs_file. In v6.7 the eventfs_file is removed. As all events have the same directory make up (sched_switch has an "enable", "id", "format", etc files), the handing of what files are underneath each leaf eventfs directory is moved back to the tracing subsystem via a callback. When an event is added to the eventfs, it registers an array of evenfs_entry's. These hold the names of the files and the callbacks to call when the file is referenced. The callback gets the name so that the same callback may be used by multiple files. The callback then supplies the filesystem_operations structure needed to create this file. This has brought the memory footprint of creating multiple eventfs instances down by 2 megs each! - User events now has persistent events that are not associated to a single processes. These are privileged events that hang around even if no process is attached to them - Clean up of seq_buf There's talk about using seq_buf more to replace strscpy() and friends. But this also requires some minor modifications of seq_buf to be able to do this - Expand instance ring buffers individually Currently if boot up creates an instance, and a trace event is enabled on that instance, the ring buffer for that instance and the top level ring buffer are expanded (1.4 MB per CPU). This wastes memory as this happens when nothing is using the top level instance - Other minor clean ups and fixes * tag 'trace-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits) seq_buf: Export seq_buf_puts() seq_buf: Export seq_buf_putc() eventfs: Use simple_recursive_removal() to clean up dentries eventfs: Remove special processing of dput() of events directory eventfs: Delete eventfs_inode when the last dentry is freed eventfs: Hold eventfs_mutex when calling callback functions eventfs: Save ownership and mode eventfs: Test for ei->is_freed when accessing ei->dentry eventfs: Have a free_ei() that just frees the eventfs_inode eventfs: Remove "is_freed" union with rcu head eventfs: Fix kerneldoc of eventfs_remove_rec() tracing: Have the user copy of synthetic event address use correct context eventfs: Remove extra dget() in eventfs_create_events_dir() tracing: Have trace_event_file have ref counters seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str() eventfs: Fix typo in eventfs_inode union comment eventfs: Fix WARN_ON() in create_file_dentry() powerpc: Remove initialisation of readpos tracing/histograms: Simplify last_cmd_set() seq_buf: fix a misleading comment ...	2023-11-03 07:41:18 -10:00
Yuezhang Mo	1373ca10ec	exfat: fix ctime is not updated Commit `4c72a36edd` ("exfat: convert to new timestamp accessors") removed attr_copy() from exfat_set_attr(). It causes xfstests generic/221 to fail. In xfstests generic/221, it tests ctime should be updated even if futimens() update atime only. But in this case, ctime will not be updated if attr_copy() is removed. attr_copy() may also update other attributes, and removing it may cause other bugs, so this commit restores to call attr_copy() in exfat_set_attr(). Fixes: `4c72a36edd` ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2023-11-03 22:24:11 +09:00
Yuezhang Mo	fc12a722e6	exfat: fix setting uninitialized time to ctime/atime An uninitialized time is set to ctime/atime in __exfat_write_inode(). It causes xfstests generic/003 and generic/192 to fail. And since there will be a time gap between setting ctime/atime to the inode and writing back the inode, so ctime/atime should not be set again when writing back the inode. Fixes: `4c72a36edd` ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2023-11-03 22:24:07 +09:00
Linus Torvalds	8f6f76a6a2	As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - "kdump: use generic functions to simplify crashkernel reservation in arch", from Baoquan He. This is mainly cleanups and consolidation of the "crashkernel=" kernel parameter handling. - After much discussion, David Laight's "minmax: Relax type checks in min() and max()" is here. Hopefully reduces some typecasting and the use of min_t() and max_t(). - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.therad_group. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0= =On4u -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...	2023-11-02 20:53:31 -10:00
Linus Torvalds	ecae0bd517	Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series "Fixes and cleanups to compaction". - Joel Fernandes has a patchset ("Optimize mremap during mutual alignment within PMD") which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested. - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series "Do not try to access unaccepted memory" Adrian Hunter provides some fixups for the recently-added "unaccepted memory' feature. To increase the feature's checking coverage. "Plug a few gaps where RAM is exposed without checking if it is unaccepted memory". - In the series "cleanups for lockless slab shrink" Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code. - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series "use refcount+RCU method to implement lockless slab shrink". - David Hildenbrand contributes some maintenance work for the rmap code in the series "Anon rmap cleanups". - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series "mm: migrate: more folio conversion and unification". - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series "Add and use bdev_getblk()". - In the series "Use nth_page() in place of direct struct page manipulation" Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames. - In the series "mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO" has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use. - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code rationalization and folio conversions in the hugetlb code. - Yin Fengwei has improved mlock()'s handling of large folios in the series "support large folio for mlock" - In the series "Expose swapcache stat for memcg v1" Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2. - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named "MDWE without inheritance". - Kefeng Wang has provided the series "mm: convert numa balancing functions to use a folio" which does what it says. - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch makes is possible for a process to propagate KSM treatment across exec(). - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use "high bandwidth memory" in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named "memory tiering: calculate abstract distance based on ACPI HMAT" - In the series "Smart scanning mode for KSM" Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans. - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series "mm: memcg: fix tracking of pending stats updates values". - In the series "Implement IOCTL to get and optionally clear info about PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU. - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance" - a bunch of relatively minor maintenance tweaks to this code. - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series "Handle more faults under the VMA lock". Some rationalizations of the fault path became possible as a result. - In the series "mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups and folio conversions. - In the series "various improvements to the GUP interface" Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements. - Andrey Konovalov has sent along the series "kasan: assorted fixes and improvements" which does those things. - Some page allocator maintenance work from Kemeng Shi in the series "Two minor cleanups to break_down_buddy_pages". - In thes series "New selftest for mm" Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults. - In the series "Add folio_end_read" Matthew Wilcox provides cleanups and an optimization to the core pagecache code. - Nhat Pham has added memcg accounting for hugetlb memory in the series "hugetlb memcg accounting". - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series "Abstract vma_merge() and split_vma()". - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series "Fix page_owner's use of free timestamps". - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series "permit write-sealed memfd read-only shared mappings". - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series "Batch hugetlb vmemmap modification operations". - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series "Finish the create_empty_buffers() transition". - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series "mm: PCP high auto-tuning". - Roman Gushchin has contributed the patchset "mm: improve performance of accounted kernel memory allocations" which improves their performance by ~30% as measured by a micro-benchmark. - folio conversions from Kefeng Wang in the series "mm: convert page cpupid functions to folios". - Some kmemleak fixups in Liu Shixin's series "Some bugfix about kmemleak". - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series "handle memoryless nodes more appropriately". - khugepaged conversions from Vishal Moola in the series "Some khugepaged folio conversions". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y FgeUPAD1oasg6CP+INZvCj34waNxwAc= =E+Y4 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...	2023-11-02 19:38:47 -10:00
Linus Torvalds	bc3012f4e3	This update includes the following changes: API: - Add virtual-address based lskcipher interface. - Optimise ahash/shash performance in light of costly indirect calls. - Remove ahash alignmask attribute. Algorithms: - Improve AES/XTS performance of 6-way unrolling for ppc. - Remove some uses of obsolete algorithms (md4, md5, sha1). - Add FIPS 202 SHA-3 support in pkcs1pad. - Add fast path for single-page messages in adiantum. - Remove zlib-deflate. Drivers: - Add support for S4 in meson RNG driver. - Add STM32MP13x support in stm32. - Add hwrng interface support in qcom-rng. - Add support for deflate algorithm in hisilicon/zip. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmVB3vgACgkQxycdCkmx i6dsOBAAykbnX8BpnpnOXYywE9ZWrl98rAk51MK0N9olZNfg78zRPIv7fFxFdC20 SDJrDSNPmn0Qvaa5e0EfoAdklsm0k2GkXL/BwPKMKWUsyIoJVYI3WrBMnjBy9xMp yfME+h0bKoXJCZKnYkIUSGUejmUPSyRlEylrXoFlH/VWYwAaii/x9zwreQoF+0LR KI24A1q8AYs6Dw9HSfndaAub9GOzrqKYs6fSaMG+77Y4UC5aoi5J9Bp2G3uVyHay x/0bZtIxKXS9wn+LeG/3GspX23x/I5VwBOdAoMigrYmAIaIg5qgyMszudltTAs4R zF1Kh7WsnM5+vpnBSeigzo+/GGOU3QTz8y3tBTg+3ZR7GWGOwQLiizhOYqCyOfAH pIm6c++sZw/OOHiL69Nt4HeLKzGNYYWk3s4X/B/6cqoouPfOsfBaQobZNx9zfy7q ZNEvSVBjrFX/L6wDSotny1LTWLUNjHbmLaMV5uQZ/SQKEtv19fp2Dl7SsLkHH+3v ldOAwfoJR6QcSwz3Ez02TUAvQhtP172Hnxi7u44eiZu2aUboLhCFr7aEU6kVdBCx 1rIRVHD1oqlOEDRwPRXzhF3I8R4QDORJIxZ6UUhg7yueuI+XCGDsBNC+LqBrBmSR IbdjqmSDUBhJyM5yMnt1VFYhqKQ/ZzwZ3JQviwW76Es9pwEIolM= =IZmR -----END PGP SIGNATURE----- Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add virtual-address based lskcipher interface - Optimise ahash/shash performance in light of costly indirect calls - Remove ahash alignmask attribute Algorithms: - Improve AES/XTS performance of 6-way unrolling for ppc - Remove some uses of obsolete algorithms (md4, md5, sha1) - Add FIPS 202 SHA-3 support in pkcs1pad - Add fast path for single-page messages in adiantum - Remove zlib-deflate Drivers: - Add support for S4 in meson RNG driver - Add STM32MP13x support in stm32 - Add hwrng interface support in qcom-rng - Add support for deflate algorithm in hisilicon/zip" * tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits) crypto: adiantum - flush destination page before unmapping crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place Documentation/module-signing.txt: bring up to date module: enable automatic module signing with FIPS 202 SHA-3 crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support crypto: FIPS 202 SHA-3 register in hash info for IMA x509: Add OIDs for FIPS 202 SHA-3 hash and signatures crypto: ahash - optimize performance when wrapping shash crypto: ahash - check for shash type instead of not ahash type crypto: hash - move "ahash wrapping shash" functions to ahash.c crypto: talitos - stop using crypto_ahash::init crypto: chelsio - stop using crypto_ahash::init crypto: ahash - improve file comment crypto: ahash - remove struct ahash_request_priv crypto: ahash - remove crypto_ahash_alignmask crypto: gcm - stop using alignmask of ahash crypto: chacha20poly1305 - stop using alignmask of ahash crypto: ccm - stop using alignmask of ahash net: ipv6: stop checking crypto_ahash_alignmask ...	2023-11-02 16:15:30 -10:00
Andreas Gruenbacher	92099f0c92	gfs2: Add metapath_dibh helper Add a metapath_dibh() helper for extracting the inode's buffer head from a metapath. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	1f7b0a84c8	gfs2: Clean up quota.c:print_message Function print_message() in quota.c doesn't return a meaningful return value. Turn it into a void function and stop abusing it for setting variable error to 0 in gfs2_quota_check(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	f7e4c610cb	gfs2: Clean up gfs2_alloc_parms initializers When intializing a struct, all fields that are not explicitly mentioned are zeroed out already. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	703df114f9	gfs2: Two quota=account mode fixes Make sure we don't skip accounting for quota changes with the quota=account mount option. Reviewed-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:09:38 +01:00
Linus Torvalds	4652b8e4f3	7 ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVDA3MACgkQiiy9cAdy T1GCoQv/UcMfvtHMCy3qU5LQQH/b2TVKay+0Ki6HFNIui5519YZzqidOT5PEnsiA wV0EFi1Xw6CvNu6j6rSqlFH9HG+ROk/j1CxiO4OCOuXFtV+0/mVHHR5TAfiHq6bE zHKoPULXnNB8vl6X9f1iRBL5IPU3iDUqdV7n1ujPwKMjf9XeoNROQW6Rya/JTTZK Yv/ykUW+OhOn8iL9cQH5nqjhV2NVbC+oRk+dQb/2wyiWmeU+7j5+kCCAmHIEnD/Z ffiU4IN2+amEk+Jlic3tsgkOUGFk5IZOWW4N6YEpKuyA3TFfbfKD2tlii2VtAiG3 bNVSZLDKDhn0lsAn9f2ye1xsRg3STO7sYW8zgiLok4CrQq3HL4nODUutrHy7+oLA r7ZDTzOQmDlc3f5vVBliDluFvL9wTzeSjaGTGikVj74C1+P0slnqHho3jzXvxH+V FTRdYMPj26Yl67fQEG65SvXYR/V3q1h+cLd+bBcNbGpSqgNqu5Ni3rKLSOAVHE7Q 408EbRC1 =gYgr -----END PGP SIGNATURE----- Merge tag '6.7-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull smb server updates from Steve French: "Seven ksmbd server fixes: - logoff improvement for multichannel bound connections - unicode fix for surrogate pairs - RDMA (smbdirect) fix for IB devices - fix locking deadlock in kern_path_create during rename - iov memory allocation fix - two minor cleanup patches (doc cleanup, and unused variable)" * tag '6.7-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: no need to wait for binded connection termination at logoff ksmbd: add support for surrogate pair conversion ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev() ksmbd: fix recursive locking in vfs helpers ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr() ksmbd: reorganize ksmbd_iov_pin_rsp() ksmbd: Remove unused field in ksmbd_user struct	2023-11-02 08:32:07 -10:00
Linus Torvalds	71fb7b320b	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmVCKKsACgkQnJ2qBz9k QNmaYQgAwKEtBSO6RdSqP1Sg65a472HKOfdRZuT1PfATueLu3qOEqaPuJZzA9Ed6 ypph8YYf91n53XUdDJPDq8zWUO2SMIwQsMdL83uIPQQ3PGvWckl9AGnGs6OSR2gd RPDKvKkg7+JrMWMawKc31fjPK8dpwBne1mA3pqj2loWCaQ400XIMYjt+08PMa9iN YEM77dOmt86g23LMZ1pl+37rSYMxLqn2NJdFSAcljGGXzNqJO+ngkJX0O8+hYgRy qPoY+F5bWy7EGmAo/aAcr5Yr8RRbq/1wr42j7nVe3gF+2l9zj97r84sIWhx2Umil ZZdNrCok8oi13ZthSmqX9f3Wk0+X6g== =VhdK -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify update from Jan Kara: "This time just one tiny cleanup for fsnotify" * tag 'fsnotify_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: delete useless parenthesis in FANOTIFY_INLINE_FH macro	2023-11-02 08:27:04 -10:00
Linus Torvalds	5efad0a765	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmVCKSwACgkQnJ2qBz9k QNn6jwgAjQOrUJkJvnqindNNAGvkwE5zph+MWJolbQ7gu3Im5jYLECD4huER3m6q j+tj9ya49iHuIJLEBSctb3NvnhwGgNkupeM9klHGWDtyJwiV5FiV1OYEusxjiq5G tuWCImEn/OoLckfdP4umKqNDMftDVoTUYMViUoS9KBNXVVz0Y4Ex0KwSZFSM47j8 yRlSCRe6CK4lNyiAxLsm8kSzPge/s+mwz7P7z3i/4IgCkvw3fnMBEulCSbMZjhHD WrdCmYsgOJtqj+/hSSKPIJ5W92fFjLY1yaxocsVIzRGHQXriBznix7OS4iRLXtzk 9eBAzvVlu/HXBZCF+ZRIUMG+2P17Dg== =0k3R -----END PGP SIGNATURE----- Merge tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, and quota updates from Jan Kara: - conversion of ext2 directory code to use folios - cleanups in UDF declarations - bugfix for quota interaction with file encryption * tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to folios ext2: Convert ext2_make_empty() to use a folio ext2: Convert ext2_unlink() and ext2_rename() to use folios ext2: Convert ext2_delete_entry() to use folios ext2: Convert ext2_empty_dir() to use a folio ext2: Convert ext2_add_link() to use a folio ext2: Convert ext2_readdir to use a folio ext2: Add ext2_get_folio() ext2: Convert ext2_check_page to ext2_check_folio highmem: Add folio_release_kmap() udf: Avoid unneeded variable length array in struct fileIdentDesc udf: Annotate struct udf_bitmap with __counted_by quota: explicitly forbid quota files from being encrypted	2023-11-02 08:19:51 -10:00
Linus Torvalds	e9806ff8a0	Minor stability improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmVCUHIACgkQNqiEXrVA jGRHKxAAo3L21+6JRql8h5kHAYMDTY0U2/moi0+l+u9kOp/v8eI2HOsx6QB7O2sA GD3+sgJuIhkr0xj6VVjT1Q882XyH7STIVr5aKkQmyQeaCKhXIJQ4R97baktHgNLt y1mRVuJqnIFBQkCBW4FPzYFjGjduRB49OsKzOltGxBruB2e8Ps4wfoEQNaZI/xI2 l08RaNMeQkt54oWsKcZs0rm5Zu6aGmG1KIXATZI7d8jJvNwZo57LHaO5xzZMWQSO IlWRIKjo6XarfDnnGsMqxtfdIqnykLRZHrvXy3WcrC7p4+QidUSqogkg3u08x1Gh svACGP9yBu8CZZ+kVxDNSMvyQrIoUPWLj6LCT+stb6pMzk0Ug6ZBQbODX0Lhcosa h65xR1sd62DgLkZQ05qj6y8D34sGVf7Du4hXm1y0eyLjDiPlxLvnid0vOxm3rOE0 lLQ4oqXDisJUFxziTsDvR20fvFl4kT96pR6e8rlJxj2flTCfsmAON7xZAEFmLnII EyLcMsfGI03CE4uX1jY6U2SVM7weZKAKAt8QrcTs+J/Yy2wfE0L/PigUenC21RRq 1jCqluKaxd42/SFqYxGLxBQo/lvAxMpF1rla8idj/LYrr5Brf1LZW87KJqZ1O7uN 73ZpnKJz4OXR01JyJA9TuIRdpfBKEau/HrMHHEXSYrjYjWgxKnI= =ia2n -----END PGP SIGNATURE----- Merge tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "Minor stability improvements" * tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggy: jfs: define xtree root and page independently jfs: fix array-index-out-of-bounds in diAlloc jfs: fix array-index-out-of-bounds in dbFindLeaf fs/jfs: Add validity check for db_maxag and db_agpref fs/jfs: Add check for negative db_l2nbperpage	2023-11-02 08:08:28 -10:00
Linus Torvalds	dc737f11c2	Description for this pull request: - Add ioctls to get and set file attribute that used in fatattr util. - Add zero_size_dir mount option not to allocate a cluster when creating directory. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmVB3PcWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCPN6D/0eK9tVBSP4eZ7IKZ6OAjmjm+ED f4yrer/e44hRYw2kMu3zDjMOVyqsOOdzFy+Tu/dunEgWKyLR2LmPjkuIlyi0YZ8l bkkcO0wud6F4mp+p5w5V8Myyac1uzafTu9ChDqh61eNPI4lGRKaqI1apC1JzSDAM zPO2j3DZ9yd+bPjMkbFDFkzf8LMN2oqvV9rQLudLsIQICJoVfNwXY8wgioa1f00R YvRk8S2OFhpI7N8SXo/hvpgXySTI2tzfVQBvSigU7NZCObFx5SdvcRLmo6jtcMd3 /wykxae8xR+BE9x1eUiOz5fRsD4+7ebLPPxsqOxiF+S9Vlr6zgEelumrwP1cuVKo C/bqrjjiN2HAqJGLx/xE9F3KW4cQYzdhwl1p+06koVrTubTQE1Kp0idIcD+4yrMe 5/+R9n5T7LTdwGU8e70qD/65LMSj3r5ocZJvZHkXjWqe46k0sbti0E6KqCCn/9UV K7C7B6iW0082B2YDG8W1X5949UtwEHXyNuvn8a4rjwBX7ahcsZYlNJ9ucoW7gYms uqLXSFJVzJcp748xbCbqwqgS+u7gGyRxBT6ldruYEM/HnWRD16kdpCUNufcxpkKN +Z8pbMlxb5btm9WqduyMi1h1z0zPKsrJ//QS1kkUhjJEoMavF2k78iAN3IFpw/VJ 9v0Y7SfIik4v4kTMzw== =xztJ -----END PGP SIGNATURE----- Merge tag 'exfat-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Add ioctls to get and set file attribute that is used in the fatattr util - Add zero_size_dir mount option to avoid allocating a cluster when creating a directory * tag 'exfat-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: support create zero-size directory exfat: support handle zero-size directory exfat: add ioctls for accessing attributes	2023-11-02 08:00:53 -10:00
Linus Torvalds	87a201b43b	Changes since last update: - Fix inode metadata space layout documentation; - Avoid warning MicroLZMA format anymore; - Fix erofs_insert_workgroup() lockref usage; - Some cleanups. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmVBMVoRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qq8iA/+NdvaKPFvexKuJkW2n9t4UaDEPqS6qobk B1uIaXxDhz6RdhDyhLA7ysQQi1Z8g0t/MXgCWZGM1NXJ995Y0VE+dFlT+a5QPLiu 4rz8bnZpdUHVF0jIyoru/nPZQWXsYBh2AeYLKPi+nvQoCRtoRsvdKXxv76iEF45o zE2548TXqogL3Q04oH040fRzVwYROqnF85q9uCPNFS/Sh/swxyob9waBijlzbECS GLjyJ+ZsQBBG95SzwNEPSFek5+od9ty2UKQwDJfqABOhbQT2Li6r2t0afyrlqDQY m7VvvmzkFe5uYwAkuMkAOxhnW+eBzxz/whk+8W3dw/lm4dDbe0LuCrGXVdivdf3F dO5ywSEGJnOH1FqNF1ubAWdzJ+qoaTupuOpwrGM9/lHGgxCIVjW06XzKfOv/AfaQ VR2SPTrcA6eulXsLGopVho9dtHBUQZIAmJjQcWIyCpYm15NmElDXsybN/Os2Hgo0 d2Tsx9fgsN25xRsIBPxiV6qsEXqx56Nce1U4k0VV4ali81Y22nqq6rV4AFWz8KNJ LhYFcL5YgO5RxYEpoQGkt4hyeDkxt4lc/cYqCr8E8kpNEuz0P3O27NcLnSP7CJVQ pBYfSoV5tm+SyK4toeacG+HJTnohNyLE4vIXN96awoCBfJiickY7ZxJNuf6wRbOA 4Y+aMq/lQXQ= =Yum+ -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Nothing exciting lands for this cycle, since we're still busying in developing support for sub-page blocks and large-folios of compressed data for new scenarios on Android. In this cycle, MicroLZMA format is marked as stable, and there are minor cleanups around documentation and codebase. In addition, it also fixes incorrect lockref usage in erofs_insert_workgroup(). Summary: - Fix inode metadata space layout documentation - Avoid warning for MicroLZMA format anymore - Fix erofs_insert_workgroup() lockref usage - Some cleanups" * tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix erofs_insert_workgroup() lockref usage erofs: tidy up redundant includes erofs: get rid of ROOT_NID() erofs: simplify compression configuration parser erofs: don't warn MicroLZMA format anymore erofs: fix inode metadata space layout description in documentation	2023-11-02 07:53:57 -10:00
Linus Torvalds	57aff99745	Cleanup ext4's multi-block allocator, including adding some unit tests, as well as cleaning how we update the backup superblock after online resizes or updating the label or uuid. Optimize handling of released data blocks in ext4's commit machinery to avoid a potential lock contention on s_md_lock spinlock. Fix a number of ext4 bugs: - fix race between writepages and remount - fix racy may inline data check in dio write - add missed brelse in an error path in update_backups - fix umask handling when ACL support is disabled - fix lost EIO error when a journal commit races with a fsync of the blockdev - fix potential improper i_size when there is a crash right after an O_SYNC direct write. - check extent node for validity before potentially using what might be an invalid pointer - fix potential stale data exposure when writing to an unwritten extent and the file system is nearly out of space - fix potential accounting error around block reservations when writing partial delayed allocation writes to a bigalloc cluster - avoid memory allocation failure when tracking partial delayed allocation writes to a bigalloc cluster - fix various debugging print messages -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmVBtjsACgkQ8vlZVpUN gaNynQf+M2hnDsf7bR+szh1j9hmfuGoDfSRwIpCtgwZtrjCD3gIVbxBi9i1N99JK gc7fyIDaYFOqNb2nLqS3pYtVnD0gd8Da+oV5XphUoEWCjbRP5rBIZssmyaXrgijw 6UtYf3dZ0MM/NkQRBuj7szcG8tFLA1vGRbSHsu3DW6Sv6R3uDbnLEww0bmPDiXhf SpoJqF/IYXKYJefVZ67MvZvNHgZRjklVVZVgobXQb8JUAvo9OvxGe4FfgaxkoTxv MOxweNF70iH0OASN03JAptZCxJFZOsMAFvS0fYDk1NH+Z6CLK3tzCOTaZ1R+BDLq QzdvyETuEJuMT2T02UXoZDoyPNzaGw== =JTtz -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Cleanup ext4's multi-block allocator, including adding some unit tests, as well as cleaning how we update the backup superblock after online resizes or updating the label or uuid. Optimize handling of released data blocks in ext4's commit machinery to avoid a potential lock contention on s_md_lock spinlock. Fix a number of ext4 bugs: - fix race between writepages and remount - fix racy may inline data check in dio write - add missed brelse in an error path in update_backups - fix umask handling when ACL support is disabled - fix lost EIO error when a journal commit races with a fsync of the blockdev - fix potential improper i_size when there is a crash right after an O_SYNC direct write. - check extent node for validity before potentially using what might be an invalid pointer - fix potential stale data exposure when writing to an unwritten extent and the file system is nearly out of space - fix potential accounting error around block reservations when writing partial delayed allocation writes to a bigalloc cluster - avoid memory allocation failure when tracking partial delayed allocation writes to a bigalloc cluster - fix various debugging print messages" * tag 'ext4_for_linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (41 commits) ext4: properly sync file size update after O_SYNC direct IO ext4: fix racy may inline data check in dio write ext4: run mballoc test with different layouts setting ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc ext4: add some kunit stub for mballoc kunit test ext4: call ext4_mb_mark_context in ext4_group_add_blocks() ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() ext4: call ext4_mb_mark_context in ext4_mb_clear_bb ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used ext4: extend ext4_mb_mark_context to support allocation under journal ext4: call ext4_mb_mark_context in ext4_free_blocks_simple ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb ext4: make state in ext4_mb_mark_bb to be bool jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev ext4: apply umask if ACL support is disabled ext4: mark buffer new if it is unwritten to avoid stale data exposure ext4: move 'ix' sanity check to corrent position jbd2: fix printk format type for 'io_block' in do_one_pass() jbd2: print io_block if check data block checksum failed when do recovery ...	2023-11-02 07:45:14 -10:00
Linus Torvalds	91a683cdf6	dlm for 6.7 This set of patches has some minor fixes for message handling, some misc cleanups, and updates the maintainers entry. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEcGkeEvkvjdvlR90nOBtzx/yAaaoFAmVBETQACgkQOBtzx/yA aapeiRAAwveqN80kjxc5lHIuLUZsvMCw8UZlr7nQhB4XDVLQHsfX+rtzPVR3rpDQ XTKhM53787XvAx+Qn8wrBGB6i0comB28Dsz91wPSXNrW/nXhbsTLR8XP4D7g0/h3 /wxJQF/u4IQr/4gtpvafWGB8w1j9iljRKxY90BN+CT1tB0mYgbfGD+RmPwblcm+T dlyT+MO83wkTkRwg71PSBrL/81h3IVKpiZiDCJjsLMwjFQT74NEPj/JTdsQptWy5 lwSqTm8Ye575bKxZfkZpfcoaMZHz2a6p2AZE0wNg1o0PZ0z4o64V1i5iBioymJL0 oF4flq4/bRQdN/J1jUzRZZOm4QReqBsvlCiPdmq2s8++YMhzE9U4Yl+kiQY3ek6n ie7dnPC0XpKQWgc8RNeD4kAK6Qmrus/zIdzzLORGV1f9SINzeH+TsbTPc/h6rSvk rbqNJLBlRnX4j9OYzp+kCjcVsrd53q87cb8KoQa/9ekGFyp1svcSKj8XLQLRHTUA iN0DZczCu11MrUDg9uyyDJ0cOxXn2/ptswdZeF7SlFAdGmJ0G/L+1aUx1nP/Kgnf DaOWRVngjBT9M/Z/aoDDLZx94eqALq/gqevTP8FrA4TopUb67Friy7vZtVAOhSGR cVJn0iuzV31sGu+d7O5u0mrnqWcDbHRgv9otGI5HYa5vywgsEvE= =PWCW -----END PGP SIGNATURE----- Merge tag 'dlm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set of patches has some minor fixes for message handling, some misc cleanups, and updates the maintainers entry" * tag 'dlm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: MAINTAINERS: Update dlm maintainer and web page dlm: slow down filling up processing queue dlm: fix no ack after final message dlm: be sure we reset all nodes at forced shutdown dlm: fix remove member after close call dlm: fix creating multiple node structures fs: dlm: Remove some useless memset() fs: dlm: Fix the size of a buffer in dlm_create_debug_file() fs: dlm: Simplify buffer size computation in dlm_create_debug_file()	2023-11-02 07:40:07 -10:00
Linus Torvalds	ca219be012	integrity-v6.7 -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZUDyWhQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5QtIAPwLSdHw2qix1A6lMhbRiXqFOWINHcTF DMtZkiPmpeuTKAEA0KaXfddKq5OC5S/ixPEEZCVqOq2ixxfMDhudyoh/qQs= =lh3g -----END PGP SIGNATURE----- Merge tag 'integrity-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Four integrity changes: two IMA-overlay updates, an integrity Kconfig cleanup, and a secondary keyring update" * tag 'integrity-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: detect changes to the backing overlay file certs: Only allow certs signed by keys on the builtin keyring integrity: fix indentation of config attributes ima: annotate iint mutex to avoid lockdep false positive warnings	2023-11-02 06:53:22 -10:00
Shyam Prasad N	d9a6d78096	cifs: force interface update before a fresh session setup During a session reconnect, it is possible that the server moved to another physical server (happens in case of Azure files). So at this time, force a query of server interfaces again (in case of multichannel session), such that the secondary channels connect to the right IP addresses (possibly updated now). Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-02 08:06:06 -05:00
Shyam Prasad N	6e5e64c947	cifs: do not reset chan_max if multichannel is not supported at mount If the mount command has specified multichannel as a mount option, but multichannel is found to be unsupported by the server at the time of mount, we set chan_max to 1. Which means that the user needs to remount the share if the server starts supporting multichannel. This change removes this reset. What it means is that if the user specified multichannel or max_channels during mount, and at this time, multichannel is not supported, but the server starts supporting it at a later point, the client will be capable of scaling out the number of channels. Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-02 08:06:06 -05:00
Shyam Prasad N	c3326a61cd	cifs: reconnect helper should set reconnect for the right channel We introduced a helper function to be used by non-cifsd threads to mark the connection for reconnect. For multichannel, when only a particular channel needs to be reconnected, this had a bug. This change fixes that by marking that particular channel for reconnect. Fixes: `dca65818c8` ("cifs: use a different reconnect helper for non-cifsd threads") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-02 08:06:06 -05:00
Paulo Alcantara	5c86919455	smb: client: fix use-after-free in smb2_query_info_compound() The following UAF was triggered when running fstests generic/072 with KASAN enabled against Windows Server 2022 and mount options 'multichannel,max_channels=2,vers=3.1.1,mfsymlinks,noperm' BUG: KASAN: slab-use-after-free in smb2_query_info_compound+0x423/0x6d0 [cifs] Read of size 8 at addr ffff888014941048 by task xfs_io/27534 CPU: 0 PID: 27534 Comm: xfs_io Not tainted 6.6.0-rc7 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0x7f ? srso_alias_return_thunk+0x5/0x7f ? __phys_addr+0x46/0x90 kasan_report+0xda/0x110 ? smb2_query_info_compound+0x423/0x6d0 [cifs] ? smb2_query_info_compound+0x423/0x6d0 [cifs] smb2_query_info_compound+0x423/0x6d0 [cifs] ? __pfx_smb2_query_info_compound+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0x7f ? __stack_depot_save+0x39/0x480 ? kasan_save_stack+0x33/0x60 ? kasan_set_track+0x25/0x30 ? ____kasan_slab_free+0x126/0x170 smb2_queryfs+0xc2/0x2c0 [cifs] ? __pfx_smb2_queryfs+0x10/0x10 [cifs] ? __pfx___lock_acquire+0x10/0x10 smb311_queryfs+0x210/0x220 [cifs] ? __pfx_smb311_queryfs+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0x7f ? __lock_acquire+0x480/0x26c0 ? lock_release+0x1ed/0x640 ? srso_alias_return_thunk+0x5/0x7f ? do_raw_spin_unlock+0x9b/0x100 cifs_statfs+0x18c/0x4b0 [cifs] statfs_by_dentry+0x9b/0xf0 fd_statfs+0x4e/0xb0 __do_sys_fstatfs+0x7f/0xe0 ? __pfx___do_sys_fstatfs+0x10/0x10 ? srso_alias_return_thunk+0x5/0x7f ? lockdep_hardirqs_on_prepare+0x136/0x200 ? srso_alias_return_thunk+0x5/0x7f do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Allocated by task 27534: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 __kasan_kmalloc+0x8f/0xa0 open_cached_dir+0x71b/0x1240 [cifs] smb2_query_info_compound+0x5c3/0x6d0 [cifs] smb2_queryfs+0xc2/0x2c0 [cifs] smb311_queryfs+0x210/0x220 [cifs] cifs_statfs+0x18c/0x4b0 [cifs] statfs_by_dentry+0x9b/0xf0 fd_statfs+0x4e/0xb0 __do_sys_fstatfs+0x7f/0xe0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 27534: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x50 ____kasan_slab_free+0x126/0x170 slab_free_freelist_hook+0xd0/0x1e0 __kmem_cache_free+0x9d/0x1b0 open_cached_dir+0xff5/0x1240 [cifs] smb2_query_info_compound+0x5c3/0x6d0 [cifs] smb2_queryfs+0xc2/0x2c0 [cifs] This is a race between open_cached_dir() and cached_dir_lease_break() where the cache entry for the open directory handle receives a lease break while creating it. And before returning from open_cached_dir(), we put the last reference of the new @cfid because of !@cfid->has_lease. Besides the UAF, while running xfstests a lot of missed lease breaks have been noticed in tests that run several concurrent statfs(2) calls on those cached fids CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame... CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1... CIFS: VFS: \\w22-root1.gandalf.test smb buf 00000000715bfe83 len 108 CIFS: VFS: Dump pending requests: CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame... CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1... CIFS: VFS: \\w22-root1.gandalf.test smb buf 000000005aa7316e len 108 ... To fix both, in open_cached_dir() ensure that @cfid->has_lease is set right before sending out compounded request so that any potential lease break will be get processed by demultiplex thread while we're still caching @cfid. And, if open failed for some reason, re-check @cfid->has_lease to decide whether or not put lease reference. Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-02 08:06:06 -05:00
Paulo Alcantara	c37ed2d7d0	smb: client: remove extra @chan_count check in __cifs_put_smb_ses() If @ses->chan_count <= 1, then for-loop body will not be executed so no need to check it twice. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-02 08:05:45 -05:00
Linus Torvalds	426ee5196d	sysctl-6.7-rc1 To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last 2 patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7 6f0KvCogg0fU =Qf50 -----END PGP SIGNATURE----- Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ...	2023-11-01 20:51:41 -10:00
Steven Rostedt (Google)	407c6726ca	eventfs: Use simple_recursive_removal() to clean up dentries Looking at how dentry is removed via the tracefs system, I found that eventfs does not do everything that it did under tracefs. The tracefs removal of a dentry calls simple_recursive_removal() that does a lot more than a simple d_invalidate(). As it should be a requirement that any eventfs_inode that has a dentry, so does its parent. When removing a eventfs_inode, if it has a dentry, a call to simple_recursive_removal() on that dentry should clean up all the dentries underneath it. Add WARN_ON_ONCE() to check for the parent having a dentry if any children do. Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/ Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: `5bdcd5f533` ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-02 00:18:36 -04:00
Steven Rostedt (Google)	62d65cac11	eventfs: Remove special processing of dput() of events directory The top level events directory is no longer special with regards to how it should be delete. Remove the extra processing for it in eventfs_set_ei_status_free(). Link: https://lkml.kernel.org/r/20231101172650.340876747@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-02 00:18:27 -04:00
Steven Rostedt (Google)	020010fbfa	eventfs: Delete eventfs_inode when the last dentry is freed There exists a race between holding a reference of an eventfs_inode dentry and the freeing of the eventfs_inode. If user space has a dentry held long enough, it may still be able to access the dentry's eventfs_inode after it has been freed. To prevent this, have he eventfs_inode freed via the last dput() (or via RCU if the eventfs_inode does not have a dentry). This means reintroducing the eventfs_inode del_list field at a temporary place to put the eventfs_inode. It needs to mark it as freed (via the list) but also must invalidate the dentry immediately as the return from eventfs_remove_dir() expects that they are. But the dentry invalidation must not be called under the eventfs_mutex, so it must be done after the eventfs_inode is marked as free (put on a deletion list). Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Fixes: `5bdcd5f533` ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-02 00:17:27 -04:00
Steven Rostedt (Google)	44365329f8	eventfs: Hold eventfs_mutex when calling callback functions The callback function that is used to create inodes and dentries is not protected by anything and the data that is passed to it could become stale. After eventfs_remove_dir() is called by the tracing system, it is free to remove the events that are associated to that directory. Unfortunately, that means the callbacks must not be called after that. CPU0 CPU1 ---- ---- eventfs_root_lookup() { eventfs_remove_dir() { mutex_lock(&event_mutex); ei->is_freed = set; mutex_unlock(&event_mutex); } kfree(event_call); for (...) { entry = &ei->entries[i]; r = entry->callback() { call = data; // call == event_call above if (call->flags ...) [ USE AFTER FREE BUG ] The safest way to protect this is to wrap the callback with: mutex_lock(&eventfs_mutex); if (!ei->is_freed) r = entry->callback(); else r = -1; mutex_unlock(&eventfs_mutex); This will make sure that the callback will not be called after it is freed. But now it needs to be known that the callback is called while holding internal eventfs locks, and that it must not call back into the eventfs / tracefs system. There's no reason it should anyway, but document that as well. Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=Y50WpWAE7_Q@mail.gmail.com/ Link: https://lkml.kernel.org/r/20231101172649.906696613@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-02 00:16:49 -04:00
Steven Rostedt (Google)	28e12c09f5	eventfs: Save ownership and mode Now that inodes and dentries are created on the fly, they are also reclaimed on memory pressure. Since the ownership and file mode are saved in the inode, if they are freed, any changes to the ownership and mode will be lost. To counter this, if the user changes the permissions or ownership, save them, and when creating the inodes again, restore those changes. Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org Cc: stable@vger.kernel.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `6394044955` ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:55:12 -04:00
Steven Rostedt (Google)	77a06c33a2	eventfs: Test for ei->is_freed when accessing ei->dentry The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not. It is protected by the eventfs_mutex. Anytime the eventfs_mutex is released, and access to the ei->dentry needs to be done, it should first check if ei->is_freed is set under the eventfs_mutex. If it is, then the ei->dentry is invalid and must not be used. The ei->dentry must only be accessed under the eventfs_mutex and after checking if ei->is_freed is set. When the ei is being freed, it will (under the eventfs_mutex) set is_freed and at the same time move the dentry to a free list to be cleared after the eventfs_mutex is released. This means that any access to the ei->dentry must check first if ei->is_freed is set, because if it is, then the dentry is on its way to be freed. Also add comments to describe this better. Link: https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=SoEywQOe19fGP3uR15SGowkdK+_X85Cg@mail.gmail.com/ Link: https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20231101172649.477608228@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:50:22 -04:00
Steven Rostedt (Google)	db3a397209	eventfs: Have a free_ei() that just frees the eventfs_inode As the eventfs_inode is freed in two different locations, make a helper function free_ei() to make sure all the allocated fields of the eventfs_inode is freed. This requires renaming the existing free_ei() which is called by the srcu handler to free_rcu_ei() and have free_ei() just do the freeing, where free_rcu_ei() will call it. Link: https://lkml.kernel.org/r/20231101172649.265214087@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:49:55 -04:00
Steven Rostedt (Google)	f2f496370a	eventfs: Remove "is_freed" union with rcu head The eventfs_inode->is_freed was a union with the rcu_head with the assumption that when it was on the srcu list the head would contain a pointer which would make "is_freed" true. But that was a wrong assumption as the rcu head is a single link list where the last element is NULL. Instead, split the nr_entries integer so that "is_freed" is one bit and the nr_entries is the next 31 bits. As there shouldn't be more than 10 (currently there's at most 5 to 7 depending on the config), this should not be a problem. Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Fixes: `6394044955` ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:49:32 -04:00
Steven Rostedt (Google)	9037caa09e	eventfs: Fix kerneldoc of eventfs_remove_rec() The eventfs_remove_rec() had some missing parameters in the kerneldoc comment above it. Also, rephrase the description a bit more to have a bit more correct grammar. Link: https://lore.kernel.org/linux-trace-kernel/20231030121523.0b2225a7@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode"); Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4SgqasWo-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:49:25 -04:00
Steven Rostedt (Google)	77bc4d4921	eventfs: Remove extra dget() in eventfs_create_events_dir() The creation of the top events directory does a dget() at the end of the creation in eventfs_create_events_dir() with a comment saying the final dput() will happen when it is removed. The problem is that a dget() is already done on the dentry when it was created with tracefs_start_creating()! The dget() now just causes a memory leak of that dentry. Remove the extra dget() as the final dput() in the deletion of the events directory actually matches the one in tracefs_start_creating(). Link: https://lore.kernel.org/linux-trace-kernel/20231031124229.4f2e3fa1@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-01 23:45:05 -04:00
Linus Torvalds	1b10d2c8c6	Bootconfig for v6.7: - Documentation update for /proc/cmdline, which includes both the parameters from bootloader and the embedded parameters in the kernel. - fs/proc: Add bootloader argument as a comment line to /proc/bootconfig so that the user can distinguish what parameters were passed from bootloader even if bootconfig modified that. - Documentation fix to add /proc/bootconfig to proc.rst. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmVAzokbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bJxMH+QHgxumk/zhQ6TG1szG3 TWrkhg5oP2hGRTbJc6lK/BJM5E/xz6SNoGP6nEEW5JdkzKwNyNSLab+J0as9Ua3V KtJGiAtp4FZpVl6B2y5UBqBerRl8U4NjW8TOsEv8Dy4gVlgjdbyS7fzy3T86DEZ5 QGQ004pYVciSzmRvEmk3f0GUqsgJUYOjZWIGp2+fukpTnJhD4o3K3UwUs4Rz1tF7 cnVm34af67648/mGSxB+qHBHq4dznrWopv/SgH6OcEzOOpvEGAwPuteAms3S8g3j WOZNHB7QlNkVlASlAcC+yPHjJBtWMfRTsWQIxW5F0wSqq0SRCzTMLHU1wltJagyq lBQ= =J8P0 -----END PGP SIGNATURE----- Merge tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig updates from Masami Hiramatsu: - Documentation update for /proc/cmdline, which includes both the parameters from bootloader and the embedded parameters in the kernel - fs/proc: Add bootloader argument as a comment line to /proc/bootconfig so that the user can distinguish what parameters were passed from bootloader even if bootconfig modified that - Documentation fix to add /proc/bootconfig to proc.rst * tag 'bootconfig-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: doc: Add /proc/bootconfig to proc.rst fs/proc: Add boot loader arguments as comment to /proc/bootconfig doc: Update /proc/cmdline documentation to include boot config	2023-11-01 16:07:05 -10:00
Linus Torvalds	1e0c505e13	asm-generic updates for v6.7 The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4 LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9 ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI= =H1vH -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture	2023-11-01 15:28:33 -10:00
Kent Overstreet	dc7a15fb90	bcachefs: Skip deleted members in member_to_text() This fixes show-super output - we shouldn't be printing members that have been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	df94cb2e57	bcachefs: Fix an integer overflow Fixes: bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): disk usage increased 4294967296 more than 2823707312 sectors reserved) transaction updates for __bchfs_fallocate journal seq 467859 update: btree=extents cached=0 bch2_trans_update+0x4e8/0x540 old u64s 5 type deleted 536925940:3559337304:4294967283 len 0 ver 0 new u64s 6 type reservation 536925940:3559337304:4294967283 len 3559337304 ver 0: generation 0 replicas 2 update: btree=inodes cached=1 bch2_extent_update_i_size_sectors+0x305/0x3b0 old u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 0 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0 new u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 3559337304 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0 Kernel panic - not syncing: bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): panic after error CPU: 4 PID: 5154 Comm: rsync Not tainted 6.5.9-gateway-gca1614174cc0-dirty #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4, BIOS P4.20 08/02/2021 Call Trace: <TASK> dump_stack_lvl+0x5a/0x90 panic+0x105/0x300 ? console_unlock+0xf1/0x130 ? bch2_printbuf_exit+0x16/0x30 ? srso_return_thunk+0x5/0x10 bch2_inconsistent_error+0x6f/0x80 bch2_trans_fs_usage_apply+0x279/0x3d0 __bch2_trans_commit+0x112a/0x1df0 ? bch2_extent_update+0x13a/0x1d0 bch2_extent_update+0x13a/0x1d0 bch2_extent_fallocate+0x58e/0x740 bch2_fallocate_dispatch+0xb7c/0x1030 ? do_filp_open+0xa0/0x140 vfs_fallocate+0x18e/0x1d0 __x64_sys_fallocate+0x46/0x70 do_syscall_64+0x48/0xa0 ? exit_to_user_mode_prepare+0x4d/0xa0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7fc85d91bbb3 Code: 64 89 02 b8 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 31 da 0d 00 00 49 89 ca 74 14 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 10 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	be9e782df3	bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	2e7acdfbca	bcachefs: Fix deleted inodes btree in snapshot deletion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	85103d15ca	bcachefs: Fix error path in bch2_replicas_gc_end() We were dropping a lock we hadn't taken when entering with an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	b65db750e2	bcachefs: Enumerate fsck errors This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	f5d26fa31e	bcachefs: bch_sb_field_errors Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	94119eeb02	bcachefs: Add IO error counts to bch_member We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	5394fe9494	bcachefs: Fix snapshot skiplists Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:07 -04:00
Kent Overstreet	e84843489c	bcachefs: Fix a kasan splat in bch2_dev_add() This fixes a use after free - mi is dangling after the resize call. Additionally, resizing the device's member info section was useless - we were attempting to preallocate the space required before adding it to the filesystem superblock, but there's other sections that we should have been preallocating as well for that to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:07 -04:00
Kent Overstreet	5c1ab40e76	bcachefs: Fix kasan splat in members_v1_get() This fixes an incorrect memcpy() in the recent members_v2 code - a members_v1 member is BCH_MEMBER_V1_BYTES, not sizeof(struct bch_member). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:07 -04:00
Kent Overstreet	fb3f57bb11	bcachefs: rebalance_work This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:05 -04:00
Kunwu Chan	e3bc0c427f	ocfs2: fix a spelling typo in comment Fix a spelling typo in comment. Link: https://lkml.kernel.org/r/20231025072906.14285-1-chentao@kylinos.cn Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-11-01 12:46:59 -07:00
Yang Li	639931020e	fs/proc/base.c: remove unneeded semicolon ./fs/proc/base.c:3829:2-3: Unneeded semicolon Link: https://lkml.kernel.org/r/20231026005634.6581-1-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7057 Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-11-01 12:46:59 -07:00
Oleg Nesterov	1df4bd83cd	do_io_accounting: use sig->stats_lock Rather than lock_task_sighand(), sig->stats_lock was specifically designed for this type of use. This way the "if (whole)" branch runs lockless in the likely case. Link: https://lkml.kernel.org/r/20231023153405.GA4639@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-11-01 12:46:59 -07:00
Oleg Nesterov	2320222067	do_io_accounting: use __for_each_thread() Rather than while_each_thread() which should be avoided when possible. This makes the code more clear and allows the next change. Link: https://lkml.kernel.org/r/20231023153343.GA4629@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-11-01 12:46:58 -07:00
Jia Rui	873ed7222c	ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() The BUG_ON() at ocfs2_num_free_extents() handles the error that l_tree_deepth of leaf extent block just read form disk is invalid. This error is mostly caused by file system metadata corruption on the disk. There is no need to call BUG_ON() to handle such errors. We can return error code, since the caller can deal with errors from ocfs2_num_free_extents(). Also, we should make the file system read-only to avoid the damage from expanding. Therefore, BUG_ON() is removed and ocfs2_error() is called instead. Link: https://lkml.kernel.org/r/20231018191811.412458-1-jindui71@gmail.com Signed-off-by: Jia Rui <jindui71@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-11-01 12:46:58 -07:00
Matthew Wilcox (Oracle)	f003a717ae	nfs: Convert nfs_symlink() to use a folio Use the folio APIs, saving about four calls to compound_head(). Convert back to a page in each of the individual protocol implementations. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2023-11-01 15:40:44 -04:00
Olga Kornievskaia	5cc7688bae	NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO If the client is doing pnfs IO and Kerberos is configured and EXCHANGEID successfully negotiated SP4_MACH_CRED and WRITE/COMMIT are on the list of state protected operations, then we need to make sure to choose the DS's rpc_client structure instead of the MDS's one. Fixes: `fb91fb0ee7` ("NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup()") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2023-11-01 15:40:44 -04:00

1 2 3 4 5 ...

87634 Commits