linux

iv/linux

Author	SHA1	Message	Date
Darrick J. Wong	97e993830a	xfs: use accessor functions for bitmap words Create get and set functions for rtbitmap words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk rtbitmap words so that the compiler can perform proper typechecking as we go back and forth. In the upcoming rtgroups feature, we're going to fix the problem that rtwords are written in host endian order, which means we'll need the distinct rtword/rtword_raw types. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-18 11:46:19 -07:00
Darrick J. Wong	312d61021b	xfs: create a helper to handle logging parts of rt bitmap/summary blocks Create an explicit helper function to log parts of rt bitmap and summary blocks. While we're at it, fix an off-by-one error in two of the rtbitmap logging calls that led to unnecessarily large log items but was otherwise benign. Note that the upcoming rtgroups patchset will add block headers to the rtbitmap and rtsummary files. The helpers in this and the next few patches take a less than direct route through xfs_rbmblock_wordptr and xfs_rsumblock_infoptr to avoid helper churn in that patchset. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-18 10:58:58 -07:00
Darrick J. Wong	d0448fe76a	xfs: create helpers for rtbitmap block/wordcount computations Create helper functions that compute the number of blocks or words necessary to store the rt bitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-18 10:58:58 -07:00
Darrick J. Wong	097b4b7b64	xfs: convert rt summary macros to helpers Convert the realtime summary file macros to helper functions so that we can improve type checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 17:45:38 -07:00
Darrick J. Wong	a994862684	xfs: convert open-coded xfs_rtword_t pointer accesses to helper There are a bunch of places where we use open-coded logic to find a pointer to an xfs_rtword_t within a rt bitmap buffer. Convert all that to helper functions for better type safety. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 17:45:38 -07:00
Darrick J. Wong	add3cddaea	xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Remove these trivial macros since they're not even part of the ondisk format. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 17:45:37 -07:00
Darrick J. Wong	90d98a6ada	xfs: convert the rtbitmap block and bit macros to static inline functions Replace these macros with typechecked helper functions. Eventually we're going to add more logic to the helpers and it'll be easier if we don't have to macro it up. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:26:25 -07:00
Darrick J. Wong	ef5a83b7e5	xfs: use shifting and masking when converting rt extents, if possible Avoid the costs of integer division (32-bit and 64-bit) if the realtime extent size is a power of two. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:26:25 -07:00
Darrick J. Wong	5f57f7309d	xfs: create rt extent rounding helpers for realtime extent blocks Create a pair of functions to round rtblock numbers up or down to the nearest rt extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:26:25 -07:00
Darrick J. Wong	055641248f	xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Convert these calls to use the helpers, and clean up all these places where the same variable can have different units depending on where it is in the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:25:55 -07:00
Darrick J. Wong	5dc3a80d46	xfs: create helpers to convert rt block numbers to rt extent numbers Create helpers to do unit conversions of rt block numbers to rt extent numbers. There are three variations -- one to compute the rt extent number from an rt block number; one to compute the offset of an rt block within an rt extent; and one to extract both. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	2c2b981b73	xfs: create a helper to convert extlen to rtextlen Create a helper to compute the realtime extent (xfs_rtxlen_t) from an extent length (xfs_extlen_t) value. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	68db60bf01	xfs: create a helper to compute leftovers of realtime extents Create a helper to compute the misalignment between a file extent (xfs_extlen_t) and a realtime extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	fa5a387230	xfs: create a helper to convert rtextents to rtblocks Create a helper to convert a realtime extent to a realtime block. Later on we'll change the helper to use bit shifts when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	2d5f216b77	xfs: convert rt extent numbers to xfs_rtxnum_t Further disambiguate the xfs_rtblock_t uses by creating a new type, xfs_rtxnum_t, to store the position of an extent within the realtime section, in units of rtextents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	3d2b6d034f	xfs: rename xfs_verify_rtext to xfs_verify_rtbext This helper function validates that a range of blocks in the realtime section is completely contained within the realtime section. It does /not/ validate ranges of rtextents. Rename the function to avoid suggesting that it does, and change the type of the @len parameter since xfs_rtblock_t is a position unit, not a length unit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	f29c3e745d	xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t XFS uses xfs_rtblock_t for many different uses, which makes it much more difficult to perform a unit analysis on the codebase. One of these (ab)uses is when we need to store the length of a free space extent as stored in the realtime bitmap. Because there can be up to 2^64 realtime extents in a filesystem, we need a new type that is larger than xfs_rtxlen_t for callers that are querying the bitmap directly. This means scrub and growfs. Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx lengths. 'b' stands for 'bitmap' or 'big'; reader's choice. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	03f4de332e	xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t We should use xfs_fileoff_t to store the file block offset of any location within the realtime bitmap or summary files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	a684c538bc	xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator In most of the filesystem, we use xfs_extlen_t to store the length of a file (or AG) space mapping in units of fs blocks. Unfortunately, the realtime allocator also uses it to store the length of a rt space mapping in units of rt extents. This is confusing, since one rt extent can consist of many fs blocks. Separate the two by introducing a new type (xfs_rtxlen_t) to store the length of a space mapping (in units of realtime extents) that would be found in a file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	13928113fc	xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Move all the declarations for functionality in xfs_rtbitmap.c into a separate xfs_rtbitmap.h header file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	f6a2dae2a1	xfs: make sure maxlen is still congruent with prod when rounding down In commit 2a6ca4baed62, we tried to fix an overflow problem in the realtime allocator that was caused by an overly large maxlen value causing xfs_rtcheck_range to run off the end of the realtime bitmap. Unfortunately, there is a subtle bug here -- maxlen (and minlen) both have to be aligned with @prod, but @prod can be larger than 1 if the user has set an extent size hint on the file, and that extent size hint is larger than the realtime extent size. If the rt free space extents are not aligned to this file's extszhint because other files without extent size hints allocated space (or the number of rt extents is similarly not aligned), then it's possible that maxlen after clamping to sb_rextents will no longer be aligned to prod. The allocation will succeed just fine, but we still trip the assertion. Fix the problem by reducing maxlen by any misalignment with prod. While we're at it, split the assertions into two so that we can tell which value had the bad alignment. Fixes: 2a6ca4baed62 ("xfs: make sure the rt allocator doesn't run off the end") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:24:22 -07:00
Darrick J. Wong	ddd98076d5	xfs: fix units conversion error in xfs_bmap_del_extent_delay The unit conversions in this function do not make sense. First we convert a block count to bytes, then divide that bytes value by rextsize, which is in blocks, to get an rt extent count. You can't divide bytes by blocks to get a (possibly multiblock) extent value. Fortunately nobody uses delalloc on the rt volume so this hasn't mattered. Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:23:49 -07:00
Darrick J. Wong	c2988eb5cf	xfs: rt stubs should return negative errnos when rt disabled When realtime support is not compiled into the kernel, these functions should return negative errnos, not positive errnos. While we're at it, fix a broken macro declaration. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:22:40 -07:00
Darrick J. Wong	b73494fa9a	xfs: prevent rt growfs when quota is enabled Quotas aren't (yet) supported with realtime, so we shouldn't allow userspace to set up a realtime section when quotas are enabled, even if they attached one via mount options. IOWS, you shouldn't be able to do: # mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o rtdev=/dev/sdb,usrquota # xfs_growfs -r /mnt Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 16:22:40 -07:00
Darrick J. Wong	6c66448433	xfs: hoist freeing of rt data fork extent mappings Currently, xfs_bmap_del_extent_real contains a bunch of code to convert the physical extent of a data fork mapping for a realtime file into rt extents and pass that to the rt extent freeing function. Since the details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to xfs_rtbitmap.c to reduce code size when realtime isn't enabled. This will (one day) enable realtime EFIs to reuse the same unit-converting call with less code duplication. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 08:40:54 -07:00
Darrick J. Wong	9488062805	xfs: bump max fsgeom struct version The latest version of the fs geometry structure is v5. Bump this constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill out all the fields. IOWs, this commit is a no-op for the kernel, but will be useful for userspace reporting in later changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-17 08:40:54 -07:00
Jeff Layton	cbc06310c3	xfs: reinstate the old i_version counter as STATX_CHANGE_COOKIE The handling of STATX_CHANGE_COOKIE was moved into generic_fillattr in commit 0d72b92883c6 (fs: pass the request_mask to generic_fillattr), but we didn't account for the fact that xfs doesn't call generic_fillattr at all. Make XFS report its i_version as the STATX_CHANGE_COOKIE. Fixes: 0d72b92883c6 (fs: pass the request_mask to generic_fillattr) Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-12 10:17:03 +05:30
Jiapeng Chong	f93b930030	xfs: Remove duplicate include ./fs/xfs/scrub/xfile.c: xfs_format.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6209 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-12 10:14:45 +05:30
Shiyang Ruan	3c90c01e49	xfs: correct calculation for agend and blockcount The agend should be "start + length - 1", then, blockcount should be "end + 1 - start". Correct 2 calculation mistakes. Also, rename "agend" to "range_agend" because it's not the end of the AG per se; it's the end of the dead region within an AG's agblock space. Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-10-12 10:11:56 +05:30
Darrick J. Wong	442177be8c	xfs: process free extents to busy list in FIFO order When we're adding extents to the busy discard list, add them to the tail of the list so that we get FIFO order. For FITRIM commands, this means that we send discard bios sorted in order from longest to shortest, like we did before commit 89cfa899608fc. For transactions that are freeing extents, this puts them in the transaction's busy list in FIFO order as well, which shouldn't make any noticeable difference. Fixes: 89cfa899608fc ("xfs: reduce AGF hold times during fstrim operations") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-11 12:35:21 -07:00
Darrick J. Wong	6868b8505c	xfs: adjust the incore perag block_count when shrinking If we reduce the number of blocks in an AG, we must update the incore geometry values as well. Fixes: 0800169e3e2c9 ("xfs: Pre-calculate per-AG agbno geometry") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-10-11 12:35:20 -07:00
Dave Chinner	e78a40b851	xfs: abort fstrim if kernel is suspending A recent ext4 patch posting from Jan Kara reminded me of a discussion a year ago about fstrim in progress preventing kernels from suspending. The fix is simple, we should do the same for XFS. This removes the -ERESTARTSYS error return from this code, replacing it with either the last error seen or the number of blocks successfully trimmed up to the point where we detected the stop condition. References: https://bugzilla.kernel.org/show_bug.cgi?id=216322 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2023-10-04 09:25:04 +11:00
Dave Chinner	89cfa89960	xfs: reduce AGF hold times during fstrim operations fstrim will hold the AGF lock for as long as it takes to walk and discard all the free space in the AG that meets the userspace trim criteria. For AGs with lots of free space extents (e.g. millions) or the underlying device is really slow at processing discard requests (e.g. Ceph RBD), this means the AGF hold time is often measured in minutes to hours, not a few milliseconds as we normal see with non-discard based operations. This can result in the entire filesystem hanging whilst the long-running fstrim is in progress. We can have transactions get stuck waiting for the AGF lock (data or metadata extent allocation and freeing), and then more transactions get stuck waiting on the locks those transactions hold. We can get to the point where fstrim blocks an extent allocation or free operation long enough that it ends up pinning the tail of the log and the log then runs out of space. At this point, every modification in the filesystem gets blocked. This includes read operations, if atime updates need to be made. To fix this problem, we need to be able to discard free space extents safely without holding the AGF lock. Fortunately, we already do this with online discard via busy extents. We can mark free space extents as "busy being discarded" under the AGF lock and then unlock the AGF, knowing that nobody will be able to allocate that free space extent until we remove it from the busy tree. Modify xfs_trim_extents to use the same asynchronous discard mechanism backed by busy extents as is used with online discard. This results in the AGF only needing to be held for short periods of time and it is never held while we issue discards. Hence if discard submission gets throttled because it is slow and/or there are lots of them, we aren't preventing other operations from being performed on AGF while we wait for discards to complete... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2023-10-04 09:24:52 +11:00
Dave Chinner	428c4435b0	xfs: move log discard work to xfs_discard.c Because we are going to use the same list-based discard submission interface for fstrim-based discards, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2023-10-04 09:24:02 +11:00
Darrick J. Wong	537c013b14	xfs: fix reloading entire unlinked bucket lists During review of the patcheset that provided reloading of the incore iunlink list, Dave made a few suggestions, and I updated the copy in my dev tree. Unfortunately, I then got distracted by ... who even knows what ... and forgot to backport those changes from my dev tree to my release candidate branch. I then sent multiple pull requests with stale patches, and that's what was merged into -rc3. So. This patch re-adds the use of an unlocked iunlink list check to determine if we want to allocate the resources to recreate the incore list. Since lost iunlinked inodes are supposed to be rare, this change helps us avoid paying the transaction and AGF locking costs every time we open any inode. This also re-adds the shutdowns on failure, and re-applies the restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and re-adds a requested comment about the quotachecking code. Retain the original RVB tag from Dave since there's no code change from the last submission. Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-09-24 18:12:13 -07:00
Linus Torvalds	3abc79dce6	Bug fixes for 6.6-rc3: * Fix an integer overflow bug when processing an fsmap call. * Fix crash due to CPU hot remove event racing with filesystem mount operation. * During read-only mount, XFS does not allow the contents of the log to be recovered when there are one or more unrecognized rcompat features in the primary superblock, since the log might have intent items which the kernel does not know how to process. * During recovery of log intent items, XFS now reserves log space sufficient for one cycle of a permanent transaction to execute. Otherwise, this could lead to livelocks due to non-availability of log space. * On an fs which has an ondisk unlinked inode list, trying to delete a file or allocating an O_TMPFILE file can cause the fs to the shutdown if the first inode in the ondisk inode list is not present in the inode cache. The bug is solved by explicitly loading the first inode in the ondisk unlinked inode list into the inode cache if it is not already cached. A similar problem arises when the uncached inode is present in the middle of the ondisk unlinked inode list. This second bug is triggered when executing operations like quotacheck and bulkstat. In this case, XFS now reads in the entire ondisk unlinked inode list. * Enable LARP mode only on recent v5 filesystems. * Fix a out of bounds memory access in scrub. * Fix a performance bug when locating the tail of the log during mounting a filesystem. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZQkx4QAKCRAH7y4RirJu 9HrTAQD6QhvHkS43vueGOb4WISZPG/jMKJ/FjvwLZrIZ0erbJwEAtRWhClwFv3NZ exJFtsmxrKC6Vifuo0pvfoCiK5mUvQ8= =SrJR -----END PGP SIGNATURE----- Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Chandan Babu: - Fix an integer overflow bug when processing an fsmap call - Fix crash due to CPU hot remove event racing with filesystem mount operation - During read-only mount, XFS does not allow the contents of the log to be recovered when there are one or more unrecognized rcompat features in the primary superblock, since the log might have intent items which the kernel does not know how to process - During recovery of log intent items, XFS now reserves log space sufficient for one cycle of a permanent transaction to execute. Otherwise, this could lead to livelocks due to non-availability of log space - On an fs which has an ondisk unlinked inode list, trying to delete a file or allocating an O_TMPFILE file can cause the fs to the shutdown if the first inode in the ondisk inode list is not present in the inode cache. The bug is solved by explicitly loading the first inode in the ondisk unlinked inode list into the inode cache if it is not already cached A similar problem arises when the uncached inode is present in the middle of the ondisk unlinked inode list. This second bug is triggered when executing operations like quotacheck and bulkstat. In this case, XFS now reads in the entire ondisk unlinked inode list - Enable LARP mode only on recent v5 filesystems - Fix a out of bounds memory access in scrub - Fix a performance bug when locating the tail of the log during mounting a filesystem * tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail xfs: only call xchk_stats_merge after validating scrub inputs xfs: require a relatively recent V5 filesystem for LARP mode xfs: make inode unlinked bucket recovery work with quotacheck xfs: load uncached unlinked inodes into memory on demand xfs: reserve less log space when recovering log intent items xfs: fix log recovery when unknown rocompat bits are set xfs: reload entire unlinked bucket lists xfs: allow inode inactivation during a ro mount log recovery xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list xfs: remove CPU hotplug infrastructure xfs: remove the all-mounts list xfs: use per-mount cpumask to track nonempty percpu inodegc lists xfs: fix an agbno overflow in __xfs_getfsmap_datadev xfs: fix per-cpu CIL structure aggregation racing with dying cpus xfs: fix select in config XFS_ONLINE_SCRUB_STATS	2023-09-22 16:32:19 -07:00
Christian Brauner	f798accd59	Revert "xfs: switch to multigrain timestamps" This reverts commit e44df2664746aed8b6dd5245eb711a0ce33c5cf5. Users reported regressions due to enabling multi-grained timestamps unconditionally. As no clear consensus on a solution has come up and the discussion has gone back to the drawing board revert the infrastructure changes for. If it isn't code that's here to stay, make it go away. Message-ID: <20230920-keine-eile-c9755b5825db@brauner> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-09-20 18:05:31 +02:00
Wang Jianchao	8b010acb31	xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail In our production environment, we find that mounting a 500M /boot which is umount cleanly needs ~6s. One cause is that ffs() is used by xlog_write_log_records() to decide the buffer size. It can cause a lot of small IO easily when xlog_clear_stale_blocks() needs to wrap around the end of log area and log head block is not power of two. Things are similar in xlog_find_verify_cycle(). The code is able to handed bigger buffer very well, we can use roundup_pow_of_two() to replace ffs() directly to avoid small and sychronous IOs. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Wang Jianchao <wangjc136@midea.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-09-13 10:38:20 +05:30
Chandan Babu R	1155b12edb	xfs: fix out of bounds memory access in scrub This is a quick fix for a few internal syzbot reports concerning an invalid memory access in the scrub code. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOgAKCRBKO3ySh0YR pkKbAQCKg0+VAqr2UuKT7PygRSUaLNybnMBHetDZyd1maEl7OQD7BGuM9AxwXWFp hL0Jq/HN5yeArrueGKMd0K3u1HRjJQE= =XwHc -----END PGP SIGNATURE----- Merge tag 'fix-scrub-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: fix out of bounds memory access in scrub This is a quick fix for a few internal syzbot reports concerning an invalid memory access in the scrub code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-scrub-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: only call xchk_stats_merge after validating scrub inputs	2023-09-13 10:35:49 +05:30
Chandan Babu R	6ebb6500e5	xfs: disallow LARP on old fses Before enabling logged xattrs, make sure the filesystem is new enough that it actually supports log incompat features. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR pqc1AQD8hXUpatOY50TdRDI6qpKBWOEti7r+sXyq9bWM4QZFyAD/Zjx3aZ+R2u2g lsb1xLjekrh2DzToOFnvs4gd/nZd7Qw= =BxHQ -----END PGP SIGNATURE----- Merge tag 'fix-larp-requirements-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: disallow LARP on old fses Before enabling logged xattrs, make sure the filesystem is new enough that it actually supports log incompat features. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-larp-requirements-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: require a relatively recent V5 filesystem for LARP mode	2023-09-13 10:33:27 +05:30
Chandan Babu R	abf7c8194f	xfs: reload entire iunlink lists This is the second part of correcting XFS to reload the incore unlinked inode list from the ondisk contents. Whereas part one tackled failures from regular filesystem calls, this part takes on the problem of needing to reload the entire incore unlinked inode list on account of somebody loading an inode that's in the /middle/ of an unlinked list. This happens during quotacheck, bulkstat, or even opening a file by handle. In this case we don't know the length of the list that we're reloading, so we don't want to create a new unbounded memory load while holding resources locked. Instead, we'll target UNTRUSTED iget calls to reload the entire bucket. Note that this changes the definition of the incore unlinked inode list slightly -- i_prev_unlinked == 0 now means "not on the incore list". This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR ptU6AP48lONiOPzWvF1mXTnDosAtIDsfliMY+qVgVxrghqBFmwEAitHlOadpWonu yoQ3cnSqzfA4rKT5MQZCm2iIHH/LMgU= =Kp5V -----END PGP SIGNATURE----- Merge tag 'fix-iunlink-list-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: reload entire iunlink lists This is the second part of correcting XFS to reload the incore unlinked inode list from the ondisk contents. Whereas part one tackled failures from regular filesystem calls, this part takes on the problem of needing to reload the entire incore unlinked inode list on account of somebody loading an inode that's in the /middle/ of an unlinked list. This happens during quotacheck, bulkstat, or even opening a file by handle. In this case we don't know the length of the list that we're reloading, so we don't want to create a new unbounded memory load while holding resources locked. Instead, we'll target UNTRUSTED iget calls to reload the entire bucket. Note that this changes the definition of the incore unlinked inode list slightly -- i_prev_unlinked == 0 now means "not on the incore list". Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-iunlink-list-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: make inode unlinked bucket recovery work with quotacheck xfs: reload entire unlinked bucket lists xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list	2023-09-13 10:30:12 +05:30
Chandan Babu R	fffcdcc31f	xfs: reload the last iunlink item It turns out that there are some serious bugs in how xfs handles the unlinked inode lists. Way back before 4.14, there was a bug where a ro mount of a dirty filesystem would recover the log bug neglect to purge the unlinked list. This leads to clean unmounted filesystems with unlinked inodes. Starting around 5.15, we also converted the codebase to maintain a doubly-linked incore unlinked list. However, we never provided the ability to load the incore list from disk. If someone tries to allocate an O_TMPFILE file on a clean fs with a pre-existing unlinked list or even deletes a file, the code will fail and the fs shuts down. This first part of the correction effort adds the ability to load the first inode in the bucket when unlinking a file; and to load the next inode in the list when inactivating (freeing) an inode. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR plJvAQC0s843w2nvluXlIE8P9nBqk2ht6zwNOJpiZbWnf0zeLAD/a6v0HVVLbGN5 qHVd/abQ5QIW55Ybm3Qko6PKvV4Nlgo= =WcRN -----END PGP SIGNATURE----- Merge tag 'fix-iunlink-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: reload the last iunlink item It turns out that there are some serious bugs in how xfs handles the unlinked inode lists. Way back before 4.14, there was a bug where a ro mount of a dirty filesystem would recover the log bug neglect to purge the unlinked list. This leads to clean unmounted filesystems with unlinked inodes. Starting around 5.15, we also converted the codebase to maintain a doubly-linked incore unlinked list. However, we never provided the ability to load the incore list from disk. If someone tries to allocate an O_TMPFILE file on a clean fs with a pre-existing unlinked list or even deletes a file, the code will fail and the fs shuts down. This first part of the correction effort adds the ability to load the first inode in the bucket when unlinking a file; and to load the next inode in the list when inactivating (freeing) an inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-iunlink-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: load uncached unlinked inodes into memory on demand	2023-09-13 10:25:00 +05:30
Chandan Babu R	b6c2b6378d	xfs: fix EFI recovery livelocks This series fixes a customer-reported transaction reservation bug introduced ten years ago that could result in livelocks during log recovery. Log intent item recovery single-steps each step of a deferred op chain, which means that each step only needs to allocate one transaction's worth of space in the log, not an entire chain all at once. This single-stepping is critical to unpinning the log tail since there's nobody else to do it for us. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR pt1uAQCkc4vnjA7C1eUsCwqnzK/A9fstwTnmx7qlGGfFM7wwowD7BqQX2AAeYUvu iT4UzvG9kao+jNNr0zx+ddYOOTJcrgI= =H9nC -----END PGP SIGNATURE----- Merge tag 'fix-efi-recovery-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: fix EFI recovery livelocks This series fixes a customer-reported transaction reservation bug introduced ten years ago that could result in livelocks during log recovery. Log intent item recovery single-steps each step of a deferred op chain, which means that each step only needs to allocate one transaction's worth of space in the log, not an entire chain all at once. This single-stepping is critical to unpinning the log tail since there's nobody else to do it for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-efi-recovery-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: reserve less log space when recovering log intent items	2023-09-13 10:22:26 +05:30
Chandan Babu R	f41d7d70b0	xfs: fix ro mounting with unknown rocompat features [v2] Dave pointed out some failures in xfs/270 when he upgraded Debian unstable and util-linux started using the new mount apis. Upon further inquiry I noticed that XFS is quite a hot mess when it encounters a filesystem with unrecognized rocompat bits set in the superblock. Whereas we used to allow readonly mounts under these conditions, a change to the sb write verifier several years ago resulted in the filesystem going down immediately because the post-mount log cleaning writes the superblock, which trips the sb write verifier on the unrecognized rocompat bit. I made the observation that the ROCOMPAT features RMAPBT and REFLINK both protect new log intent item types, which means that we actually cannot support recovering the log if we don't recognize all the rocompat bits. Therefore -- fix inode inactivation to work when we're recovering the log, disallow recovery when there's unrecognized rocompat bits, and don't clean the log if doing so would trip the rocompat checks. v2: change direction of series to allow log recovery on ro mounts This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR pkFRAP0f7+do6A3cs5GuMSCRdH3DImjX1ts9nHJAgxKadTod8gEApeDb290wI+ek NTetY6RKfexMZLEgXI8YtAlhsR8nVwI= =LARv -----END PGP SIGNATURE----- Merge tag 'fix-ro-mounts-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: fix ro mounting with unknown rocompat features Dave pointed out some failures in xfs/270 when he upgraded Debian unstable and util-linux started using the new mount apis. Upon further inquiry I noticed that XFS is quite a hot mess when it encounters a filesystem with unrecognized rocompat bits set in the superblock. Whereas we used to allow readonly mounts under these conditions, a change to the sb write verifier several years ago resulted in the filesystem going down immediately because the post-mount log cleaning writes the superblock, which trips the sb write verifier on the unrecognized rocompat bit. I made the observation that the ROCOMPAT features RMAPBT and REFLINK both protect new log intent item types, which means that we actually cannot support recovering the log if we don't recognize all the rocompat bits. Therefore -- fix inode inactivation to work when we're recovering the log, disallow recovery when there's unrecognized rocompat bits, and don't clean the log if doing so would trip the rocompat checks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-ro-mounts-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: fix log recovery when unknown rocompat bits are set xfs: allow inode inactivation during a ro mount log recovery	2023-09-13 10:14:02 +05:30
Chandan Babu R	0a229c935a	xfs: fix cpu hotplug mess [v2] Ritesh and Eric separately reported crashes in XFS's hook function for CPU hot remove if the remove event races with a filesystem being mounted. I also noticed via generic/650 that once in a while the log will shut down over an apparent overrun of a transaction reservation; this turned out to be due to CIL percpu list aggregation failing to pick up the percpu list items from a dying CPU. Either way, the solution here is to eliminate the need for a CPU dying hook by using a private cpumask to track which CPUs have added to their percpu lists directly, and iterating with that mask. This fixes the log problems and (I think) solves a theoretical UAF bug in the inodegc code too. v2: fix a few put_cpu uses, add necessary memory barriers, and use atomic cpumask operations This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR plauAQCV0RymbwD/ONbvpor3yK4R3YO1pa923KtoiQ9IAV5uswD/YBWvyI76BhNs B8hwbEDm3X2ZjQaikxI+Xx2cMaAhkgY= =EJ0b -----END PGP SIGNATURE----- Merge tag 'fix-percpu-lists-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: fix cpu hotplug mess Ritesh and Eric separately reported crashes in XFS's hook function for CPU hot remove if the remove event races with a filesystem being mounted. I also noticed via generic/650 that once in a while the log will shut down over an apparent overrun of a transaction reservation; this turned out to be due to CIL percpu list aggregation failing to pick up the percpu list items from a dying CPU. Either way, the solution here is to eliminate the need for a CPU dying hook by using a private cpumask to track which CPUs have added to their percpu lists directly, and iterating with that mask. This fixes the log problems and (I think) solves a theoretical UAF bug in the inodegc code too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-percpu-lists-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: remove CPU hotplug infrastructure xfs: remove the all-mounts list xfs: use per-mount cpumask to track nonempty percpu inodegc lists xfs: fix per-cpu CIL structure aggregation racing with dying cpus	2023-09-13 10:11:44 +05:30
Chandan Babu R	da6f8410e7	xfs: fix fsmap cursor handling [v2] This patchset addresses an integer overflow bug that Dave Chinner found in how fsmap handles figuring out where in the record set we left off when userspace calls back after the first call filled up all the designated record space. v2: add RVB tags This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChMwAKCRBKO3ySh0YR prqBAP9Zp2WxwQuNQLqCfXBRLZiJRiW8JFcTNJOjdqIicsOPYgEAxs1GHJU4ozrO bKyolvNJIjSow7LWYP1GmfCRa9FqwQ4= =3uSx -----END PGP SIGNATURE----- Merge tag 'fix-fsmap-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA xfs: fix fsmap cursor handling This patchset addresses an integer overflow bug that Dave Chinner found in how fsmap handles figuring out where in the record set we left off when userspace calls back after the first call filled up all the designated record space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-fsmap-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: fix an agbno overflow in __xfs_getfsmap_datadev	2023-09-13 10:03:38 +05:30
Darrick J. Wong	e031928200	xfs: only call xchk_stats_merge after validating scrub inputs Harshit Mogalapalli slogged through several reports from our internal syzbot instance and observed that they all had a common stack trace: BUG: KASAN: user-memory-access in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: user-memory-access in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] BUG: KASAN: user-memory-access in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: user-memory-access in do_raw_spin_lock include/linux/spinlock.h:187 [inline] BUG: KASAN: user-memory-access in __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] BUG: KASAN: user-memory-access in _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 Write of size 4 at addr 0000001dd87ee280 by task syz-executor365/1543 CPU: 2 PID: 1543 Comm: syz-executor365 Not tainted 6.5.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x83/0xb0 lib/dump_stack.c:106 print_report+0x3f8/0x620 mm/kasan/report.c:478 kasan_report+0xb0/0xe0 mm/kasan/report.c:588 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x139/0x1e0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:187 [inline] __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] xchk_stats_merge_one.isra.1+0x39/0x650 fs/xfs/scrub/stats.c:191 xchk_stats_merge+0x5f/0xe0 fs/xfs/scrub/stats.c:225 xfs_scrub_metadata+0x252/0x14e0 fs/xfs/scrub/scrub.c:599 xfs_ioc_scrub_metadata+0xc8/0x160 fs/xfs/xfs_ioctl.c:1646 xfs_file_ioctl+0x3fd/0x1870 fs/xfs/xfs_ioctl.c:1955 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x199/0x220 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff155af753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc006e2568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff155af753d RDX: 00000000200000c0 RSI: 00000000c040583c RDI: 0000000000000003 RBP: 00000000ffffffff R08: 00000000004010c0 R09: 00000000004010c0 R10: 00000000004010c0 R11: 0000000000000246 R12: 0000000000400cb0 R13: 00007ffc006e2670 R14: 0000000000000000 R15: 0000000000000000 </TASK> The root cause here is that xchk_stats_merge_one walks off the end of the xchk_scrub_stats.cs_stats array because it has been fed a garbage value in sm->sm_type. That occurs because I put the xchk_stats_merge in the wrong place -- it should have been after the last xchk_teardown call on our way out of xfs_scrub_metadata because we only call the teardown function if we called the setup function, and we don't call the setup functions if the inputs are obviously garbage. Thanks to Harshit for triaging the bug reports and bringing this to my attention. Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-09-12 10:31:08 -07:00
Darrick J. Wong	34389616a9	xfs: require a relatively recent V5 filesystem for LARP mode While reviewing the FIEXCHANGE code in XFS, I realized that the function that enables logged xattrs doesn't actually check that the superblock has a LOG_INCOMPAT feature bit field. Add a check to refuse the operation if we don't have a V5 filesystem... ...but on second though, let's require either reflink or rmap so that we only have to deal with LARP mode on relatively /modern/ kernel. 4.14 is about as far back as I feel like going. Seeing as LARP is a debugging-only option anyway, this isn't likely to affect any real users. Fixes: d9c61ccb3b09 ("xfs: move xfs_attr_use_log_assist out of xfs_log.c") Really-Fixes: f3f36c893f26 ("xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>	2023-09-12 10:31:08 -07:00
Darrick J. Wong	49813a21ed	xfs: make inode unlinked bucket recovery work with quotacheck Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2023-09-12 10:31:07 -07:00
Darrick J. Wong	68b957f64f	xfs: load uncached unlinked inodes into memory on demand shrikanth hegde reports that filesystems fail shortly after mount with the following failure: WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] This of course is the WARN_ON_ONCE in xfs_iunlink_lookup: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (WARN_ON_ONCE(!ip \|\| !ip->i_ino)) { ... } From diagnostic data collected by the bug reporters, it would appear that we cleanly mounted a filesystem that contained unlinked inodes. Unlinked inodes are only processed as a final step of log recovery, which means that clean mounts do not process the unlinked list at all. Prior to the introduction of the incore unlinked lists, this wasn't a problem because the unlink code would (very expensively) traverse the entire ondisk metadata iunlink chain to keep things up to date. However, the incore unlinked list code complains when it realizes that it is out of sync with the ondisk metadata and shuts down the fs, which is bad. Ritesh proposed to solve this problem by unconditionally parsing the unlinked lists at mount time, but this imposes a mount time cost for every filesystem to catch something that should be very infrequent. Instead, let's target the places where we can encounter a next_unlinked pointer that refers to an inode that is not in cache, and load it into cache. Note: This patch does not address the problem of iget loading an inode from the middle of the iunlink list and needing to set i_prev_unlinked correctly. Reported-by: shrikanth hegde <sshegde@linux.vnet.ibm.com> Triaged-by: Ritesh Harjani <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-09-12 10:31:07 -07:00

1 2 3 4 5 ...

8079 Commits