8087 Commits

Author SHA1 Message Date
Omar Sandoval
1b5d63963f xfs: return maximum free size from xfs_rtany_summary()
Instead of only returning whether there is any free space, return the
maximum size, which is fast thanks to the previous commit. This will be
used by two upcoming optimizations.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19 08:34:33 -07:00
Omar Sandoval
e23aaf450d xfs: invert the realtime summary cache
In commit 355e3532132b ("xfs: cache minimum realtime summary level"), I
added a cache of the minimum level of the realtime summary that has any
free extents. However, it turns out that the _maximum_ level is more
useful for upcoming optimizations, and basically equivalent for the
existing usage. So, let's change the meaning of the cache to be the
maximum level + 1, or 0 if there are no free extents.

For example, if the cache contains:

{0, 4}

then there are no free extents starting in realtime bitmap block 0, and
there are no free extents larger than or equal to 2^4 blocks starting in
realtime bitmap block 1. The cache is a loose upper bound, so there may
or may not be free extents smaller than 2^4 blocks in realtime bitmap
block 1.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19 08:34:33 -07:00
Darrick J. Wong
e2cf427c91 xfs: simplify rt bitmap/summary block accessor functions
Simplify the calling convention of these functions since the
xfs_rtalloc_args structure contains the parameters we need.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19 08:33:42 -07:00
Darrick J. Wong
5b1d0ae975 xfs: simplify xfs_rtbuf_get calling conventions
Now that xfs_rtalloc_args holds references to the last-read bitmap and
summary blocks, we don't need to pass the buffer pointer out of
xfs_rtbuf_get.

Callers no longer have to xfs_trans_brelse on their own, though they are
required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes
out of scope.

While we're at it, create some trivial helpers so that we don't have to
remember if "0" means "bitmap" and "1" means "summary".

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19 08:33:40 -07:00
Omar Sandoval
e94b53ff69 xfs: cache last bitmap block in realtime allocator
Profiling a workload on a highly fragmented realtime device showed a ton
of CPU cycles being spent in xfs_trans_read_buf() called by
xfs_rtbuf_get(). Further tracing showed that much of that was repeated
calls to xfs_rtbuf_get() for the same block of the realtime bitmap.
These come from xfs_rtallocate_extent_block(): as it walks through
ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and
xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block
is very fragmented, then this is _a lot_ of buffer lookups.

The realtime allocator already passes around a cache of the last used
realtime summary block to avoid repeated reads (the parameters rbpp and
rsb). We can do the same for the realtime bitmap.

This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches
the most recently used block for both the realtime bitmap and summary.
xfs_rtbuf_get() now handles the caching instead of the callers, which
requires plumbing xfs_rtbuf_cache to more functions but also makes sure
we don't miss anything.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19 08:22:36 -07:00
Dave Chinner
41f33d82cf xfs: consolidate realtime allocation arguments
Consolidate the arguments passed around the rt allocator into a
struct xfs_rtalloc_arg similar to how the btree allocator arguments
are consolidated in a struct xfs_alloc_arg....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 16:53:00 -07:00
Darrick J. Wong
663b8db7b0 xfs: use accessor functions for summary info words
Create get and set functions for rtsummary words so that we can redefine
the ondisk format with a specific endianness.  Note that this requires
the definition of a distinct type for ondisk summary info words so that
the compiler can perform proper typechecking.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 16:53:00 -07:00
Darrick J. Wong
bd85af280d xfs: create helpers for rtsummary block/wordcount computations
Create helper functions that compute the number of blocks or words
necessary to store the rt summary file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 16:53:00 -07:00
Darrick J. Wong
97e993830a xfs: use accessor functions for bitmap words
Create get and set functions for rtbitmap words so that we can redefine
the ondisk format with a specific endianness.  Note that this requires
the definition of a distinct type for ondisk rtbitmap words so that the
compiler can perform proper typechecking as we go back and forth.

In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 11:46:19 -07:00
Darrick J. Wong
312d61021b xfs: create a helper to handle logging parts of rt bitmap/summary blocks
Create an explicit helper function to log parts of rt bitmap and summary
blocks.  While we're at it, fix an off-by-one error in two of the
rtbitmap logging calls that led to unnecessarily large log items but was
otherwise benign.

Note that the upcoming rtgroups patchset will add block headers to the
rtbitmap and rtsummary files.  The helpers in this and the next few
patches take a less than direct route through xfs_rbmblock_wordptr and
xfs_rsumblock_infoptr to avoid helper churn in that patchset.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 10:58:58 -07:00
Darrick J. Wong
d0448fe76a xfs: create helpers for rtbitmap block/wordcount computations
Create helper functions that compute the number of blocks or words
necessary to store the rt bitmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18 10:58:58 -07:00
Darrick J. Wong
097b4b7b64 xfs: convert rt summary macros to helpers
Convert the realtime summary file macros to helper functions so that we
can improve type checking.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 17:45:38 -07:00
Darrick J. Wong
a994862684 xfs: convert open-coded xfs_rtword_t pointer accesses to helper
There are a bunch of places where we use open-coded logic to find a
pointer to an xfs_rtword_t within a rt bitmap buffer.  Convert all that
to helper functions for better type safety.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 17:45:38 -07:00
Darrick J. Wong
add3cddaea xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
Remove these trivial macros since they're not even part of the ondisk
format.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 17:45:37 -07:00
Darrick J. Wong
90d98a6ada xfs: convert the rtbitmap block and bit macros to static inline functions
Replace these macros with typechecked helper functions.  Eventually
we're going to add more logic to the helpers and it'll be easier if we
don't have to macro it up.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:26:25 -07:00
Darrick J. Wong
ef5a83b7e5 xfs: use shifting and masking when converting rt extents, if possible
Avoid the costs of integer division (32-bit and 64-bit) if the realtime
extent size is a power of two.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:26:25 -07:00
Darrick J. Wong
5f57f7309d xfs: create rt extent rounding helpers for realtime extent blocks
Create a pair of functions to round rtblock numbers up or down to the
nearest rt extent.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:26:25 -07:00
Darrick J. Wong
055641248f xfs: convert do_div calls to xfs_rtb_to_rtx helper calls
Convert these calls to use the helpers, and clean up all these places
where the same variable can have different units depending on where it
is in the function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:25:55 -07:00
Darrick J. Wong
5dc3a80d46 xfs: create helpers to convert rt block numbers to rt extent numbers
Create helpers to do unit conversions of rt block numbers to rt extent
numbers.  There are three variations -- one to compute the rt extent
number from an rt block number; one to compute the offset of an rt block
within an rt extent; and one to extract both.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
2c2b981b73 xfs: create a helper to convert extlen to rtextlen
Create a helper to compute the realtime extent (xfs_rtxlen_t) from an
extent length (xfs_extlen_t) value.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
68db60bf01 xfs: create a helper to compute leftovers of realtime extents
Create a helper to compute the misalignment between a file extent
(xfs_extlen_t) and a realtime extent.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
fa5a387230 xfs: create a helper to convert rtextents to rtblocks
Create a helper to convert a realtime extent to a realtime block.  Later
on we'll change the helper to use bit shifts when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
2d5f216b77 xfs: convert rt extent numbers to xfs_rtxnum_t
Further disambiguate the xfs_rtblock_t uses by creating a new type,
xfs_rtxnum_t, to store the position of an extent within the realtime
section, in units of rtextents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
3d2b6d034f xfs: rename xfs_verify_rtext to xfs_verify_rtbext
This helper function validates that a range of *blocks* in the
realtime section is completely contained within the realtime section.
It does /not/ validate ranges of *rtextents*.  Rename the function to
avoid suggesting that it does, and change the type of the @len parameter
since xfs_rtblock_t is a position unit, not a length unit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
f29c3e745d xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t
XFS uses xfs_rtblock_t for many different uses, which makes it much more
difficult to perform a unit analysis on the codebase.  One of these
(ab)uses is when we need to store the length of a free space extent as
stored in the realtime bitmap.  Because there can be up to 2^64 realtime
extents in a filesystem, we need a new type that is larger than
xfs_rtxlen_t for callers that are querying the bitmap directly.  This
means scrub and growfs.

Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx
lengths.  'b' stands for 'bitmap' or 'big'; reader's choice.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
03f4de332e xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t
We should use xfs_fileoff_t to store the file block offset of any
location within the realtime bitmap or summary files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
a684c538bc xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator
In most of the filesystem, we use xfs_extlen_t to store the length of a
file (or AG) space mapping in units of fs blocks.  Unfortunately, the
realtime allocator also uses it to store the length of a rt space
mapping in units of rt extents.  This is confusing, since one rt extent
can consist of many fs blocks.

Separate the two by introducing a new type (xfs_rtxlen_t) to store the
length of a space mapping (in units of realtime extents) that would be
found in a file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
13928113fc xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h
Move all the declarations for functionality in xfs_rtbitmap.c into a
separate xfs_rtbitmap.h header file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
f6a2dae2a1 xfs: make sure maxlen is still congruent with prod when rounding down
In commit 2a6ca4baed62, we tried to fix an overflow problem in the
realtime allocator that was caused by an overly large maxlen value
causing xfs_rtcheck_range to run off the end of the realtime bitmap.
Unfortunately, there is a subtle bug here -- maxlen (and minlen) both
have to be aligned with @prod, but @prod can be larger than 1 if the
user has set an extent size hint on the file, and that extent size hint
is larger than the realtime extent size.

If the rt free space extents are not aligned to this file's extszhint
because other files without extent size hints allocated space (or the
number of rt extents is similarly not aligned), then it's possible that
maxlen after clamping to sb_rextents will no longer be aligned to prod.
The allocation will succeed just fine, but we still trip the assertion.

Fix the problem by reducing maxlen by any misalignment with prod.  While
we're at it, split the assertions into two so that we can tell which
value had the bad alignment.

Fixes: 2a6ca4baed62 ("xfs: make sure the rt allocator doesn't run off the end")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:24:22 -07:00
Darrick J. Wong
ddd98076d5 xfs: fix units conversion error in xfs_bmap_del_extent_delay
The unit conversions in this function do not make sense.  First we
convert a block count to bytes, then divide that bytes value by
rextsize, which is in blocks, to get an rt extent count.  You can't
divide bytes by blocks to get a (possibly multiblock) extent value.

Fortunately nobody uses delalloc on the rt volume so this hasn't
mattered.

Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:23:49 -07:00
Darrick J. Wong
c2988eb5cf xfs: rt stubs should return negative errnos when rt disabled
When realtime support is not compiled into the kernel, these functions
should return negative errnos, not positive errnos.  While we're at it,
fix a broken macro declaration.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:22:40 -07:00
Darrick J. Wong
b73494fa9a xfs: prevent rt growfs when quota is enabled
Quotas aren't (yet) supported with realtime, so we shouldn't allow
userspace to set up a realtime section when quotas are enabled, even if
they attached one via mount options.  IOWS, you shouldn't be able to do:

# mkfs.xfs -f /dev/sda
# mount /dev/sda /mnt -o rtdev=/dev/sdb,usrquota
# xfs_growfs -r /mnt

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 16:22:40 -07:00
Darrick J. Wong
6c66448433 xfs: hoist freeing of rt data fork extent mappings
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function.  Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.

This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 08:40:54 -07:00
Darrick J. Wong
9488062805 xfs: bump max fsgeom struct version
The latest version of the fs geometry structure is v5.  Bump this
constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill
out all the fields.

IOWs, this commit is a no-op for the kernel, but will be useful for
userspace reporting in later changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17 08:40:54 -07:00
Jeff Layton
cbc06310c3 xfs: reinstate the old i_version counter as STATX_CHANGE_COOKIE
The handling of STATX_CHANGE_COOKIE was moved into generic_fillattr in
commit 0d72b92883c6 (fs: pass the request_mask to generic_fillattr), but
we didn't account for the fact that xfs doesn't call generic_fillattr at
all.

Make XFS report its i_version as the STATX_CHANGE_COOKIE.

Fixes: 0d72b92883c6 (fs: pass the request_mask to generic_fillattr)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-10-12 10:17:03 +05:30
Jiapeng Chong
f93b930030 xfs: Remove duplicate include
./fs/xfs/scrub/xfile.c: xfs_format.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6209
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-10-12 10:14:45 +05:30
Shiyang Ruan
3c90c01e49 xfs: correct calculation for agend and blockcount
The agend should be "start + length - 1", then, blockcount should be
"end + 1 - start".  Correct 2 calculation mistakes.

Also, rename "agend" to "range_agend" because it's not the end of the AG
per se; it's the end of the dead region within an AG's agblock space.

Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-10-12 10:11:56 +05:30
Darrick J. Wong
442177be8c xfs: process free extents to busy list in FIFO order
When we're adding extents to the busy discard list, add them to the tail
of the list so that we get FIFO order.  For FITRIM commands, this means
that we send discard bios sorted in order from longest to shortest, like
we did before commit 89cfa899608fc.

For transactions that are freeing extents, this puts them in the
transaction's busy list in FIFO order as well, which shouldn't make any
noticeable difference.

Fixes: 89cfa899608fc ("xfs: reduce AGF hold times during fstrim operations")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-11 12:35:21 -07:00
Darrick J. Wong
6868b8505c xfs: adjust the incore perag block_count when shrinking
If we reduce the number of blocks in an AG, we must update the incore
geometry values as well.

Fixes: 0800169e3e2c9 ("xfs: Pre-calculate per-AG agbno geometry")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-11 12:35:20 -07:00
Dave Chinner
e78a40b851 xfs: abort fstrim if kernel is suspending
A recent ext4 patch posting from Jan Kara reminded me of a
discussion a year ago about fstrim in progress preventing kernels
from suspending. The fix is simple, we should do the same for XFS.

This removes the -ERESTARTSYS error return from this code, replacing
it with either the last error seen or the number of blocks
successfully trimmed up to the point where we detected the stop
condition.

References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-10-04 09:25:04 +11:00
Dave Chinner
89cfa89960 xfs: reduce AGF hold times during fstrim operations
fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions)
or the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normal
see with non-discard based operations.

This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.

To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.

Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on AGF while we wait for discards to complete...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-10-04 09:24:52 +11:00
Dave Chinner
428c4435b0 xfs: move log discard work to xfs_discard.c
Because we are going to use the same list-based discard submission
interface for fstrim-based discards, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-10-04 09:24:02 +11:00
Darrick J. Wong
537c013b14 xfs: fix reloading entire unlinked bucket lists
During review of the patcheset that provided reloading of the incore
iunlink list, Dave made a few suggestions, and I updated the copy in my
dev tree.  Unfortunately, I then got distracted by ... who even knows
what ... and forgot to backport those changes from my dev tree to my
release candidate branch.  I then sent multiple pull requests with stale
patches, and that's what was merged into -rc3.

So.

This patch re-adds the use of an unlocked iunlink list check to
determine if we want to allocate the resources to recreate the incore
list.  Since lost iunlinked inodes are supposed to be rare, this change
helps us avoid paying the transaction and AGF locking costs every time
we open any inode.

This also re-adds the shutdowns on failure, and re-applies the
restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
re-adds a requested comment about the quotachecking code.

Retain the original RVB tag from Dave since there's no code change from
the last submission.

Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-09-24 18:12:13 -07:00
Linus Torvalds
3abc79dce6 Bug fixes for 6.6-rc3:
* Fix an integer overflow bug when processing an fsmap call.
 
  * Fix crash due to CPU hot remove event racing with filesystem mount
    operation.
 
  * During read-only mount, XFS does not allow the contents of the log to be
    recovered when there are one or more unrecognized rcompat features in the
    primary superblock, since the log might have intent items which the kernel
    does not know how to process.
 
  * During recovery of log intent items, XFS now reserves log space sufficient
    for one cycle of a permanent transaction to execute. Otherwise, this could
    lead to livelocks due to non-availability of log space.
 
  * On an fs which has an ondisk unlinked inode list, trying to delete a file
    or allocating an O_TMPFILE file can cause the fs to the shutdown if the
    first inode in the ondisk inode list is not present in the inode cache.
    The bug is solved by explicitly loading the first inode in the ondisk
    unlinked inode list into the inode cache if it is not already cached.
 
    A similar problem arises when the uncached inode is present in the middle
    of the ondisk unlinked inode list. This second bug is triggered when
    executing operations like quotacheck and bulkstat. In this case, XFS now
    reads in the entire ondisk unlinked inode list.
 
  * Enable LARP mode only on recent v5 filesystems.
 
  * Fix a out of bounds memory access in scrub.
 
  * Fix a performance bug when locating the tail of the log during mounting a
    filesystem.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZQkx4QAKCRAH7y4RirJu
 9HrTAQD6QhvHkS43vueGOb4WISZPG/jMKJ/FjvwLZrIZ0erbJwEAtRWhClwFv3NZ
 exJFtsmxrKC6Vifuo0pvfoCiK5mUvQ8=
 =SrJR
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Fix an integer overflow bug when processing an fsmap call

 - Fix crash due to CPU hot remove event racing with filesystem mount
   operation

 - During read-only mount, XFS does not allow the contents of the log to
   be recovered when there are one or more unrecognized rcompat features
   in the primary superblock, since the log might have intent items
   which the kernel does not know how to process

 - During recovery of log intent items, XFS now reserves log space
   sufficient for one cycle of a permanent transaction to execute.
   Otherwise, this could lead to livelocks due to non-availability of
   log space

 - On an fs which has an ondisk unlinked inode list, trying to delete a
   file or allocating an O_TMPFILE file can cause the fs to the shutdown
   if the first inode in the ondisk inode list is not present in the
   inode cache. The bug is solved by explicitly loading the first inode
   in the ondisk unlinked inode list into the inode cache if it is not
   already cached

   A similar problem arises when the uncached inode is present in the
   middle of the ondisk unlinked inode list. This second bug is
   triggered when executing operations like quotacheck and bulkstat. In
   this case, XFS now reads in the entire ondisk unlinked inode list

 - Enable LARP mode only on recent v5 filesystems

 - Fix a out of bounds memory access in scrub

 - Fix a performance bug when locating the tail of the log during
   mounting a filesystem

* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
  xfs: only call xchk_stats_merge after validating scrub inputs
  xfs: require a relatively recent V5 filesystem for LARP mode
  xfs: make inode unlinked bucket recovery work with quotacheck
  xfs: load uncached unlinked inodes into memory on demand
  xfs: reserve less log space when recovering log intent items
  xfs: fix log recovery when unknown rocompat bits are set
  xfs: reload entire unlinked bucket lists
  xfs: allow inode inactivation during a ro mount log recovery
  xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
  xfs: remove CPU hotplug infrastructure
  xfs: remove the all-mounts list
  xfs: use per-mount cpumask to track nonempty percpu inodegc lists
  xfs: fix an agbno overflow in __xfs_getfsmap_datadev
  xfs: fix per-cpu CIL structure aggregation racing with dying cpus
  xfs: fix select in config XFS_ONLINE_SCRUB_STATS
2023-09-22 16:32:19 -07:00
Christian Brauner
f798accd59
Revert "xfs: switch to multigrain timestamps"
This reverts commit e44df2664746aed8b6dd5245eb711a0ce33c5cf5.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 18:05:31 +02:00
Wang Jianchao
8b010acb31 xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
In our production environment, we find that mounting a 500M /boot
which is umount cleanly needs ~6s. One cause is that ffs() is
used by xlog_write_log_records() to decide the buffer size. It
can cause a lot of small IO easily when xlog_clear_stale_blocks()
needs to wrap around the end of log area and log head block is
not power of two. Things are similar in xlog_find_verify_cycle().

The code is able to handed bigger buffer very well, we can use
roundup_pow_of_two() to replace ffs() directly to avoid small
and sychronous IOs.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Wang Jianchao <wangjc136@midea.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-09-13 10:38:20 +05:30
Chandan Babu R
1155b12edb xfs: fix out of bounds memory access in scrub
This is a quick fix for a few internal syzbot reports concerning an
 invalid memory access in the scrub code.
 
 This has been lightly tested with fstests.  Enjoy!
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOgAKCRBKO3ySh0YR
 pkKbAQCKg0+VAqr2UuKT7PygRSUaLNybnMBHetDZyd1maEl7OQD7BGuM9AxwXWFp
 hL0Jq/HN5yeArrueGKMd0K3u1HRjJQE=
 =XwHc
 -----END PGP SIGNATURE-----

Merge tag 'fix-scrub-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA

xfs: fix out of bounds memory access in scrub

This is a quick fix for a few internal syzbot reports concerning an
invalid memory access in the scrub code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'fix-scrub-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: only call xchk_stats_merge after validating scrub inputs
2023-09-13 10:35:49 +05:30
Chandan Babu R
6ebb6500e5 xfs: disallow LARP on old fses
Before enabling logged xattrs, make sure the filesystem is new enough
 that it actually supports log incompat features.
 
 This has been lightly tested with fstests.  Enjoy!
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR
 pqc1AQD8hXUpatOY50TdRDI6qpKBWOEti7r+sXyq9bWM4QZFyAD/Zjx3aZ+R2u2g
 lsb1xLjekrh2DzToOFnvs4gd/nZd7Qw=
 =BxHQ
 -----END PGP SIGNATURE-----

Merge tag 'fix-larp-requirements-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA

xfs: disallow LARP on old fses

Before enabling logged xattrs, make sure the filesystem is new enough
that it actually supports log incompat features.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'fix-larp-requirements-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: require a relatively recent V5 filesystem for LARP mode
2023-09-13 10:33:27 +05:30
Chandan Babu R
abf7c8194f xfs: reload entire iunlink lists
This is the second part of correcting XFS to reload the incore unlinked
 inode list from the ondisk contents.  Whereas part one tackled failures
 from regular filesystem calls, this part takes on the problem of needing
 to reload the entire incore unlinked inode list on account of somebody
 loading an inode that's in the /middle/ of an unlinked list.  This
 happens during quotacheck, bulkstat, or even opening a file by handle.
 
 In this case we don't know the length of the list that we're reloading,
 so we don't want to create a new unbounded memory load while holding
 resources locked.  Instead, we'll target UNTRUSTED iget calls to reload
 the entire bucket.
 
 Note that this changes the definition of the incore unlinked inode list
 slightly -- i_prev_unlinked == 0 now means "not on the incore list".
 
 This has been lightly tested with fstests.  Enjoy!
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR
 ptU6AP48lONiOPzWvF1mXTnDosAtIDsfliMY+qVgVxrghqBFmwEAitHlOadpWonu
 yoQ3cnSqzfA4rKT5MQZCm2iIHH/LMgU=
 =Kp5V
 -----END PGP SIGNATURE-----

Merge tag 'fix-iunlink-list-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA

xfs: reload entire iunlink lists

This is the second part of correcting XFS to reload the incore unlinked
inode list from the ondisk contents.  Whereas part one tackled failures
from regular filesystem calls, this part takes on the problem of needing
to reload the entire incore unlinked inode list on account of somebody
loading an inode that's in the /middle/ of an unlinked list.  This
happens during quotacheck, bulkstat, or even opening a file by handle.

In this case we don't know the length of the list that we're reloading,
so we don't want to create a new unbounded memory load while holding
resources locked.  Instead, we'll target UNTRUSTED iget calls to reload
the entire bucket.

Note that this changes the definition of the incore unlinked inode list
slightly -- i_prev_unlinked == 0 now means "not on the incore list".

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'fix-iunlink-list-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: make inode unlinked bucket recovery work with quotacheck
  xfs: reload entire unlinked bucket lists
  xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
2023-09-13 10:30:12 +05:30
Chandan Babu R
fffcdcc31f xfs: reload the last iunlink item
It turns out that there are some serious bugs in how xfs handles the
 unlinked inode lists.  Way back before 4.14, there was a bug where a ro
 mount of a dirty filesystem would recover the log bug neglect to purge
 the unlinked list.  This leads to clean unmounted filesystems with
 unlinked inodes.  Starting around 5.15, we also converted the codebase
 to maintain a doubly-linked incore unlinked list.  However, we never
 provided the ability to load the incore list from disk.  If someone
 tries to allocate an O_TMPFILE file on a clean fs with a pre-existing
 unlinked list or even deletes a file, the code will fail and the fs
 shuts down.
 
 This first part of the correction effort adds the ability to load the
 first inode in the bucket when unlinking a file; and to load the next
 inode in the list when inactivating (freeing) an inode.
 
 This has been lightly tested with fstests.  Enjoy!
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQChOQAKCRBKO3ySh0YR
 plJvAQC0s843w2nvluXlIE8P9nBqk2ht6zwNOJpiZbWnf0zeLAD/a6v0HVVLbGN5
 qHVd/abQ5QIW55Ybm3Qko6PKvV4Nlgo=
 =WcRN
 -----END PGP SIGNATURE-----

Merge tag 'fix-iunlink-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.6-fixesA

xfs: reload the last iunlink item

It turns out that there are some serious bugs in how xfs handles the
unlinked inode lists.  Way back before 4.14, there was a bug where a ro
mount of a dirty filesystem would recover the log bug neglect to purge
the unlinked list.  This leads to clean unmounted filesystems with
unlinked inodes.  Starting around 5.15, we also converted the codebase
to maintain a doubly-linked incore unlinked list.  However, we never
provided the ability to load the incore list from disk.  If someone
tries to allocate an O_TMPFILE file on a clean fs with a pre-existing
unlinked list or even deletes a file, the code will fail and the fs
shuts down.

This first part of the correction effort adds the ability to load the
first inode in the bucket when unlinking a file; and to load the next
inode in the list when inactivating (freeing) an inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'fix-iunlink-6.6_2023-09-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: load uncached unlinked inodes into memory on demand
2023-09-13 10:25:00 +05:30