IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The statistics offset is completely static, move it into the btree_ops
structure instead of the cursor.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Move the btree buffer LRU refcount to the btree ops structure so that we
can eliminate the last bc_btnum switch in the generic btree code. We're
about to create repair-specific btree types, and we don't want that
stuff cluttering up libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Set the btree block buffer ops in xfs_btree_init_buf since we already
have access to that information through the btree ops.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Now that all of the callers pass XFS_BUF_DADDR_NULL as the daddr
parameter, we can elide that too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Convert any place we call xfs_btree_init_block with a buffer to use the
_init_buf function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Rename xfs_btree_init_block_int to xfs_btree_init_block, and
xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the
type that caller are supposed to pass in.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Notice now that the btree ops structure encodes btree geometry flags and
the magic number through the buffer ops. Refactor the btree block
initialization functions to use the btree ops so that we no longer have
to open code all that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Expose these static btree ops structures so that we can reference them
in the AG initialization code in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a new XFS_BTREE_ALLOCBT_ACTIVE flag to replace the active field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Add a single xfs_alloc_lookup helper to sort out the argument passing and
setting of the active flag instead of duplicating the logic three times.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Just move the two flags into bc_flags where there is plenty of space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Certain btree flags never change for the life of a btree cursor because
they describe the geometry of the btree itself. Encode these in the
btree ops structure and reduce the amount of code required in each btree
type's init_cursor functions. This also frees up most of the bits in
bc_flags.
A previous version of this patch also converted the open-coded flags
logic to helpers. This was removed due to the pending refactoring (that
follows this patch) to eliminate most of the state flags.
Conversion script:
sed \
-e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \
-e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \
-e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \
-e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \
-e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \
-i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
A reviewer was confused by the init_sa logic in this function. Upon
checking the logic, I discovered that the code is imprecise. What we
want to do here is check that there is an ownership record in the rmap
btree for the AG that contains a btree block.
For an inode-rooted btree (e.g. the bmbt) the per-AG btree cursors have
not been initialized because inode btrees can span multiple AGs.
Therefore, we must initialize the per-AG btree cursors in sc->sa before
proceeding. That is what init_sa controls, and hence the logic should
be gated on XFS_BTREE_ROOT_IN_INODE, not XFS_BTREE_LONG_PTRS.
In practice, ROOT_IN_INODE and LONG_PTRS are coincident so this hasn't
mattered. However, we're about to refactor both of those flags into
separate btree_ops fields so we want this the logic to make sense
afterwards.
Fixes: 858333dcf0 ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a
V5 filesystem. All currently proposed btree types are V5 only and use
the richer XFS_BTREE_CRC_BLOCKS format. Therefore, we can drop this
flag and change the conditional to xfs_has_crc.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
This is a precursor to putting more static data in the btree ops structure.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Don't waste tracepoint segment memory on per-btree block allocation
tracepoints when we can do it from the generic btree code.
With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:
Before:
10 __tracepoints_ptrs 00000b38 0000000000000000 0000000000000000 001412f0 2**2
14 __tracepoints_strings 00005433 0000000000000000 0000000000000000 001689a0 2**5
29 __tracepoints 00010d30 0000000000000000 0000000000000000 0023fe00 2**5
After:
10 __tracepoints_ptrs 00000b34 0000000000000000 0000000000000000 001417b0 2**2
14 __tracepoints_strings 00005413 0000000000000000 0000000000000000 00168e80 2**5
29 __tracepoints 00010cd0 0000000000000000 0000000000000000 00240760 2**5
Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Don't waste memory on extra per-btree block freeing tracepoints when we
can do it from the generic btree code.
With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:
Before:
10 __tracepoints_ptrs 00000b3c 0000000000000000 0000000000000000 00140eb0 2**2
14 __tracepoints_strings 00005453 0000000000000000 0000000000000000 00168540 2**5
29 __tracepoints 00010d90 0000000000000000 0000000000000000 0023f5e0 2**5
After:
10 __tracepoints_ptrs 00000b38 0000000000000000 0000000000000000 001412f0 2**2
14 __tracepoints_strings 00005433 0000000000000000 0000000000000000 001689a0 2**5
29 __tracepoints 00010d30 0000000000000000 0000000000000000 0023fe00 2**5
Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Use the same summary counter calculation infrastructure to generate new
values for the in-core summary counters. The difference between the
scrubber and the repairer is that the repairer will freeze the fs during
setup, which means that the values should match exactly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
If scrub finds that everything is ok with the filesystem, we need a way
to tell the health tracking that it can let go of indirect health flags,
since indirect flags only mean that at some point in the past we lost
some context.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
If an unhealthy inode gets inactivated, remember this fact in the
per-fs health summary.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Establish two more classes of health tracking bits:
* Indirect problems, which suggest problems in other health domains
that we weren't able to preserve.
* Secondary problems, which track state that's related to primary
evidence of health problems; and
The first class we'll use in an upcoming patch to record in the AG
health status the fact that we ran out of memory and had to inactivate
an inode with defective metadata. The second class we use to indicate
that repair knows that an inode is bad and we need to fix it later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter XFS_IS_CORRUPT failures, we should report that to
the health monitoring system for later reporting.
I started with this semantic patch and massaged everything until it
built:
@@
expression mp, test;
@@
- if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED;
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; }
@@
expression mp, test;
identifier label, error;
@@
- if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; }
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; }
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt realtime metadat blocks, we should report
that to the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt quota blocks, we should report that to the
health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt inode records, we should report that to
the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt symbolic link blocks, we should report
that to the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt directory or extended attribute blocks, we
should report that to the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter corrupt btree blocks, we should report that to the
health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter a corrupt block mapping, we should report that to
the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Whenever we encounter a corrupt AG header, we should report that to the
health monitoring system for later reporting. Buffer readers that don't
respond to corruption events with a _mark_sick call can be detected with
the following script:
#!/bin/bash
# Detect missing calls to xfs_*_mark_sick
filter=cat
tty -s && filter=less
git grep -A10 -E '( = xfs_trans_read_buf| = xfs_buf_read\()' fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] | awk '
BEGIN {
ignore = 0;
lineno = 0;
delete lines;
}
{
if ($0 == "--") {
if (!ignore) {
for (i = 0; i < lineno; i++) {
print(lines[i]);
}
printf("--\n");
}
delete lines;
lineno = 0;
ignore = 0;
} else if ($0 ~ /mark_sick/) {
ignore = 1;
} else {
lines[lineno++] = $0;
}
}
' | $filter
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Split the setting of the sick and checked masks into separate functions
as part of preparing to add the ability for regular runtime fs code
(i.e. not scrub) to mark metadata structures sick when corruptions are
found. Improve the documentation of libxfs' requirements for helper
behavior.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fix the file link counts since we just computed the correct ones.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create the necessary hooks in the directory operations
(create/link/unlink/rename) code so that our live nlink scrub code can
stay up to date with link count updates in the rest of the filesystem.
This will be the means to keep our shadow link count information up to
date while the scan runs in real time.
In online fsck part 2, we'll use these same hooks to handle repairs
to directories and parent pointer information.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create the necessary scrub code to walk the filesystem's directory tree
so that we can compute file link counts. Similar to quotacheck, we
create an incore shadow array of link count information and then we walk
the filesystem a second time to compare the link counts. We need live
updates to keep the information up to date during the lengthy scan, so
this scrubber remains disabled until the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Use the shadow quota counters that live quotacheck creates to reset the
incore dquot counters.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
While running xfs/804 (quota repairs racing with fsstress), I observed a
filesystem shutdown in the primary sb write verifier:
run fstests xfs/804 at 2022-05-23 18:43:48
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
XFS (sda4): EXPERIMENTAL online scrub feature in use. Use at your own risk!
XFS (sda4): SB ifree sanity check failed 0xb5 > 0x80
XFS (sda4): Metadata corruption detected at xfs_sb_write_verify+0x5e/0x100 [xfs], xfs_sb block 0x0
XFS (sda4): Unmount and run xfs_repair
The "SB ifree sanity check failed" message was a debugging printk that I
added to the kernel; observe that 0xb5 - 0x80 = 53, which is less than
one inode chunk.
I traced this to the xfs_log_sb calls from the online quota repair code,
which tries to clear the CHKD flags from the superblock to force a
mount-time quotacheck if the repair fails. On a V5 filesystem,
xfs_log_sb updates the ondisk sb summary counters with the current
contents of the percpu counters. This is done without quiescing other
writer threads, which means it could be racing with a thread that has
updated icount and is about to update ifree.
If the other write thread had incremented ifree before updating icount,
the repair thread will write icount > ifree into the logged update. If
the AIL writes the logged superblock back to disk before anyone else
fixes this siutation, this will lead to a write verifier failure, which
causes a filesystem shutdown.
Resolve this problem by updating the quota flags and calling
xfs_sb_to_disk directly, which does not touch the percpu counters.
While we're at it, we can elide the entire update if the selected qflags
aren't set.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a shadow dqtrx system in the quotacheck code that hooks the
regular dquot counter update code. This will be the means to keep our
copy of the dquot counters up to date while the scan runs in real time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a new trio of scrub functions to check quota counters. While the
dquots themselves are filesystem metadata and should be checked early,
the dquot counter values are computed from other metadata and are
therefore summary counters. We don't plug these into the scrub dispatch
just yet, because we still need to be able to watch quota updates while
doing our scan.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a new method to load an xfarray element from the xfile, but with
a twist. If we've never stored to the array index, zero the caller's
buffer. This will facilitate RMWs updates of records in a sparse array
without fuss, since the sparse xfarray convention is that uninitialized
array elements default to zeroes.
This is a separate patch to reduce the size of the upcoming quotacheck
patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a helper to compute the number of blocks that a file has
allocated from the data realtime volumes. This patch was
split out to reduce the size of the upcoming quotacheck patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a helper to initialize empty transactions on behalf of a scrub
operation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Repair might encounter an inode with a totally garbage i_mode. To fix
this problem, we have to figure out if the file was a regular file, a
directory, or a special file. One way to figure this out is to check if
there are any directories with entries pointing down to the busted file.
This patch recovers the file mode by scanning every directory entry on
the filesystem to see if there are any that point to the busted file.
If the ftype of all such dirents are consistent, the mode is recovered
from the ftype. If no dirents are found, the file becomes a regular
file. In all cases, ACLs are canceled and the file is made accessible
only by root.
A previous patch attempted to guess the mode by reading the beginning of
the file data. This was rejected by Christoph on the grounds that we
cannot trust user-controlled data blocks. Users do not have direct
control over the ondisk contents of directory entries, so this method
should be much safer.
If all the dirents have the same ftype, then we can translate that back
into an S_IFMT flag and fix the file. If not, reset the mode to
S_IFREG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create the XFS_DIR3_FTYPE_STR macro so that we can report ftype as
strings instead of numbers in tracepoints.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a simple predicate to determine if two xfs_names are the same
objects or have the exact same name. The comparison is always case
sensitive.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create an xfs_name_dot object so that upcoming scrub code can compare
against that. Offline repair already has such an object, so we're
really just hoisting it to the kernel.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The inode scanner tries to reduce contention on the AGI header buffer
lock by grabbing references to consecutive allocated inodes. Batching
stops as soon as we encounter an unallocated inode. This is unfortunate
because in the worst case performance collapses to the old "one at a
time" behavior if every other inode is free.
This is correct behavior, but we could do better. Unallocated inodes by
definition have nothing to scan, which means the iscan can ignore them
as long as someone ensures that the scan data will reflect another
thread allocating the inode and adding interesting metadata to that
inode. That mechanism is, of course, the live update hooks.
Therefore, extend the batching mechanism to track unallocated inodes
adjacent to the scan cursor. The _want_live_update predicate can tell
the caller's live update hook to incorporate all live updates to what
the scanner thinks is an unallocated inode if (after dropping the AGI)
some other thread allocates one of those inodes and begins using it.
Note that we cannot just copy the ir_free bitmap into the scan cursor
because the batching stops if iget says the inode is in an intermediate
state (e.g. on the inactivation list) and cannot be igrabbed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
After observing xfs_scrub taking forever to rebuild parent pointers on a
pptrs enabled filesystem, I decided to profile what the system was
doing. It turns out that when there are a lot of threads trying to scan
the filesystem, most of our time is spent contending on AGI buffer
locks. Given that we're walking the inobt records anyway, we can often
tell ahead of time when there's a bunch of (up to 64) consecutive inodes
that we could grab all at once.
Do this to amortize the cost of taking the AGI lock across as many
inodes as we possibly can. On the author's system this seems to improve
parallel throughput from barely one and a half cores to slightly
sublinear scaling. The obvious antipattern here of course is where the
freemask has every other bit set (e.g. all 0xA's)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>