IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Replace this shouty macro with a real C function that has a more
descriptive name.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
attribute fork but i_forkoff is zero. This eliminates the ambiguity
between i_forkoff and i_af.if_present, which should make it easier to
understand the lifetime of attr forks.
While we're at it, remove the if_present checks around calls to
xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
forks that have already been torn down.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Syzkaller reported a UAF bug a while back:
==================================================================
BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
5.15.0-0.30.3-20220406_1406 #3
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
__vfs_getxattr+0xdf/0x13d fs/xattr.c:399
cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
do_truncate+0xc3/0x1e0 fs/open.c:56
handle_truncate fs/namei.c:3084 [inline]
do_open fs/namei.c:3432 [inline]
path_openat+0x30ab/0x396d fs/namei.c:3561
do_filp_open+0x1c4/0x290 fs/namei.c:3588
do_sys_openat2+0x60d/0x98c fs/open.c:1212
do_sys_open+0xcf/0x13c fs/open.c:1228
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0x0
RIP: 0033:0x7f7ef4bb753d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
</TASK>
Allocated by task 2953:
kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
__kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
kasan_slab_alloc include/linux/kasan.h:254 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slub.c:3213 [inline]
slab_alloc mm/slub.c:3221 [inline]
kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
kmem_cache_zalloc include/linux/slab.h:711 [inline]
xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
__vfs_setxattr+0x11b/0x177 fs/xattr.c:180
__vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
__vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
vfs_setxattr+0x154/0x33d fs/xattr.c:301
setxattr+0x216/0x29f fs/xattr.c:575
__do_sys_fsetxattr fs/xattr.c:632 [inline]
__se_sys_fsetxattr fs/xattr.c:621 [inline]
__x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0x0
Freed by task 2949:
kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:1700 [inline]
slab_free_freelist_hook mm/slub.c:1726 [inline]
slab_free mm/slub.c:3492 [inline]
kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
__vfs_removexattr+0x106/0x16a fs/xattr.c:468
cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
security_inode_killpriv+0x54/0xa1 security/security.c:1414
setattr_prepare+0x1a6/0x897 fs/attr.c:146
xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
notify_change+0xae5/0x10a1 fs/attr.c:410
do_truncate+0x134/0x1e0 fs/open.c:64
handle_truncate fs/namei.c:3084 [inline]
do_open fs/namei.c:3432 [inline]
path_openat+0x30ab/0x396d fs/namei.c:3561
do_filp_open+0x1c4/0x290 fs/namei.c:3588
do_sys_openat2+0x60d/0x98c fs/open.c:1212
do_sys_open+0xcf/0x13c fs/open.c:1228
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0x0
The buggy address belongs to the object at ffff88802cec9188
which belongs to the cache xfs_ifork of size 40
The buggy address is located 20 bytes inside of
40-byte region [ffff88802cec9188, ffff88802cec91b0)
The buggy address belongs to the page:
page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x2cec9
flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
>ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
^
ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
==================================================================
The root cause of this bug is the unlocked access to xfs_inode.i_afp
from the getxattr code paths while trying to determine which ILOCK mode
to use to stabilize the xattr data. Unfortunately, the VFS does not
acquire i_rwsem when vfs_getxattr (or listxattr) call into the
filesystem, which means that getxattr can race with a removexattr that's
tearing down the attr fork and crash:
xfs_attr_set: xfs_attr_get:
xfs_attr_fork_remove: xfs_ilock_attr_map_shared:
xfs_idestroy_fork(ip->i_afp);
kmem_cache_free(xfs_ifork_cache, ip->i_afp);
if (ip->i_afp &&
ip->i_afp = NULL;
xfs_need_iread_extents(ip->i_afp))
<KABOOM>
ip->i_forkoff = 0;
Regrettably, the VFS is much more lax about i_rwsem and getxattr than
is immediately obvious -- not only does it not guarantee that we hold
i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
The getxattr system call won't acquire the lock before calling XFS, but
the file capabilities code calls getxattr with and without i_rwsem held
to determine if the "security.capabilities" xattr is set on the file.
Fixing the VFS locking requires a treewide investigation into every code
path that could touch an xattr and what i_rwsem state it expects or sets
up. That could take years or even prove impossible; fortunately, we
can fix this UAF problem inside XFS.
An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
ensure that i_forkoff is always zeroed before i_afp is set to null and
changed the read paths to use smp_rmb before accessing i_forkoff and
i_afp, which avoided these UAF problems. However, the patch author was
too busy dealing with other problems in the meantime, and by the time he
came back to this issue, the situation had changed a bit.
On a modern system with selinux, each inode will always have at least
one xattr for the selinux label, so it doesn't make much sense to keep
incurring the extra pointer dereference. Furthermore, Allison's
upcoming parent pointer patchset will also cause nearly every inode in
the filesystem to have extended attributes. Therefore, make the inode
attribute fork structure part of struct xfs_inode, at a cost of 40 more
bytes.
This patch adds a clunky if_present field where necessary to maintain
the existing logic of xattr fork null pointer testing in the existing
codebase. The next patch switches the logic over to XFS_IFORK_Q and it
all goes away.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
At line 1561, variable "state" is being compared
with NULL every loop iteration.
-------------------------------------------------------------------
1561 for (i = 0; state != NULL && i < state->path.active; i++) {
1562 xfs_trans_brelse(args->trans, state->path.blk[i].bp);
1563 state->path.blk[i].bp = NULL;
1564 }
-------------------------------------------------------------------
However, it cannot be NULL.
----------------------------------------
1546 state = xfs_da_state_alloc(args);
----------------------------------------
xfs_da_state_alloc calls kmem_cache_zalloc. kmem_cache_zalloc is
called with __GFP_NOFAIL flag and, therefore, it cannot return NULL.
--------------------------------------------------------------------------
struct xfs_da_state *
xfs_da_state_alloc(
struct xfs_da_args *args)
{
struct xfs_da_state *state;
state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL);
state->args = args;
state->mp = args->dp->i_mount;
return state;
}
--------------------------------------------------------------------------
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Strachuk <strochuk@ispras.ru>
Fixes: 4d0cdd2bb8f0 ("xfs: clean up xfs_attr_node_hasname")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Make it consistent with the other buffer APIs to return a error and
the buffer is placed in a parameter.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
We check if an ag contains the log in many places, so make this
a first class XFS helper by lifting it to fs/xfs/libxfs/xfs_ag.h and
renaming it xfs_ag_contains_log(). The convert all the places that
check if the AG contains the log to use this helper.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Many of the places that call xfs_ag_block_count() have a perag
available. These places can just read pag->block_count directly
instead of calculating the AG block count from first principles.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
There is a lot of overhead in functions like xfs_verify_agino() that
repeatedly calculate the geometry limits of an AG. These can be
pre-calculated as they are static and the verification context has
a per-ag context it can quickly reference.
In the case of xfs_verify_agino(), we now always have a perag
context handy, so we can store the minimum and maximum agino values
in the AG in the perag. This means we don't have to calculate
it on every call and it can be inlined in callers if we move it
to xfs_ag.h.
xfs_verify_agino_or_null() gets the same perag treatment.
xfs_agino_range() is moved to xfs_ag.c as it's not really a type
function, and it's use is largely restricted as the first and last
aginos can be grabbed straight from the perag in most cases.
Note that we leave the original xfs_verify_agino in place in
xfs_types.c as a static function as other callers in that file do
not have per-ag contexts so still need to go the long way. It's been
renamed to xfs_verify_agno_agino() to indicate it takes both an agno
and an agino to differentiate it from new function.
$ size --totals fs/xfs/built-in.a
text data bss dec hex filename
before 1482185 329588 572 1812345 1ba779 (TOTALS)
after 1481937 329588 572 1812097 1ba681 (TOTALS)
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
There is a lot of overhead in functions like xfs_verify_agbno() that
repeatedly calculate the geometry limits of an AG. These can be
pre-calculated as they are static and the verification context has
a per-ag context it can quickly reference.
In the case of xfs_verify_agbno(), we now always have a perag
context handy, so we can store the AG length and the minimum valid
block in the AG in the perag. This means we don't have to calculate
it on every call and it can be inlined in callers if we move it
to xfs_ag.h.
Move xfs_ag_block_count() to xfs_ag.c because it's really a
per-ag function and not an XFS type function. We need a little
bit of rework that is specific to xfs_initialise_perag() to allow
growfs to calculate the new perag sizes before we've updated the
primary superblock during the grow (chicken/egg situation).
Note that we leave the original xfs_verify_agbno in place in
xfs_types.c as a static function as other callers in that file do
not have per-ag contexts so still need to go the long way. It's been
renamed to xfs_verify_agno_agbno() to indicate it takes both an agno
and an agbno to differentiate it from new function.
Future commits will make similar changes for other per-ag geometry
validation functions.
Further:
$ size --totals fs/xfs/built-in.a
text data bss dec hex filename
before 1483006 329588 572 1813166 1baaae (TOTALS)
after 1482185 329588 572 1812345 1ba779 (TOTALS)
This rework reduces the binary size by ~820 bytes, indicating
that much less work is being done to bounds check the agbno values
against on per-ag geometry information.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
We have the perag in most places we call xfs_alloc_read_agfl, so
pass the perag instead of a mount/agno pair.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
It's available in all callers, so pass it in so that the perag can
be passed further down the stack.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
It's available in all callers, so pass it in so that the perag can
be passed further down the stack.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
We have the perag in most places we call xfs_read_agf, so pass the
perag instead of a mount/agno pair.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
We have the perag in most palces we call xfs_read_agi, so pass the
perag instead of a mount/agno pair.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
xfs_alloc_read_agf() initialises the perag if it hasn't been done
yet, so it makes sense to pass it the perag rather than pull a
reference from the buffer. This allows callers to be per-ag centric
rather than passing mount/agno pairs everywhere.
Whilst modifying the xfs_reflink_find_shared() function definition,
declare it static and remove the extern declaration as it is an
internal function only these days.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Trivial wrapper around xfs_alloc_read_agf(), can be easily replaced
by passing a NULL agfbp to xfs_alloc_read_agf().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
xfs_ialloc_read_agi() initialises the perag if it hasn't been done
yet, so it makes sense to pass it the perag rather than pull a
reference from the buffer. This allows callers to be per-ag centric
rather than passing mount/agno pairs everywhere.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
This is just a basic wrapper around xfs_ialloc_read_agi(), which can
be entirely handled by xfs_ialloc_read_agi() by passing a NULL
agibpp....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Because the perag must exist for these operations, look it up as
part of the common shrink operations and pass it instead of the
mount/agno pair.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Now that we've established (again!) that empty xattr leaf buffers are
ok, we no longer need to bhold them to transactions when we're creating
new leaf blocks. Get rid of the entire mechanism, which should simplify
the xattr code quite a bit.
The original justification for using bhold here was to prevent the AIL
from trying to write the empty leaf block into the fs during the brief
time that we release the buffer lock. The reason for /that/ was to
prevent recovery from tripping over the empty ondisk block.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify") because it was wrong.
Every now and then we get a corruption report from the kernel or
xfs_repair about empty leaf blocks in the extended attribute structure.
We've long thought that these shouldn't be possible, but prior to 5.18
one would shake loose in the recoveryloop fstests about once a month.
A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
happen every 7 minutes on my testing cloud. I added a ton of logging to
detect any time we set the header count on an xattr leaf block to zero.
This produced the following dmesg output on generic/388:
XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
xfs_attr3_leaf_create+0x187/0x230
xfs_attr_shortform_to_leaf+0xd1/0x2f0
xfs_attr_set_iter+0x73e/0xa90
xfs_xattri_finish_update+0x45/0x80
xfs_attr_finish_item+0x1b/0xd0
xfs_defer_finish_noroll+0x19c/0x770
__xfs_trans_commit+0x153/0x3e0
xfs_attr_set+0x36b/0x740
xfs_xattr_set+0x89/0xd0
__vfs_setxattr+0x67/0x80
__vfs_setxattr_noperm+0x6e/0x120
vfs_setxattr+0x97/0x180
setxattr+0x88/0xa0
path_setxattr+0xc3/0xe0
__x64_sys_setxattr+0x27/0x30
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
So now we know that someone is creating empty xattr leaf blocks as part
of converting a sf xattr structure into a leaf xattr structure. The
conversion routine logs any existing sf attributes in the same
transaction that creates the leaf block, so we know this is a setxattr
to a file that has no attributes at all.
Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
recovery. I also augmented buffer item recovery to call ->verify_struct
on any attr leaf blocks and complain if it finds a failure:
XFS (sda4): Unmounting Filesystem
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Starting recovery (logdev: internal)
XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
xfs_attr3_leaf_verify+0x3b8/0x420
xlog_recover_buf_commit_pass2+0x60a/0x6c0
xlog_recover_items_pass2+0x4e/0xc0
xlog_recover_commit_trans+0x33c/0x350
xlog_recovery_process_trans+0xa5/0xe0
xlog_recover_process_data+0x8d/0x140
xlog_do_recovery_pass+0x19b/0x720
xlog_do_log_recovery+0x62/0xc0
xlog_do_recover+0x33/0x1d0
xlog_recover+0xda/0x190
xfs_log_mount+0x14c/0x360
xfs_mountfs+0x517/0xa60
xfs_fs_fill_super+0x6bc/0x950
get_tree_bdev+0x175/0x280
vfs_get_tree+0x1a/0x80
path_mount+0x6f5/0xaa0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc61e241eae
And a moment later, the _delwri_submit of the recovered buffers trips
the same verifier and recovery fails:
XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;.......
00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........
00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>.
00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................
00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P..............
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem.
XFS (sda4): Please unmount the filesystem and rectify the problem(s)
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed
I think I see what's going on here -- setxattr is racing with something
that shuts down the filesystem:
Thread 1 Thread 2
-------- --------
xfs_attr_sf_addname
xfs_attr_shortform_to_leaf
<create empty leaf>
xfs_trans_bhold(leaf)
xattri_dela_state = XFS_DAS_LEAF_ADD
<roll transaction>
<flush log>
<shut down filesystem>
xfs_trans_bhold_release(leaf)
<discover fs is dead, bail>
Thread 3
--------
<cycle mount, start recovery>
xlog_recover_buf_commit_pass2
xlog_recover_do_reg_buffer
<replay empty leaf buffer from recovered buf item>
xfs_buf_delwri_queue(leaf)
xfs_buf_delwri_submit
_xfs_buf_ioapply(leaf)
xfs_attr3_leaf_write_verify
<trip over empty leaf buffer>
<fail recovery>
As you can see, the bhold keeps the leaf buffer locked and thus prevents
the *AIL* from tripping over the ichdr.count==0 check in the write
verifier. Unfortunately, it doesn't prevent the log from getting
flushed to disk, which sets up log recovery to fail.
So. It's clear that the kernel has always had the ability to persist
attr leaf blocks with ichdr.count==0, which means that it's part of the
ondisk format now.
Unfortunately, this check has been added and removed multiple times
throughout history. It first appeared in[1] kernel 3.10 as part of the
early V5 format patches. The check was later discovered to break log
recovery and hence disabled[2] during log recovery in kernel 4.10.
Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
weed out the empty leaf blocks. This was still not correct because log
recovery would recover an empty attr leaf block successfully only for
regular xattr operations to trip over the empty block during of the
block during regular operation. Therefore, the check was removed
entirely[4] in kernel 5.7 but removal of the xfs_repair check was
forgotten. The continued complaints from xfs_repair lead to us
mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it
once again.
[1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
[2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier
during log replay")
[3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
[4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf
block")
[5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify")
Looking at the rest of the xattr code, it seems that files with empty
leaf blocks behave as expected -- listxattr reports no attributes;
getxattr on any xattr returns nothing as expected; removexattr does
nothing; and setxattr can add attributes just fine.
Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block")
Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The variable @args is fed to a tracepoint, and that's the only place
it's used. This is fine for the kernel, but for userspace, tracepoints
are #define'd out of existence, which results in this warning on gcc
11.2:
xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
1440 | struct xfs_da_args *args = attr->xattri_da_args;
| ^~~~
Clean this up.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
I found a race involving the larp control knob, aka the debugging knob
that lets developers enable logging of extended attribute updates:
Thread 1 Thread 2
echo 0 > /sys/fs/xfs/debug/larp
setxattr(REPLACE)
xfs_has_larp (returns false)
xfs_attr_set
echo 1 > /sys/fs/xfs/debug/larp
xfs_attr_defer_replace
xfs_attr_init_replace_state
xfs_has_larp (returns true)
xfs_attr_init_remove_state
<oops, wrong DAS state!>
This isn't a particularly severe problem right now because xattr logging
is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
what they're doing.
However, the eventual intent is that callers should be able to ask for
the assistance of the log in persisting xattr updates. This capability
might not be required for /all/ callers, which means that dynamic
control must work correctly. Once an xattr update has decided whether
or not to use logged xattrs, it needs to stay in that mode until the end
of the operation regardless of what subsequent parallel operations might
do.
Therefore, it is an error to continue sampling xfs_globals.larp once
xfs_attr_change has made a decision about larp, and it was not correct
for me to have told Allison that ->create_intent functions can sample
the global log incompat feature bitfield to decide to elide a log item.
Instead, create a new op flag for the xfs_da_args structure, and convert
all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
the attr update state machine to look for the operations flag.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
This series contains a two key cleanups for the new LARP code. Most
of it is refactoring and tweaking the code that creates kernel log
messages about enabling and disabling features -- we should be
warning about LARP being turned on once per mount, instead of once
per insmod cycle; we shouldn't be spamming the logs so aggressively
about turning *off* log incompat features.
The second part of the series refactors the LARP code responsible
for getting (and releasing) permission to use xattr log items. The
implementation code doesn't belong in xfs_log.c, and calls to
logging functions don't belong in libxfs -- they really should be
done by the VFS implementation functions before they start calling
into libraries.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
As part of solving the memory leaks and UAF problems in the new LARP
code, kmemleak also reported that log recovery will leak the table
used to hash buffer cancellations if the recovery fails. Fix this
problem by creating alloc/free helpers that initialize and free the
hashtable contents correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that sets up logged xattr updates out of libxfs and into
xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that enables logged xattr updates out of "lib"xlog and into
xfs_xattr.c so that it no longer has to know about xlog_* functions.
While we're at it, give xfs_xattr.c its own header file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
While we're messing around with how recovery allocates and frees the
buffer cancellation table, convert the allocation to use kmalloc_array
instead of the old kmem_alloc APIs, and make it handle a null return,
even though that's not likely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Move the code that allocates and frees the buffer cancellation tables
used by log recovery into the file that actually uses the tables. This
is a precursor to some cleanups and a memory leak fix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The recent patch to improve btree cycle checking caused a regression
when I rebased the in-memory btree branch atop the 5.19 for-next branch,
because in-memory short-pointer btrees do not have AG numbers. This
produced the following complaint from kmemleak:
unreferenced object 0xffff88803d47dde8 (size 264):
comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
hex dump (first 32 bytes):
90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff .M..............
e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff .D:.............
backtrace:
[<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
[<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
[<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
[<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
[<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
[<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
[<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
[<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
[<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
[<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
[<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
[<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
[<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
[<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
[<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
[<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]
I noticed that xfs_btree_insrec has a bunch of debug code that return
out of the function immediately, without freeing the "new" btree cursor
that can be returned when _make_block_unfull calls xfs_btree_split. Fix
the error return in this function to free the btree cursor.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
The problem was that an allocation failed unexpectedly in
xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
injections, resulting in an EFSCORRUPTED error being returned to
xfs_bmapi_write(). The error occurred on extent-to-btree format
conversion allocating the new root block:
RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
Call Trace:
<TASK>
xfs_btree_new_iroot+0xdf/0x520
xfs_btree_make_block_unfull+0x10d/0x1c0
xfs_btree_insrec+0x364/0x790
xfs_btree_insert+0xaa/0x210
xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
xfs_bmapi_allocate+0x34c/0x420
xfs_bmapi_write+0x53c/0x9c0
xfs_alloc_file_space+0xee/0x320
xfs_file_fallocate+0x36b/0x450
vfs_fallocate+0x148/0x340
__x64_sys_fallocate+0x3c/0x70
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa
Why the allocation failed at this point is unknown, but is likely
that we ran the transaction out of reserved space and filesystem out
of space with bmbt blocks because of all the minlen allocations
being done causing worst case fragmentation of a large allocation.
Regardless of the cause, we've then called xfs_bmapi_finish() which
calls xfs_btree_del_cursor(cur, error) to tear down the cursor.
So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
and the filesystem is still up. The assert fails to take into
account that allocation can fail with an error and the transaction
teardown will shut the filesystem down if necessary. i.e. the
assert needs to check "|| error != 0" as well, because at this point
shutdown is pending because the current transaction is dirty....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Not fatal, the assert is there to catch developer attention. I'm
seeing this occasionally during recoveryloop testing after a
shutdown, and I don't want this to stop an overnight recoveryloop
run as it is currently doing.
Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
corruption report into the log and cause a test failure that way,
but it won't stop the machine dead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit dc04db2aa7c9 has caused a small aim7 regression, showing a
small increase in CPU usage in __xfs_btree_check_sblock() as a
result of the extra checking.
This is likely due to the endian conversion of the sibling poitners
being unconditional instead of relying on the compiler to endian
convert the NULL pointer at compile time and avoiding the runtime
conversion for this common case.
Rework the checks so that endian conversion of the sibling pointers
is only done if they are not null as the original code did.
.... and these need to be "inline" because the compiler completely
fails to inline them automatically like it should be doing.
$ size fs/xfs/libxfs/xfs_btree.o*
text data bss dec hex filename
51874 240 0 52114 cb92 fs/xfs/libxfs/xfs_btree.o.orig
51562 240 0 51802 ca5a fs/xfs/libxfs/xfs_btree.o.inline
Just when you think the tools have advanced sufficiently we don't
have to care about stuff like this anymore, along comes a reminder
that *our tools still suck*.
Fixes: dc04db2aa7c9 ("xfs: detect self referencing btree sibling pointers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
While running xfs/297 and generic/642, I noticed a crash in
xfs_attri_item_relog when it tries to copy the attr name to the new
xattri log item. I think what happened here was that we called
->iop_commit on the old attri item (which nulls out the pointers) as
part of a log force at the same time that a chained attr operation was
ongoing. The system was busy enough that at some later point, the defer
ops operation decided it was necessary to relog the attri log item, but
as we've detached the name buffer from the old attri log item, we can't
copy it to the new one, and kaboom.
I think there's a broader refcounting problem with LARP mode -- the
setxattr code can return to userspace before the CIL actually formats
and commits the log item, which results in a UAF bug. Therefore, the
xattr log item needs to be able to retain a reference to the name and
value buffers until the log items have completely cleared the log.
Furthermore, each time we create an intent log item, we allocate new
memory and (re)copy the contents; sharing here would be very useful.
Solve the UAF and the unnecessary memory allocations by having the log
code create a single refcounted buffer to contain the name and value
contents. This buffer can be passed from old to new during a relog
operation, and the logging code can (optionally) attach it to the
xfs_attr_item for reuse when LARP mode is enabled.
This also fixes a problem where the xfs_attri_log_item objects weren't
being freed back to the same cache where they came from.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
V4 superblocks do not contain the log_incompat feature bit, which means
that we cannot protect xattr log items against kernels that are too old
to know how to recover them. Turn off the log items for such
filesystems and adjust the "delayed" name to reflect what it's really
controlling.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Everywhere else in XFS, structures that capture the state of an ongoing
deferred work item all have names that end with "_intent". The new
extended attribute deferred work items are not named as such, so fix it
to follow the naming convention used elsewhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The state variable is now a local variable pointing to a heap
allocation, so we don't need to zero-initialize it, nor do we need the
conditional to decide if we should free it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Initialize and destroy the xattr log item caches in the same places that
we do all the other log item caches.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Nobody uses this field, so get rid of it and the unused flag definition.
Rearrange the structure layout to reduce its size from 104 to 96 bytes.
This gets us from 39 to 42 objects per page.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Create a separate slab cache for struct xfs_attr_item objects, since we
can pack the (104-byte) intent items more tightly than we can with the
general slab cache objects. On x86, this means 39 intents per memory
page instead of 32.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The flags that are stored in the extended attr intent log item really
should have a separate namespace from the rest of the XFS_ATTR_* flags.
Give them one to make it a little more obvious that they're intent item
flags.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The calling conventions of this function are a mess -- callers /can/
provide a pointer to a pointer to a state structure, but it's not
required, and as evidenced by the last two patches, the callers that do
weren't be careful enough about how to deal with an existing da state.
Push the allocation and freeing responsibilty to the callers, which
means that callers from the xattr node state machine steps now have the
visibility to allocate or free the da state structure as they please.
As a bonus, the node remove/add paths for larp-mode replaces can reset
the da state structure instead of freeing and immediately reallocating
it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Make sure we screen the "attr flags" field of recovered xattr intent log
items to reject flag bits that we don't know about. This is really the
attr *filter* field from xfs_da_args, so rename the field and create
a mask to make checking for invalid bits easier.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If a setxattr operation finds an xattr structure in leaf format, adding
the attr can fail due to lack of space and hence requires an upgrade to
node format. After this happens, we'll roll the transaction and
re-enter the state machine, at which time we need to perform a second
lookup of the attribute name to find its new location. This lookup
attaches a new da state structure to the xfs_attr_item but doesn't free
the old one (from the leaf lookup) and leaks it. Fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
kmemleak reported that we lost an xfs_da_state while removing xattrs in
generic/020:
unreferenced object 0xffff88801c0e4b40 (size 480):
comm "attr", pid 30515, jiffies 4294931061 (age 5.960s)
hex dump (first 32 bytes):
78 bc 65 07 00 c9 ff ff 00 30 60 1c 80 88 ff ff x.e......0`.....
02 00 00 00 00 00 00 00 80 18 83 4e 80 88 ff ff ...........N....
backtrace:
[<ffffffffa023ef4a>] xfs_da_state_alloc+0x1a/0x30 [xfs]
[<ffffffffa021b6f3>] xfs_attr_node_hasname+0x23/0x90 [xfs]
[<ffffffffa021c6f1>] xfs_attr_set_iter+0x441/0xa30 [xfs]
[<ffffffffa02b5104>] xfs_xattri_finish_update+0x44/0x80 [xfs]
[<ffffffffa02b515e>] xfs_attr_finish_item+0x1e/0x40 [xfs]
[<ffffffffa0244744>] xfs_defer_finish_noroll+0x184/0x740 [xfs]
[<ffffffffa02a6473>] __xfs_trans_commit+0x153/0x3e0 [xfs]
[<ffffffffa021d149>] xfs_attr_set+0x469/0x7e0 [xfs]
[<ffffffffa02a78d9>] xfs_xattr_set+0x89/0xd0 [xfs]
[<ffffffff812e6512>] __vfs_removexattr+0x52/0x70
[<ffffffff812e6a08>] __vfs_removexattr_locked+0xb8/0x150
[<ffffffff812e6af6>] vfs_removexattr+0x56/0x100
[<ffffffff812e6bf8>] removexattr+0x58/0x90
[<ffffffff812e6cce>] path_removexattr+0x9e/0xc0
[<ffffffff812e6d44>] __x64_sys_lremovexattr+0x14/0x20
[<ffffffff81786b35>] do_syscall_64+0x35/0x80
I think this is a consequence of xfs_attr_node_removename_setup
attaching a new da(btree) state to xfs_attr_item and never freeing it.
I /think/ it's the case that the remove paths could detach the da state
earlier in the remove state machine since nothing else accesses the
state. However, let's future-proof the new xattr code by adding a
catch-all when we free the xfs_attr_item to make sure we never leak the
da state.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_repair flags these as a corruption error, so the verifier should
catch software bugs that result in empty leaf blocks being written
to disk, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>