// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) Qu Wenruo 2017.  All rights reserved.
 */

/*
 * The module is used to catch unexpected/corrupted tree block data.
 * Such behavior can be caused either by a fuzzed image or bugs.
 *
 * The objective is to do leaf/node validation checks when tree block is read
 * from disk, and check *every* possible member, so other code won't
 * need to check them again.
 *
 * Due to the potential and unwanted damage, every checker needs to be
 * carefully reviewed so it does not prevent mounting valid images.
 */

#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/error-injection.h>
#include "messages.h"
#include "ctree.h"
#include "tree-checker.h"
#include "compression.h"
#include "volumes.h"
#include "misc.h"
#include "fs.h"
#include "accessors.h"
#include "file-item.h"
#include "inode-item.h"
#include "dir-item.h"
#include "extent-tree.h"

/*
 * Error message should follow the following format:
 * corrupt <type>: <identifier>, <reason>[, <bad_value>]
 *
 * @type:	leaf or node
 * @identifier:	the necessary info to locate the leaf/node.
 *		It's recommended to decode key.objectid/offset if it's
 *		meaningful.
 * @reason:	describe the error
 * @bad_value:	optional, it's recommended to output bad value and its
 *		expected value (range).
 *
 * Since comma is used to separate the components, only space is allowed
 * inside each component.
 */
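
/*
 * Illustrative example of the format above, taken from a real report: a
 * misaligned disk_bytenr caught by file_extent_err() was logged as
 *
 *   BTRFS critical (device dm-3): corrupt leaf: root=5 block=30457856 slot=6 ino=257 file_offset=0, invalid disk_bytenr for file extent, have 10617606235235216665, should be aligned to 4096
 *
 * Here <type> is "leaf", <identifier> is "root=5 block=30457856 slot=6
 * ino=257 file_offset=0", <reason> is "invalid disk_bytenr for file
 * extent" and the optional <bad_value> part is "have ..., should be
 * aligned to 4096".
 */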

/*
 * Prepend the generic "corrupt leaf/node: root=%llu block=%llu slot=%d, "
 * prefix to @fmt.
 * Allows callers to customize the output.
 */
__printf(3, 4)
__cold
static void generic_err(const struct extent_buffer *eb, int slot,
			const char *fmt, ...)
{
	const struct btrfs_fs_info *fs_info = eb->fs_info;
	struct va_format vaf;
	va_list args;

	va_start(args, fmt);

	vaf.fmt = fmt;
	vaf.va = &args;

	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
	btrfs_crit(fs_info,
		"corrupt %s: root=%llu block=%llu slot=%d, %pV",
		btrfs_header_level(eb) == 0 ? "leaf" : "node",
		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, &vaf);
	va_end(args);
}

/*
 * Customized reporter for extent data item, since its key objectid and
 * offset have their own meaning.
 */
__printf(3, 4)
__cold
static void file_extent_err(const struct extent_buffer *eb, int slot,
			    const char *fmt, ...)
{
	const struct btrfs_fs_info *fs_info = eb->fs_info;
	struct btrfs_key key;
	struct va_format vaf;
	va_list args;

	btrfs_item_key_to_cpu(eb, &key, slot);
	va_start(args, fmt);

	vaf.fmt = fmt;
	vaf.va = &args;

	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
	btrfs_crit(fs_info,
	"corrupt %s: root=%llu block=%llu slot=%d ino=%llu file_offset=%llu, %pV",
		btrfs_header_level(eb) == 0 ? "leaf" : "node",
		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
		key.objectid, key.offset, &vaf);
	va_end(args);
}

/*
 * Return 0 if the btrfs_file_extent_##name is aligned to @alignment.
 * Else return 1.
 */
#define CHECK_FE_ALIGNED(leaf, slot, fi, name, alignment) \
({ \
	if (unlikely(!IS_ALIGNED(btrfs_file_extent_##name((leaf), (fi)), \
				 (alignment)))) \
		file_extent_err((leaf), (slot), \
	"invalid %s for file extent, have %llu, should be aligned to %u", \
			(#name), btrfs_file_extent_##name((leaf), (fi)), \
			(alignment)); \
	(!IS_ALIGNED(btrfs_file_extent_##name((leaf), (fi)), (alignment))); \
})
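
/*
 * Usage sketch (illustrative, assuming a 4096-byte sectorsize):
 *
 *	CHECK_FE_ALIGNED(leaf, slot, fi, disk_bytenr, 4096)
 *
 * evaluates to 0 for a disk_bytenr of 30457856 (7436 * 4096). An odd value
 * such as 10617606235235216665 makes it log "invalid disk_bytenr for file
 * extent, ..." through file_extent_err(), and the macro evaluates to 1.
 */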

static u64 file_extent_end(struct extent_buffer *leaf,
			   struct btrfs_key *key,
			   struct btrfs_file_extent_item *extent)
{
	u64 end;
	u64 len;

	if (btrfs_file_extent_type(leaf, extent) == BTRFS_FILE_EXTENT_INLINE) {
		len = btrfs_file_extent_ram_bytes(leaf, extent);
		end = ALIGN(key->offset + len, leaf->fs_info->sectorsize);
	} else {
		len = btrfs_file_extent_num_bytes(leaf, extent);
		end = key->offset + len;
	}
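
	/*
	 * Worked example (illustrative, assuming a 4096-byte sectorsize):
	 * an inline extent at key offset 0 with ram_bytes 100 ends at
	 * ALIGN(0 + 100, 4096) = 4096. A regular extent at key offset 8192
	 * with num_bytes 12288 ends at 8192 + 12288 = 20480. The end is
	 * exclusive, so the next extent of the same inode may start there.
	 */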
	return end;
}

/*
 * Customized report for dir_item, the only new important information is
 * key->objectid, which represents inode number
 */
__printf(3, 4)
__cold
static void dir_item_err(const struct extent_buffer *eb, int slot,
			 const char *fmt, ...)
{
	const struct btrfs_fs_info *fs_info = eb->fs_info;
	struct btrfs_key key;
	struct va_format vaf;
	va_list args;

	btrfs_item_key_to_cpu(eb, &key, slot);
	va_start(args, fmt);

	vaf.fmt = fmt;
	vaf.va = &args;

	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
	btrfs_crit(fs_info,
		"corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
		btrfs_header_level(eb) == 0 ? "leaf" : "node",
		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
		key.objectid, &vaf);
	va_end(args);
}

/*
 * This function checks prev_key->objectid, to ensure current key and prev_key
 * share the same objectid as inode number.
 *
 * This is to detect missing INODE_ITEM in subvolume trees.
 *
 * Return true if everything is OK or we don't need to check.
 * Return false if anything is wrong.
 */
static bool check_prev_ino(struct extent_buffer *leaf,
			   struct btrfs_key *key, int slot,
			   struct btrfs_key *prev_key)
{
	/* No prev key, skip check */
	if (slot == 0)
		return true;

	/* Only these key->types need to be checked */
	ASSERT(key->type == BTRFS_XATTR_ITEM_KEY ||
	       key->type == BTRFS_INODE_REF_KEY ||
	       key->type == BTRFS_DIR_INDEX_KEY ||
	       key->type == BTRFS_DIR_ITEM_KEY ||
	       key->type == BTRFS_EXTENT_DATA_KEY);

	/*
	 * Only subvolume trees along with their reloc trees need this check.
	 * Things like log trees don't follow this ino requirement.
	 */
	if (!is_fstree(btrfs_header_owner(leaf)))
		return true;

	if (key->objectid == prev_key->objectid)
		return true;
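
	/*
	 * Illustrative example: in a subvolume leaf holding the keys
	 *   (257 INODE_ITEM 0) (257 EXTENT_DATA 0) (258 EXTENT_DATA 0)
	 * the third key is rejected here. Keys are sorted, so an INODE_ITEM
	 * for inode 258 would have to sit between the last two slots. The
	 * previous key still belongs to inode 257, so that INODE_ITEM is
	 * missing.
	 */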

	/* Error found */
	dir_item_err(leaf, slot,
		"invalid previous key objectid, have %llu expect %llu",
		prev_key->objectid, key->objectid);
	return false;
}

static int check_extent_data_item(struct extent_buffer *leaf,
				  struct btrfs_key *key, int slot,
				  struct btrfs_key *prev_key)
{
	struct btrfs_fs_info *fs_info = leaf->fs_info;
	struct btrfs_file_extent_item *fi;
	u32 sectorsize = fs_info->sectorsize;
	u32 item_size = btrfs_item_size(leaf, slot);
	u64 extent_end;

	if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) {
		file_extent_err(leaf, slot,
"unaligned file_offset for file extent, have %llu should be aligned to %u",
			key->offset, sectorsize);
		return -EUCLEAN;
	}

	/*
	 * Previous key must have the same key->objectid (ino).
	 * It can be XATTR_ITEM, INODE_ITEM or just another EXTENT_DATA.
	 * But if objectids mismatch, it means we have a missing
	 * INODE_ITEM.
	 */
	if (unlikely(!check_prev_ino(leaf, key, slot, prev_key)))
		return -EUCLEAN;

	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);

	/*
	 * Make sure the item contains at least inline header, so the file
	 * extent type is not some garbage.
	 */
	if (unlikely(item_size < BTRFS_FILE_EXTENT_INLINE_DATA_START)) {
		file_extent_err(leaf, slot,
				"invalid item size, have %u expect [%zu, %u)",
				item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START,
				SZ_4K);
		return -EUCLEAN;
	}
	if (unlikely(btrfs_file_extent_type(leaf, fi) >=
		     BTRFS_NR_FILE_EXTENT_TYPES)) {
		file_extent_err(leaf, slot,
		"invalid type for file extent, have %u expect range [0, %u]",
			btrfs_file_extent_type(leaf, fi),
			BTRFS_NR_FILE_EXTENT_TYPES - 1);
		return -EUCLEAN;
	}

	/*
	 * Support for new compression/encryption must introduce incompat flag,
	 * and must be caught in open_ctree().
	 */
	if (unlikely(btrfs_file_extent_compression(leaf, fi) >=
		     BTRFS_NR_COMPRESS_TYPES)) {
		file_extent_err(leaf, slot,
	"invalid compression for file extent, have %u expect range [0, %u]",
			btrfs_file_extent_compression(leaf, fi),
			BTRFS_NR_COMPRESS_TYPES - 1);
		return -EUCLEAN;
	}
	if (unlikely(btrfs_file_extent_encryption(leaf, fi))) {
		file_extent_err(leaf, slot,
			"invalid encryption for file extent, have %u expect 0",
			btrfs_file_extent_encryption(leaf, fi));
		return -EUCLEAN;
	}
	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
		/* Inline extent must have 0 as key offset */
		if (unlikely(key->offset)) {
			file_extent_err(leaf, slot,
		"invalid file_offset for inline file extent, have %llu expect 0",
				key->offset);
			return -EUCLEAN;
		}

		/* Compressed inline extent has no on-disk size, skip it */
		if (btrfs_file_extent_compression(leaf, fi) !=
		    BTRFS_COMPRESS_NONE)
			return 0;

		/* Uncompressed inline extent size must match item size */
		if (unlikely(item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
					  btrfs_file_extent_ram_bytes(leaf, fi))) {
			file_extent_err(leaf, slot,
	"invalid ram_bytes for uncompressed inline extent, have %u expect %llu",
				item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START +
				btrfs_file_extent_ram_bytes(leaf, fi));
			return -EUCLEAN;
		}
		return 0;
	}

	/* Regular or preallocated extent has fixed item size */
	if (unlikely(item_size != sizeof(*fi))) {
		file_extent_err(leaf, slot,
	"invalid item size for reg/prealloc file extent, have %u expect %zu",
			item_size, sizeof(*fi));
		return -EUCLEAN;
	}
	if (unlikely(CHECK_FE_ALIGNED(leaf, slot, fi, ram_bytes, sectorsize) ||
		     CHECK_FE_ALIGNED(leaf, slot, fi, disk_bytenr, sectorsize) ||
		     CHECK_FE_ALIGNED(leaf, slot, fi, disk_num_bytes, sectorsize) ||
		     CHECK_FE_ALIGNED(leaf, slot, fi, offset, sectorsize) ||
		     CHECK_FE_ALIGNED(leaf, slot, fi, num_bytes, sectorsize)))
		return -EUCLEAN;

	/* Catch extent end overflow */
	if (unlikely(check_add_overflow(btrfs_file_extent_num_bytes(leaf, fi),
					key->offset, &extent_end))) {
		file_extent_err(leaf, slot,
	"extent end overflow, have file offset %llu extent num bytes %llu",
				key->offset,
				btrfs_file_extent_num_bytes(leaf, fi));
		return -EUCLEAN;
	}

	/*
	 * Check that no two consecutive file extent items, in the same leaf,
	 * present ranges that overlap each other.
	 */
	if (slot > 0 &&
	    prev_key->objectid == key->objectid &&
	    prev_key->type == BTRFS_EXTENT_DATA_KEY) {
		struct btrfs_file_extent_item *prev_fi;
		u64 prev_end;

		prev_fi = btrfs_item_ptr(leaf, slot - 1,
					 struct btrfs_file_extent_item);
		prev_end = file_extent_end(leaf, prev_key, prev_fi);
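
		/*
		 * Illustrative example: a previous regular extent at file
		 * offset 0 with num_bytes 8192 gives prev_end = 8192. The
		 * current extent of the same inode is then accepted only if
		 * its key offset is at least 8192.
		 */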
		if (unlikely(prev_end > key->offset)) {
			file_extent_err(leaf, slot - 1,
"file extent end range (%llu) goes beyond start offset (%llu) of the next file extent",
					prev_end, key->offset);
			return -EUCLEAN;
		}
	}

	return 0;
}

static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key,
			   int slot, struct btrfs_key *prev_key)
{
	struct btrfs_fs_info *fs_info = leaf->fs_info;
	u32 sectorsize = fs_info->sectorsize;
	const u32 csumsize = fs_info->csum_size;

	if (unlikely(key->objectid != BTRFS_EXTENT_CSUM_OBJECTID)) {
		generic_err(leaf, slot,
		"invalid key objectid for csum item, have %llu expect %llu",
			key->objectid, BTRFS_EXTENT_CSUM_OBJECTID);
		return -EUCLEAN;
	}
	if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) {
		generic_err(leaf, slot,
	"unaligned key offset for csum item, have %llu should be aligned to %u",
			key->offset, sectorsize);
		return -EUCLEAN;
	}
	if (unlikely(!IS_ALIGNED(btrfs_item_size(leaf, slot), csumsize))) {
		generic_err(leaf, slot,
	"unaligned item size for csum item, have %u should be aligned to %u",
			btrfs_item_size(leaf, slot), csumsize);
		return -EUCLEAN;
	}
	if (slot > 0 && prev_key->type == BTRFS_EXTENT_CSUM_KEY) {
		u64 prev_csum_end;
		u32 prev_item_size;

		prev_item_size = btrfs_item_size(leaf, slot - 1);
		prev_csum_end = (prev_item_size / csumsize) * sectorsize;
		prev_csum_end += prev_key->offset;
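
		/*
		 * Worked example (illustrative, assuming a 4096-byte sectorsize
		 * and 4-byte crc32c checksums): a previous csum item of 32 bytes
		 * holds 32 / 4 = 8 checksums, so it covers 8 * 4096 = 32768
		 * bytes. If its key offset is 1048576, prev_csum_end is 1081344,
		 * and the current csum item must not start below that.
		 */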
		if (unlikely(prev_csum_end > key->offset)) {
			generic_err(leaf, slot - 1,
"csum end range (%llu) goes beyond the start range (%llu) of the next csum item",
				    prev_csum_end, key->offset);
			return -EUCLEAN;
		}
	}
	return 0;
}

/* Inode item error output has the same format as dir_item_err() */
#define inode_item_err(eb, slot, fmt, ...) \
	dir_item_err(eb, slot, fmt, __VA_ARGS__)

static int check_inode_key(struct extent_buffer *leaf, struct btrfs_key *key,
			   int slot)
{
	struct btrfs_key item_key;
	bool is_inode_item;

	btrfs_item_key_to_cpu(leaf, &item_key, slot);
	is_inode_item = (item_key.type == BTRFS_INODE_ITEM_KEY);

	/* For XATTR_ITEM, location key should be all 0 */
	if (item_key.type == BTRFS_XATTR_ITEM_KEY) {
		if (unlikely(key->objectid != 0 || key->type != 0 ||
			     key->offset != 0))
			return -EUCLEAN;
		return 0;
	}

	if (unlikely((key->objectid < BTRFS_FIRST_FREE_OBJECTID ||
		      key->objectid > BTRFS_LAST_FREE_OBJECTID) &&
		     key->objectid != BTRFS_ROOT_TREE_DIR_OBJECTID &&
		     key->objectid != BTRFS_FREE_INO_OBJECTID)) {
		if (is_inode_item) {
			generic_err(leaf, slot,
	"invalid key objectid: has %llu expect %llu or [%llu, %llu] or %llu",
				key->objectid, BTRFS_ROOT_TREE_DIR_OBJECTID,
				BTRFS_FIRST_FREE_OBJECTID,
				BTRFS_LAST_FREE_OBJECTID,
				BTRFS_FREE_INO_OBJECTID);
		} else {
			dir_item_err(leaf, slot,
"invalid location key objectid: has %llu expect %llu or [%llu, %llu] or %llu",
				key->objectid, BTRFS_ROOT_TREE_DIR_OBJECTID,
				BTRFS_FIRST_FREE_OBJECTID,
				BTRFS_LAST_FREE_OBJECTID,
				BTRFS_FREE_INO_OBJECTID);
		}
		return -EUCLEAN;
	}
	if (unlikely(key->offset != 0)) {
		if (is_inode_item)
			inode_item_err(leaf, slot,
				       "invalid key offset: has %llu expect 0",
				       key->offset);
		else
			dir_item_err(leaf, slot,
				"invalid location key offset: has %llu expect 0",
				key->offset);
		return -EUCLEAN;
	}
	return 0;
}

static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key,
			  int slot)
{
	struct btrfs_key item_key;
	bool is_root_item;

	btrfs_item_key_to_cpu(leaf, &item_key, slot);
	is_root_item = (item_key.type == BTRFS_ROOT_ITEM_KEY);

	/*
	 * Bad rootid for reloc trees.
	 *
	 * Reloc trees are only for subvolume trees, other trees only need
	 * to be COWed to be relocated.
	 */
	if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID &&
		     !is_fstree(key->offset))) {
		generic_err(leaf, slot,
		"invalid reloc tree for root %llu, root id is not a subvolume tree",
			    key->offset);
		return -EUCLEAN;
	}
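
	/*
	 * Illustrative example: the ROOT_ITEM key (TREE_RELOC_OBJECTID,
	 * ROOT_ITEM, QUOTA_TREE_OBJECTID), i.e. (-8, 132, 8), describes a
	 * reloc tree for the quota tree. Such a key was reported by syzbot.
	 * The check above rejects it because the quota tree is not a
	 * subvolume tree.
	 */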
2019-12-09 18:54:34 +08:00
/* No such tree id */
2020-11-04 16:12:45 +01:00
if ( unlikely ( key - > objectid = = 0 ) ) {
2019-12-09 18:54:34 +08:00
if ( is_root_item )
generic_err ( leaf , slot , " invalid root id 0 " ) ;
else
dir_item_err ( leaf , slot ,
" invalid location key root id 0 " ) ;
return - EUCLEAN ;
}
/* DIR_ITEM/INDEX/INODE_REF is not allowed to point to non-fs trees */
2020-11-04 16:12:45 +01:00
if ( unlikely ( ! is_fstree ( key - > objectid ) & & ! is_root_item ) ) {
2019-12-09 18:54:34 +08:00
dir_item_err ( leaf , slot ,
" invalid location key objectid, have %llu expect [%llu, %llu] " ,
key - > objectid , BTRFS_FIRST_FREE_OBJECTID ,
BTRFS_LAST_FREE_OBJECTID ) ;
return - EUCLEAN ;
}
/*
* ROOT_ITEM with non - zero offset means this is a snapshot , created at
* @ offset transid .
* Furthermore , for location key in DIR_ITEM , its offset is always - 1.
*
* So here we only check offset for reloc tree whose key - > offset must
* be a valid tree .
*/
2020-11-04 16:12:45 +01:00
if ( unlikely ( key - > objectid = = BTRFS_TREE_RELOC_OBJECTID & &
key - > offset = = 0 ) ) {
2019-12-09 18:54:34 +08:00
generic_err ( leaf , slot , " invalid root id 0 for reloc tree " ) ;
return - EUCLEAN ;
}
return 0 ;
}
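The reloc-tree rule enforced at the top of this hunk (a ROOT_ITEM keyed with TREE_RELOC_OBJECTID is only valid when its offset names a subvolume tree) can be modeled in userspace. The sketch below is not kernel code: `model_is_fstree()` and `reloc_root_key_ok()` are hypothetical names, `model_is_fstree()` is a simplified reading of the kernel's is_fstree(), and the objectid constants (FS_TREE = 5, QUOTA_TREE = 8, FIRST_FREE = 256, TREE_RELOC = -8ULL) are written out from my understanding of the on-disk format, so check them against the kernel headers before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed objectid values, mirroring the btrfs on-disk format. */
#define MODEL_FS_TREE_OBJECTID     5ULL
#define MODEL_QUOTA_TREE_OBJECTID  8ULL
#define MODEL_FIRST_FREE_OBJECTID  256ULL
#define MODEL_TREE_RELOC_OBJECTID  ((uint64_t)-8LL)

/*
 * Simplified model of is_fstree(): the top-level fs tree, or any id from
 * FIRST_FREE upwards that has no qgroup level bits (bits 48..63) set.
 */
static bool model_is_fstree(uint64_t rootid)
{
	if (rootid == MODEL_FS_TREE_OBJECTID)
		return true;
	return rootid >= MODEL_FIRST_FREE_OBJECTID && (rootid >> 48) == 0;
}

/*
 * Returns false for a ROOT_ITEM key describing a reloc tree of a
 * non-subvolume tree; all other keys are accepted by this model.
 */
static bool reloc_root_key_ok(uint64_t objectid, uint64_t offset)
{
	if (objectid != MODEL_TREE_RELOC_OBJECTID)
		return true;
	return model_is_fstree(offset);
}
```

With the key from the syzbot report, (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), the model rejects the item because 8 is neither the top-level fs tree nor a subvolume id of 256 or above.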
2019-03-20 16:17:46 +01:00
static int check_dir_item(struct extent_buffer *leaf,
2019-08-26 15:40:38 +08:00
			  struct btrfs_key *key, struct btrfs_key *prev_key,
			  int slot)
2017-11-08 08:54:25 +08:00
{
2019-03-20 16:17:46 +01:00
	struct btrfs_fs_info *fs_info = leaf->fs_info;
2017-11-08 08:54:25 +08:00
	struct btrfs_dir_item *di;
2021-10-21 14:58:35 -04:00
	u32 item_size = btrfs_item_size(leaf, slot);
2017-11-08 08:54:25 +08:00
	u32 cur = 0;
2020-11-04 16:12:45 +01:00
	if (unlikely(!check_prev_ino(leaf, key, slot, prev_key)))
2019-08-26 15:40:38 +08:00
		return -EUCLEAN;
2020-11-04 16:12:45 +01:00
2017-11-08 08:54:25 +08:00
	di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
	while (cur < item_size) {
2019-12-09 18:54:35 +08:00
		struct btrfs_key location_key;
2017-11-08 08:54:25 +08:00
		u32 name_len;
		u32 data_len;
		u32 max_name_len;
		u32 total_size;
		u32 name_hash;
		u8 dir_type;
2019-12-09 18:54:35 +08:00
		int ret;
2017-11-08 08:54:25 +08:00
		/* header itself should not cross item boundary */
2020-11-04 16:12:45 +01:00
		if (unlikely(cur + sizeof(*di) > item_size)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-12-06 15:18:14 +01:00
		"dir item header crosses item boundary, have %zu boundary %u",
2017-11-08 08:54:25 +08:00
				cur + sizeof(*di), item_size);
			return -EUCLEAN;
		}
2019-12-09 18:54:35 +08:00
		/* Location key check */
		btrfs_dir_item_key_to_cpu(leaf, di, &location_key);
		if (location_key.type == BTRFS_ROOT_ITEM_KEY) {
			ret = check_root_key(leaf, &location_key, slot);
2020-11-04 16:12:45 +01:00
			if (unlikely(ret < 0))
2019-12-09 18:54:35 +08:00
				return ret;
		} else if (location_key.type == BTRFS_INODE_ITEM_KEY ||
			   location_key.type == 0) {
			ret = check_inode_key(leaf, &location_key, slot);
2020-11-04 16:12:45 +01:00
			if (unlikely(ret < 0))
2019-12-09 18:54:35 +08:00
				return ret;
		} else {
			dir_item_err(leaf, slot,
			"invalid location key type, have %u, expect %u or %u",
				     location_key.type, BTRFS_ROOT_ITEM_KEY,
				     BTRFS_INODE_ITEM_KEY);
			return -EUCLEAN;
		}
2017-11-08 08:54:25 +08:00
		/* dir type check */
2022-10-20 12:58:28 -04:00
		dir_type = btrfs_dir_ftype(leaf, di);
2020-11-04 16:12:45 +01:00
		if (unlikely(dir_type >= BTRFS_FT_MAX)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
			"invalid dir item type, have %u expect [0, %u)",
				dir_type, BTRFS_FT_MAX);
			return -EUCLEAN;
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(key->type == BTRFS_XATTR_ITEM_KEY &&
			     dir_type != BTRFS_FT_XATTR)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
		"invalid dir item type for XATTR key, have %u expect %u",
				dir_type, BTRFS_FT_XATTR);
			return -EUCLEAN;
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(dir_type == BTRFS_FT_XATTR &&
			     key->type != BTRFS_XATTR_ITEM_KEY)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
			"xattr dir type found for non-XATTR key");
			return -EUCLEAN;
		}
		if (dir_type == BTRFS_FT_XATTR)
			max_name_len = XATTR_NAME_MAX;
		else
			max_name_len = BTRFS_NAME_LEN;
		/* Name/data length check */
		name_len = btrfs_dir_name_len(leaf, di);
		data_len = btrfs_dir_data_len(leaf, di);
2020-11-04 16:12:45 +01:00
		if (unlikely(name_len > max_name_len)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
			"dir item name len too long, have %u max %u",
				name_len, max_name_len);
			return -EUCLEAN;
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(name_len + data_len > BTRFS_MAX_XATTR_SIZE(fs_info))) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
			"dir item name and data len too long, have %u max %u",
				name_len + data_len,
2018-01-25 14:56:18 +08:00
				BTRFS_MAX_XATTR_SIZE(fs_info));
2017-11-08 08:54:25 +08:00
			return -EUCLEAN;
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(data_len && dir_type != BTRFS_FT_XATTR)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
			"dir item with invalid data len, have %u expect 0",
				data_len);
			return -EUCLEAN;
		}
		total_size = sizeof(*di) + name_len + data_len;
		/* header and name/data should not cross item boundary */
2020-11-04 16:12:45 +01:00
		if (unlikely(cur + total_size > item_size)) {
2019-03-20 16:07:27 +01:00
			dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
		"dir item data crosses item boundary, have %u boundary %u",
				cur + total_size, item_size);
			return -EUCLEAN;
		}
		/*
		 * Special check for XATTR/DIR_ITEM, as key->offset is name
		 * hash, should match its name
		 */
		if (key->type == BTRFS_DIR_ITEM_KEY ||
		    key->type == BTRFS_XATTR_ITEM_KEY) {
2018-01-10 15:13:07 +01:00
			char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
2017-11-08 08:54:25 +08:00
			read_extent_buffer(leaf, namebuf,
					(unsigned long)(di + 1), name_len);
			name_hash = btrfs_name_hash(namebuf, name_len);
2020-11-04 16:12:45 +01:00
			if (unlikely(key->offset != name_hash)) {
2019-03-20 16:07:27 +01:00
				dir_item_err(leaf, slot,
2017-11-08 08:54:25 +08:00
		"name hash mismatch with key, have 0x%016x expect 0x%016llx",
					name_hash, key->offset);
				return -EUCLEAN;
			}
		}
		cur += total_size;
		di = (struct btrfs_dir_item *)((void *)di + total_size);
	}
	return 0;
}
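The loop in check_dir_item() walks several variable-length entries packed into one item: each entry is a fixed header followed by name_len name bytes and data_len data bytes, and both the header alone and the whole entry must stay inside item_size. The sketch below models only that bounds logic in userspace. The 4-byte header (two host-endian u16 lengths) and the names `model_dir_header` and `count_packed_entries()` are hypothetical simplifications; the real struct btrfs_dir_item also carries a location key, a transid and a type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified header: the real btrfs_dir_item is larger. */
struct model_dir_header {
	uint16_t name_len;
	uint16_t data_len;
};

/*
 * Walk entries packed back to back in buf[0..item_size).
 * Returns the number of entries, or -1 if any header or entry would
 * cross the item boundary (the two boundary checks in check_dir_item()).
 */
static int count_packed_entries(const uint8_t *buf, uint32_t item_size)
{
	uint32_t cur = 0;
	int count = 0;

	while (cur < item_size) {
		struct model_dir_header hdr;
		uint32_t total_size;

		/* The header itself must not cross the item boundary. */
		if (cur + sizeof(hdr) > item_size)
			return -1;
		memcpy(&hdr, buf + cur, sizeof(hdr));

		/* Header plus name and data must not cross it either. */
		total_size = sizeof(hdr) + hdr.name_len + hdr.data_len;
		if (cur + total_size > item_size)
			return -1;

		cur += total_size;
		count++;
	}
	return count;
}
```

Checking the header before reading name_len and data_len matters: otherwise a truncated trailing header would be decoded from bytes past the end of the item.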
2019-03-20 16:18:57 +01:00
__printf(3, 4)
2018-07-03 17:10:05 +08:00
__cold
2019-03-20 16:18:57 +01:00
static void block_group_err(const struct extent_buffer *eb, int slot,
2018-07-03 17:10:05 +08:00
			    const char *fmt, ...)
{
2019-03-20 16:18:57 +01:00
	const struct btrfs_fs_info *fs_info = eb->fs_info;
2018-07-03 17:10:05 +08:00
	struct btrfs_key key;
	struct va_format vaf;
	va_list args;
	btrfs_item_key_to_cpu(eb, &key, slot);
	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
btrfs: tree-checker: dump the page status if hit something wrong
[BUG]
There is a bug report about a very suspicious tree-checker error being triggered:
BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
SELinux: inode_doinit_use_xattr: getxattr returned 117 for dev=dm-0
ino=5737268
[ANALYZE]
The root cause is still unclear, but there are already some clues:
- Unaligned eb bytenr
The block bytenr is 8550954455682405139, which is not even aligned to 2.
This bytenr is fetched from the extent buffer header, not from eb->start.
This means that at the initial time of read, the eb header bytenr was
still correct (the most basic check needed to continue the read), but
later something went wrong and corrupted at least the first page.
Thus we got such an obviously incorrect value.
- Invalid extent buffer header owner
The read itself is triggered for subvolume 256, but the eb header
owner is 11858205567642294356, which is not really possible.
The problem here is that subvolume ids are limited to ((1 << 48) - 1),
and this one definitely goes beyond that limit.
So this value is garbage too.
We already got two garbage values from an extent buffer that passed the
initial bytenr and csum checks, so its contents must have turned into
garbage at some later point.
This looks like a page lifespan problem (e.g. we didn't properly hold the
page).
[ENHANCEMENT]
The current tree-checker only outputs things from the extent buffer,
nothing about the page status.
So this patch enhances the tree-checker output by also dumping the
first page, which looks like this:
page:00000000aa9f3ce8 refcount:4 mapcount:0 mapping:00000000169aa6b6 index:0x1d0c pfn:0x1022e5
memcg:ffff888103456000
aops:btree_aops [btrfs] ino:1
flags: 0x2ffff0000008000(private|node=0|zone=2|lastcpupid=0xffff)
page_type: 0xffffffff()
raw: 02ffff0000008000 0000000000000000 dead000000000122 ffff88811e06e220
raw: 0000000000001d0c ffff888102fdb1d8 00000004ffffffff ffff888103456000
page dumped because: eb page dump
BTRFS critical (device dm-3): corrupt leaf: root=5 block=30457856 slot=6 ino=257 file_offset=0, invalid disk_bytenr for file extent, have 10617606235235216665, should be aligned to 4096
BTRFS error (device dm-3): read time tree block corruption detected on logical 30457856 mirror 1
From the dump we can see some extra info that can help us do extra
cross-checks:
- Page refcount
If it's too low, something is definitely wrong.
- Page aops
Any mapped eb page should have btree_aops with inode number 1.
- Page index
Since a mapped eb page should have its bytenr matching the page
position, (index << PAGE_SHIFT) should match the bytenr from the
critical line.
- Page Private flags
A mapped eb page should have the Private flag set to indicate it's
managed by btrfs.
Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-27 10:18:36 +10:30
	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
2018-07-03 17:10:05 +08:00
	btrfs_crit(fs_info,
	"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV",
		btrfs_header_level(eb) == 0 ? "leaf" : "node",
		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
		key.objectid, key.offset, &vaf);
	va_end(args);
}
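The "Page index" cross-check from the commit message above can be verified by hand against its sample dump. Assuming 4K pages (PAGE_SHIFT = 12, which is an assumption about the machine that produced the example), page index 0x1d0c is 7436, and 7436 << 12 = 30457856, exactly the block=30457856 printed on the critical line. The helper `eb_page_matches_bytenr()` below is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4K pages, as on the system that produced the sample dump. */
#define MODEL_PAGE_SHIFT 12

/*
 * For a mapped extent buffer page, the offset of the page inside the
 * btree inode (index << PAGE_SHIFT) should equal the logical bytenr
 * that the tree-checker prints for the block.
 */
static bool eb_page_matches_bytenr(uint64_t page_index, uint64_t bytenr)
{
	return (page_index << MODEL_PAGE_SHIFT) == bytenr;
}
```

In the sample output the check passes, so for that report the page mapping itself looks consistent and the other fields (refcount, aops, Private flag) are the ones left to cross-check.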
2019-03-20 16:19:31 +01:00
static int check_block_group_item(struct extent_buffer *leaf,
2018-07-03 17:10:05 +08:00
				  struct btrfs_key *key, int slot)
{
2021-12-15 15:40:08 -05:00
	struct btrfs_fs_info *fs_info = leaf->fs_info;
2018-07-03 17:10:05 +08:00
	struct btrfs_block_group_item bgi;
2021-10-21 14:58:35 -04:00
	u32 item_size = btrfs_item_size(leaf, slot);
2021-12-15 15:40:08 -05:00
	u64 chunk_objectid;
2018-07-03 17:10:05 +08:00
	u64 flags;
	u64 type;
	/*
	 * Here we don't really care about alignment since extent allocator can
btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
[BUG]
A completely valid btrfs will refuse to mount, with an error message like:
BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \
bg_start=12018974720 bg_len=10888413184, invalid block group size, \
have 10888413184 expect (0, 10737418240]
This has been reported several times as the 4.19 kernel is now being
used. The filesystem refuses to mount, but is otherwise ok and booting
4.18 is a workaround.
Btrfs check returns no error, and all kernels used on this fs are later
than 2011, so they should all include the 10G size limit commit.
[CAUSE]
For a 12-device btrfs, we could allocate a chunk larger than 10G due to
the stripe size being bumped up.
__btrfs_alloc_chunk()
|- max_stripe_size = 1G
|- max_chunk_size = 10G
|- data_stripe = 11
|- if (1G * 11 > 10G) {
stripe_size = 976128930;
stripe_size = round_up(976128930, SZ_16M) = 989855744
However the final stripe_size (989855744) * 11 = 10888413184, which is
still larger than 10G.
[FIX]
For the comprehensive check, we need to do the full check at chunk read
time, and rely on bg <-> chunk mapping to do the check.
We could just skip the length check for now.
Fixes: fce466eab7ac ("btrfs: tree-checker: Verify block_group_item")
Cc: stable@vger.kernel.org # v4.19+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-23 09:06:36 +08:00
	 * handle it. We care more about the size.
2018-07-03 17:10:05 +08:00
	 */
2020-11-04 16:12:45 +01:00
	if (unlikely(key->offset == 0)) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-11-23 09:06:36 +08:00
				"invalid block group size 0");
2018-07-03 17:10:05 +08:00
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(item_size != sizeof(bgi))) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-07-03 17:10:05 +08:00
			"invalid item size, have %u expect %zu",
				item_size, sizeof(bgi));
		return -EUCLEAN;
	}
	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
			   sizeof(bgi));
2021-12-15 15:40:08 -05:00
	chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi);
	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
		/*
		 * We don't init the nr_global_roots until we load the global
		 * roots, so this could be 0 at mount time.  If it's 0 we'll
		 * just assume we're fine, and later we'll check against our
		 * actual value.
		 */
		if (unlikely(fs_info->nr_global_roots &&
			     chunk_objectid >= fs_info->nr_global_roots)) {
			block_group_err(leaf, slot,
	"invalid block group global root id, have %llu, needs to be <= %llu",
					chunk_objectid,
					fs_info->nr_global_roots);
			return -EUCLEAN;
		}
	} else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-07-03 17:10:05 +08:00
		"invalid block group chunk objectid, have %llu expect %llu",
2019-10-23 18:48:18 +02:00
				btrfs_stack_block_group_chunk_objectid(&bgi),
2018-07-03 17:10:05 +08:00
				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_stack_block_group_used(&bgi) > key->offset)) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-07-03 17:10:05 +08:00
			"invalid block group used, have %llu expect [0, %llu)",
2019-10-23 18:48:18 +02:00
				btrfs_stack_block_group_used(&bgi), key->offset);
2018-07-03 17:10:05 +08:00
		return -EUCLEAN;
	}
2019-10-23 18:48:18 +02:00
	flags = btrfs_stack_block_group_flags(&bgi);
2020-11-04 16:12:45 +01:00
	if (unlikely(hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1)) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-07-03 17:10:05 +08:00
"invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set",
			flags & BTRFS_BLOCK_GROUP_PROFILE_MASK,
			hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK));
		return -EUCLEAN;
	}
	type = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
2020-11-04 16:12:45 +01:00
	if (unlikely(type != BTRFS_BLOCK_GROUP_DATA &&
		     type != BTRFS_BLOCK_GROUP_METADATA &&
		     type != BTRFS_BLOCK_GROUP_SYSTEM &&
		     type != (BTRFS_BLOCK_GROUP_METADATA |
			      BTRFS_BLOCK_GROUP_DATA))) {
2019-03-20 16:18:57 +01:00
		block_group_err(leaf, slot,
2018-11-05 18:49:09 +08:00
"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llx or 0x%llx",
2018-07-03 17:10:05 +08:00
			type, hweight64(type),
			BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA,
			BTRFS_BLOCK_GROUP_SYSTEM,
			BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA);
		return -EUCLEAN;
	}
	return 0;
2019-03-20 13:16:42 +08:00
}
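check_block_group_item() accepts at most one profile bit and only four type combinations. The sketch below restates those two flag rules as a standalone userspace function. The bit values are written out from my understanding of the on-disk BTRFS_BLOCK_GROUP_* flags and should be checked against the kernel's uapi header; `bg_flags_ok()`, `popcount64()` and the MODEL_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-disk bit values of the BTRFS_BLOCK_GROUP_* flags. */
#define MODEL_BG_DATA      (1ULL << 0)
#define MODEL_BG_SYSTEM    (1ULL << 1)
#define MODEL_BG_METADATA  (1ULL << 2)
#define MODEL_BG_RAID0     (1ULL << 3)
#define MODEL_BG_RAID1     (1ULL << 4)
#define MODEL_BG_DUP       (1ULL << 5)
#define MODEL_BG_RAID10    (1ULL << 6)
#define MODEL_BG_RAID5     (1ULL << 7)
#define MODEL_BG_RAID6     (1ULL << 8)
#define MODEL_BG_RAID1C3   (1ULL << 9)
#define MODEL_BG_RAID1C4   (1ULL << 10)

#define MODEL_BG_TYPE_MASK \
	(MODEL_BG_DATA | MODEL_BG_SYSTEM | MODEL_BG_METADATA)
#define MODEL_BG_PROFILE_MASK \
	(MODEL_BG_RAID0 | MODEL_BG_RAID1 | MODEL_BG_DUP | MODEL_BG_RAID10 | \
	 MODEL_BG_RAID5 | MODEL_BG_RAID6 | MODEL_BG_RAID1C3 | MODEL_BG_RAID1C4)

/* Portable population count, standing in for the kernel's hweight64(). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Mirrors the profile and type checks of check_block_group_item(). */
static bool bg_flags_ok(uint64_t flags)
{
	uint64_t type = flags & MODEL_BG_TYPE_MASK;

	/* SINGLE has no profile bit; every other profile sets exactly one. */
	if (popcount64(flags & MODEL_BG_PROFILE_MASK) > 1)
		return false;
	return type == MODEL_BG_DATA || type == MODEL_BG_METADATA ||
	       type == MODEL_BG_SYSTEM ||
	       type == (MODEL_BG_METADATA | MODEL_BG_DATA);
}
```

Checking the type against an explicit allow-list, rather than only counting bits, is what rejects SYSTEM|DATA while still accepting the mixed METADATA|DATA combination.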
2019-03-20 16:22:58 +01:00
__printf(4, 5)
2019-03-20 13:36:06 +08:00
__cold
2019-03-20 16:22:58 +01:00
static void chunk_err(const struct extent_buffer *leaf,
2019-03-20 13:36:06 +08:00
		      const struct btrfs_chunk *chunk, u64 logical,
		      const char *fmt, ...)
{
2019-03-20 16:22:58 +01:00
	const struct btrfs_fs_info *fs_info = leaf->fs_info;
2019-03-20 13:36:06 +08:00
	bool is_sb;
	struct va_format vaf;
	va_list args;
	int i;
	int slot = -1;
	/* Only superblock eb is able to have such small offset */
	is_sb = (leaf->start == BTRFS_SUPER_INFO_OFFSET);
	if (!is_sb) {
		/*
		 * Get the slot number by iterating through all slots, this
		 * would provide better readability.
		 */
		for (i = 0; i < btrfs_header_nritems(leaf); i++) {
			if (btrfs_item_ptr_offset(leaf, i) ==
					(unsigned long)chunk) {
				slot = i;
				break;
			}
		}
	}
	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
	if (is_sb)
		btrfs_crit(fs_info,
		"corrupt superblock syschunk array: chunk_start=%llu, %pV",
			   logical, &vaf);
	else
		btrfs_crit(fs_info,
	"corrupt leaf: root=%llu block=%llu slot=%d chunk_start=%llu, %pV",
			   BTRFS_CHUNK_TREE_OBJECTID, leaf->start, slot,
			   logical, &vaf);
	va_end(args);
}
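The [CAUSE] arithmetic in the block group commit message quoted inside check_block_group_item() can be reproduced directly. With a 1G stripe limit, 11 data stripes would need 11G, which exceeds the 10G chunk limit, so the stripe is shrunk to 10737418240 / 11 = 976128930 bytes and then rounded up to a 16M boundary, giving 989855744. Eleven such stripes add up to 10888413184 bytes, the bg_len from the reported error, which is larger than 10G. The sketch follows only the simplified flow shown in that message; `model_stripe_size()` is a hypothetical name and the real allocator has more steps.

```c
#include <stdint.h>

#define MODEL_SZ_16M  (16ULL * 1024 * 1024)
#define MODEL_SZ_1G   (1024ULL * 1024 * 1024)
#define MODEL_SZ_10G  (10ULL * MODEL_SZ_1G)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t model_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Simplified stripe sizing from the commit message: if the maximum
 * stripe size times the number of data stripes exceeds the chunk limit,
 * shrink the stripe to fit and round it up to 16M.  The final round-up
 * is what lets the resulting chunk exceed the limit.
 */
static uint64_t model_stripe_size(uint64_t max_stripe_size,
				  uint64_t max_chunk_size,
				  uint64_t data_stripes)
{
	uint64_t stripe_size = max_stripe_size;

	if (stripe_size * data_stripes > max_chunk_size)
		stripe_size = model_round_up(max_chunk_size / data_stripes,
					     MODEL_SZ_16M);
	return stripe_size;
}
```

This is why the fix drops the upper bound on the block group size instead of trying to encode the 10G limit in the tree-checker.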
2019-03-20 13:16:42 +08:00
/*
 * The common chunk check which could also work on super block sys chunk array.
 *
2019-03-20 13:39:14 +08:00
 * Return -EUCLEAN if anything is corrupted.
2019-03-20 13:16:42 +08:00
 * Return 0 if everything is OK.
 */
2019-03-20 16:40:48 +01:00
int btrfs_check_chunk_valid(struct extent_buffer *leaf,
2019-03-20 13:16:42 +08:00
			    struct btrfs_chunk *chunk, u64 logical)
{
2019-03-20 16:40:48 +01:00
	struct btrfs_fs_info *fs_info = leaf->fs_info;
2019-03-20 13:16:42 +08:00
	u64 length;
2021-01-03 17:28:04 +08:00
	u64 chunk_end;
2019-03-20 13:16:42 +08:00
	u64 stripe_len;
	u16 num_stripes;
	u16 sub_stripes;
	u64 type;
	u64 features;
	bool mixed = false;
2020-10-08 18:09:10 -07:00
	int raid_index;
	int nparity;
	int ncopies;
2019-03-20 13:16:42 +08:00
	length = btrfs_chunk_length(leaf, chunk);
	stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
	num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
	sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
	type = btrfs_chunk_type(leaf, chunk);
2020-10-08 18:09:10 -07:00
	raid_index = btrfs_bg_flags_to_raid_index(type);
	ncopies = btrfs_raid_array[raid_index].ncopies;
	nparity = btrfs_raid_array[raid_index].nparity;
2019-03-20 13:16:42 +08:00
2020-11-04 16:12:45 +01:00
	if (unlikely(!num_stripes)) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "invalid chunk num_stripes, have %u", num_stripes);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(num_stripes < ncopies)) {
2020-10-08 18:09:10 -07:00
		chunk_err(leaf, chunk, logical,
			  "invalid chunk num_stripes < ncopies, have %u < %d",
			  num_stripes, ncopies);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(nparity && num_stripes == nparity)) {
2020-10-08 18:09:10 -07:00
		chunk_err(leaf, chunk, logical,
			  "invalid chunk num_stripes == nparity, have %u == %d",
			  num_stripes, nparity);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(!IS_ALIGNED(logical, fs_info->sectorsize))) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
		"invalid chunk logical, have %llu should aligned to %u",
			  logical, fs_info->sectorsize);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_chunk_sector_size(leaf, chunk) != fs_info->sectorsize)) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "invalid chunk sectorsize, have %u expect %u",
			  btrfs_chunk_sector_size(leaf, chunk),
			  fs_info->sectorsize);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(!length || !IS_ALIGNED(length, fs_info->sectorsize))) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "invalid chunk length, have %llu", length);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2021-01-03 17:28:04 +08:00
	if (unlikely(check_add_overflow(logical, length, &chunk_end))) {
		chunk_err(leaf, chunk, logical,
"invalid chunk logical start and length, have logical start %llu length %llu",
			  logical, length);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(!is_power_of_2(stripe_len) || stripe_len != BTRFS_STRIPE_LEN)) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "invalid chunk stripe length: %llu",
2019-03-20 13:16:42 +08:00
			  stripe_len);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2023-02-17 13:36:59 +08:00
	/*
	 * We artificially limit the chunk size, so that the number of stripes
	 * inside a chunk can fit into a u32.  With 64K stripes the resulting
	 * limit (U32_MAX << 16, roughly 256T) is way too large for real world
	 * usage anyway, and it's also much larger than our existing limit (10G).
	 *
	 * Thus it should be a good way to catch obvious bitflips.
	 */
2023-06-22 14:42:40 +08:00
	if (unlikely(length >= btrfs_stripe_nr_to_offset(U32_MAX))) {
2023-02-17 13:36:59 +08:00
		chunk_err(leaf, chunk, logical,
			  "chunk length too large: have %llu limit %llu",
2023-06-22 14:42:40 +08:00
			  length, btrfs_stripe_nr_to_offset(U32_MAX));
2023-02-17 13:36:59 +08:00
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK |
			      BTRFS_BLOCK_GROUP_PROFILE_MASK))) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "unrecognized chunk type: 0x%llx",
2019-03-20 13:16:42 +08:00
			  ~(BTRFS_BLOCK_GROUP_TYPE_MASK |
			    BTRFS_BLOCK_GROUP_PROFILE_MASK) &
			  btrfs_chunk_type(leaf, chunk));
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(!has_single_bit_set(type & BTRFS_BLOCK_GROUP_PROFILE_MASK) &&
		     (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) != 0)) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-13 12:17:50 +08:00
		"invalid chunk profile flag: 0x%llx, expect 0 or 1 bit set",
			  type & BTRFS_BLOCK_GROUP_PROFILE_MASK);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely((type & BTRFS_BLOCK_GROUP_TYPE_MASK) == 0)) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
	"missing chunk type flag, have 0x%llx one bit must be set in 0x%llx",
			  type, BTRFS_BLOCK_GROUP_TYPE_MASK);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely((type & BTRFS_BLOCK_GROUP_SYSTEM) &&
		     (type & (BTRFS_BLOCK_GROUP_METADATA |
			      BTRFS_BLOCK_GROUP_DATA)))) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:36:06 +08:00
			  "system chunk with data or metadata type: 0x%llx",
			  type);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
	features = btrfs_super_incompat_flags(fs_info->super_copy);
	if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
		mixed = true;
	if (!mixed) {
2020-11-04 16:12:45 +01:00
		if (unlikely((type & BTRFS_BLOCK_GROUP_METADATA) &&
			     (type & BTRFS_BLOCK_GROUP_DATA))) {
2019-03-20 16:22:58 +01:00
			chunk_err(leaf, chunk, logical,
2019-03-20 13:16:42 +08:00
			"mixed chunk type in non-mixed mode: 0x%llx", type);
2019-03-20 13:39:14 +08:00
			return -EUCLEAN;
2019-03-20 13:16:42 +08:00
		}
	}
2021-07-26 14:15:15 +02:00
	if (unlikely((type & BTRFS_BLOCK_GROUP_RAID10 &&
		      sub_stripes != btrfs_raid_array[BTRFS_RAID_RAID10].sub_stripes) ||
		     (type & BTRFS_BLOCK_GROUP_RAID1 &&
		      num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1].devs_min) ||
2021-07-26 14:15:17 +02:00
		     (type & BTRFS_BLOCK_GROUP_RAID1C3 &&
		      num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C3].devs_min) ||
		     (type & BTRFS_BLOCK_GROUP_RAID1C4 &&
		      num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C4].devs_min) ||
2021-07-26 14:15:15 +02:00
		     (type & BTRFS_BLOCK_GROUP_RAID5 &&
		      num_stripes < btrfs_raid_array[BTRFS_RAID_RAID5].devs_min) ||
		     (type & BTRFS_BLOCK_GROUP_RAID6 &&
		      num_stripes < btrfs_raid_array[BTRFS_RAID_RAID6].devs_min) ||
		     (type & BTRFS_BLOCK_GROUP_DUP &&
		      num_stripes != btrfs_raid_array[BTRFS_RAID_DUP].dev_stripes) ||
2020-11-04 16:12:45 +01:00
		     ((type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 &&
2021-07-26 14:15:15 +02:00
		      num_stripes != btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes))) {
2019-03-20 16:22:58 +01:00
		chunk_err(leaf, chunk, logical,
2019-03-20 13:16:42 +08:00
			"invalid num_stripes:sub_stripes %u:%u for profile %llu",
			num_stripes, sub_stripes,
			type & BTRFS_BLOCK_GROUP_PROFILE_MASK);
2019-03-20 13:39:14 +08:00
		return -EUCLEAN;
2019-03-20 13:16:42 +08:00
	}
	return 0;
2018-07-03 17:10:05 +08:00
}
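Three of the numeric checks in btrfs_check_chunk_valid() (sector alignment, the logical + length overflow test done with check_add_overflow(), and the stripe-count based length limit) can be modeled in plain C. The sketch assumes a 64K stripe length (a shift of 16, matching BTRFS_STRIPE_LEN) and uses the GCC/Clang builtin __builtin_add_overflow() in place of the kernel's check_add_overflow(); `chunk_length_ok()` and `model_stripe_nr_to_offset()` are hypothetical names. Working the limit out, UINT32_MAX stripes of 64K each is (2^32 - 1) * 2^16 = 281474976645120 bytes, just under 2^48 bytes (256T).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_STRIPE_LEN_SHIFT 16	/* assumed 64K stripes */

/* Byte offset reached by a given u32 stripe number. */
static uint64_t model_stripe_nr_to_offset(uint32_t stripe_nr)
{
	return (uint64_t)stripe_nr << MODEL_STRIPE_LEN_SHIFT;
}

/*
 * Returns true if a chunk at @logical with @length passes the alignment,
 * overflow and size-limit checks modeled on btrfs_check_chunk_valid().
 */
static bool chunk_length_ok(uint64_t logical, uint64_t length,
			    uint32_t sectorsize)
{
	uint64_t chunk_end;

	if (logical % sectorsize != 0)
		return false;
	if (length == 0 || length % sectorsize != 0)
		return false;
	/* logical + length must not wrap around the u64 address space. */
	if (__builtin_add_overflow(logical, length, &chunk_end))
		return false;
	/* Keep the number of stripes inside a chunk representable as u32. */
	if (length >= model_stripe_nr_to_offset(UINT32_MAX))
		return false;
	return true;
}
```

The overflow test matters because an aligned but huge logical start could otherwise pass every per-field check while describing a range that wraps past the end of the address space.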
2019-12-17 18:58:20 +08:00
/*
 * Enhanced version of chunk item checker.
 *
 * The common btrfs_check_chunk_valid() doesn't check item size since it needs
 * to work on super block sys_chunk_array which doesn't have full item ptr.
 */
static int check_leaf_chunk_item(struct extent_buffer *leaf,
				 struct btrfs_chunk *chunk,
				 struct btrfs_key *key, int slot)
{
	int num_stripes;
2021-10-21 14:58:35 -04:00
	if (unlikely(btrfs_item_size(leaf, slot) < sizeof(struct btrfs_chunk))) {
2019-12-17 18:58:20 +08:00
		chunk_err(leaf, chunk, key->offset,
			"invalid chunk item size: have %u expect [%zu, %u)",
2021-10-21 14:58:35 -04:00
			btrfs_item_size(leaf, slot),
2019-12-17 18:58:20 +08:00
			sizeof(struct btrfs_chunk),
			BTRFS_LEAF_DATA_SIZE(leaf->fs_info));
		return -EUCLEAN;
	}
	num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
	/* Let btrfs_check_chunk_valid() handle this error type */
	if (num_stripes == 0)
		goto out;
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_chunk_item_size(num_stripes) !=
2021-10-21 14:58:35 -04:00
		     btrfs_item_size(leaf, slot))) {
2019-12-17 18:58:20 +08:00
		chunk_err(leaf, chunk, key->offset,
			"invalid chunk item size: have %u expect %lu",
2021-10-21 14:58:35 -04:00
			btrfs_item_size(leaf, slot),
2019-12-17 18:58:20 +08:00
			btrfs_chunk_item_size(num_stripes));
		return -EUCLEAN;
	}
out:
	return btrfs_check_chunk_valid(leaf, chunk, key->offset);
}
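check_leaf_chunk_item() compares the leaf item size with btrfs_chunk_item_size(num_stripes). Because struct btrfs_chunk already embeds its first struct btrfs_stripe, the expected size is sizeof(struct btrfs_chunk) plus (num_stripes - 1) extra stripes. The sketch below assumes the packed on-disk sizes as I understand them: 80 bytes for btrfs_chunk (48 bytes of fixed fields plus one 32-byte stripe) and 32 bytes per btrfs_stripe. `model_chunk_item_size()` and `leaf_chunk_size_ok()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packed on-disk sizes. */
#define MODEL_CHUNK_SIZE   80u	/* struct btrfs_chunk, incl. one stripe */
#define MODEL_STRIPE_SIZE  32u	/* struct btrfs_stripe */

/* Expected item size for a chunk with @num_stripes stripes (>= 1). */
static uint32_t model_chunk_item_size(uint16_t num_stripes)
{
	return MODEL_CHUNK_SIZE + MODEL_STRIPE_SIZE * (num_stripes - 1u);
}

/*
 * Mirrors the size checks of check_leaf_chunk_item(): the item must hold
 * at least one full btrfs_chunk, and when num_stripes is non-zero the size
 * must match exactly.  num_stripes == 0 is left to the common checker.
 */
static bool leaf_chunk_size_ok(uint32_t item_size, uint16_t num_stripes)
{
	if (item_size < MODEL_CHUNK_SIZE)
		return false;
	if (num_stripes == 0)
		return true;	/* rejected later by the common chunk checker */
	return model_chunk_item_size(num_stripes) == item_size;
}
```

For example, a two-stripe chunk item should be 80 + 32 = 112 bytes; an 80-byte item claiming two stripes would make the second stripe read past the end of the item.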
2019-03-20 16:22:58 +01:00
__printf(3, 4)
2019-03-08 14:20:03 +08:00
__cold
2019-03-20 16:22:58 +01:00
static void dev_item_err(const struct extent_buffer *eb, int slot,
2019-03-08 14:20:03 +08:00
			 const char *fmt, ...)
{
	struct btrfs_key key;
	struct va_format vaf;
	va_list args;
	btrfs_item_key_to_cpu(eb, &key, slot);
	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
2024-01-27 10:18:36 +10:30
dump_page ( folio_page ( eb - > folios [ 0 ] , 0 ) , " eb page dump " ) ;
2019-03-20 16:22:58 +01:00
btrfs_crit ( eb - > fs_info ,
2019-03-08 14:20:03 +08:00
" corrupt %s: root=%llu block=%llu slot=%d devid=%llu %pV " ,
btrfs_header_level ( eb ) = = 0 ? " leaf " : " node " ,
btrfs_header_owner ( eb ) , btrfs_header_bytenr ( eb ) , slot ,
key . objectid , & vaf ) ;
va_end ( args ) ;
}
2019-03-20 16:22:58 +01:00
static int check_dev_item(struct extent_buffer *leaf,
2019-03-08 14:20:03 +08:00
			  struct btrfs_key *key, int slot)
{
	struct btrfs_dev_item *ditem;
2022-01-21 17:33:35 +08:00
	const u32 item_size = btrfs_item_size(leaf, slot);
2019-03-08 14:20:03 +08:00
2020-11-04 16:12:45 +01:00
	if (unlikely(key->objectid != BTRFS_DEV_ITEMS_OBJECTID)) {
2019-03-20 16:22:58 +01:00
		dev_item_err(leaf, slot,
2019-03-08 14:20:03 +08:00
			     "invalid objectid: has=%llu expect=%llu",
			     key->objectid, BTRFS_DEV_ITEMS_OBJECTID);
		return -EUCLEAN;
	}
2022-01-21 17:33:35 +08:00
	if (unlikely(item_size != sizeof(*ditem))) {
		dev_item_err(leaf, slot, "invalid item size: has %u expect %zu",
			     item_size, sizeof(*ditem));
		return -EUCLEAN;
	}
2019-03-08 14:20:03 +08:00
	ditem = btrfs_item_ptr(leaf, slot, struct btrfs_dev_item);
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_device_id(leaf, ditem) != key->offset)) {
2019-03-20 16:22:58 +01:00
		dev_item_err(leaf, slot,
2019-03-08 14:20:03 +08:00
			     "devid mismatch: key has=%llu item has=%llu",
			     key->offset, btrfs_device_id(leaf, ditem));
		return -EUCLEAN;
	}
	/*
	 * For device total_bytes, we don't have a reliable way to check it, as
	 * it can be 0 for device removal. Device size check can only be done
	 * by dev extents check.
	 */
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_device_bytes_used(leaf, ditem) >
		     btrfs_device_total_bytes(leaf, ditem))) {
2019-03-20 16:22:58 +01:00
		dev_item_err(leaf, slot,
2019-03-08 14:20:03 +08:00
			     "invalid bytes used: have %llu expect [0, %llu]",
			     btrfs_device_bytes_used(leaf, ditem),
			     btrfs_device_total_bytes(leaf, ditem));
		return -EUCLEAN;
	}
	/*
	 * Remaining members like io_align/type/gen/dev_group aren't really
	 * utilized. Skip them to make later usage of them easier.
	 */
	return 0;
}
2019-03-20 16:22:58 +01:00
static int check_inode_item(struct extent_buffer *leaf,
2019-03-13 14:31:35 +08:00
			    struct btrfs_key *key, int slot)
{
2019-03-20 16:22:58 +01:00
	struct btrfs_fs_info *fs_info = leaf->fs_info;
2019-03-13 14:31:35 +08:00
	struct btrfs_inode_item *iitem;
	u64 super_gen = btrfs_super_generation(fs_info->super_copy);
	u32 valid_mask = (S_IFMT | S_ISUID | S_ISGID | S_ISVTX | 0777);
2022-01-21 17:33:34 +08:00
	const u32 item_size = btrfs_item_size(leaf, slot);
2019-03-13 14:31:35 +08:00
	u32 mode;
2019-12-09 18:54:33 +08:00
	int ret;
btrfs: add ro compat flags to inodes
Currently, inode flags are fully backwards incompatible in btrfs. If we
introduce a new inode flag, then tree-checker will detect it and fail.
This can even cause us to fail to mount entirely. To make it possible to
introduce new flags which can be read-only compatible, like VERITY, we
add new ro flags to btrfs without treating them quite so harshly in
tree-checker. A read-only file system can survive an unexpected flag,
and can be mounted.
As for the implementation, it unfortunately gets a little complicated.
The on-disk representation of the inode, btrfs_inode_item, has an __le64
for flags but the in-memory representation, btrfs_inode, uses a u32.
David Sterba had the nice idea that we could reclaim those wasted 32 bits
on disk and use them for the new ro_compat flags.
It turns out that the tree-checker code which checks for unknown flags
is broken, and ignores the upper 32 bits we are hoping to use. The issue
is that the flags use the literal 1 rather than 1ULL, so the flags are
signed ints, and one of them is specifically (1 << 31). As a result, the
mask which ORs the flags is a negative integer on machines where int is
32-bit two's complement. When tree-checker evaluates the expression:
btrfs_inode_flags(leaf, iitem) & ~BTRFS_INODE_FLAG_MASK
The mask is something like 0x80000abc, which gets promoted to u64 with
sign extension to 0xffffffff80000abc. Negating that 64 bit mask leaves
all the upper bits zeroed, and we can't detect unexpected flags.
This suggests that we can't use those bits after all. Luckily, we have
good reason to believe that they are zero anyway. Inode flags are
metadata, which is always checksummed, so any bit flips that would
introduce 1s would cause a checksum failure anyway (excluding the
improbable case of the checksum getting corrupted exactly badly).
Further, unless the 1 << 31 flag is used, the cast to u64 of the 32 bit
inode flag should preserve its value and not add leading zeroes
(at least for two's complement). The only place that flag
(BTRFS_INODE_ROOT_ITEM_INIT) is used is in a special inode embedded in
the root item, and indeed for that inode we see 0xffffffff80000000 as
the flags on disk. However, that inode is never seen by tree checker,
nor is it used in a context where verity might be meaningful.
Theoretically, a future ro flag might cause trouble on that inode, so we
should proactively clean up that mess before it does.
With the introduction of the new ro flags, keep two separate unsigned
masks and check them against the appropriate u32. Since we no longer run
afoul of sign extension, this also stops writing out 0xffffffff80000000
in root_item inodes going forward.
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
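The sign-extension problem described above can be reproduced in userspace. This sketch uses illustrative values only: a signed 32-bit mask with bit 31 set (0x80000abc, as in the commit text), an unsigned copy of the same bits, and a hypothetical single-bit read-only mask; none of the DEMO_* names are the real btrfs definitions, and demo_split_flags() is only assumed to behave like btrfs_inode_split_flags() (low 32 bits to flags, high 32 bits to ro_flags). It shows that the signed check ignores unknown bits in the upper half of the on-disk value, while splitting into two u32 halves and checking each against an unsigned mask catches them.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative masks only, not the real BTRFS_INODE_* definitions. */
#define DEMO_SIGNED_MASK	((int32_t)(INT32_MIN | 0xabc))	/* 0x80000abc, negative */
#define DEMO_FLAG_MASK		0x80000abcu			/* same bits, unsigned */
#define DEMO_RO_FLAG_MASK	0x1u				/* hypothetical ro flag */

/*
 * Buggy pattern: ~ on the negative int yields 0x7ffff543, which converts to
 * the 64-bit value 0x000000007ffff543, so bits 32..63 are never reported.
 */
static uint64_t demo_unknown_bits_signed(uint64_t on_disk)
{
	return on_disk & ~DEMO_SIGNED_MASK;
}

/* Split the 64-bit on-disk field into incompat (low) and ro (high) halves. */
static void demo_split_flags(uint64_t on_disk, uint32_t *flags,
			     uint32_t *ro_flags)
{
	*flags = (uint32_t)on_disk;
	*ro_flags = (uint32_t)(on_disk >> 32);
}

/* Fixed pattern: unknown incompat bits, checked against an unsigned mask. */
static uint32_t demo_unknown_incompat(uint64_t on_disk)
{
	uint32_t flags, ro_flags;

	demo_split_flags(on_disk, &flags, &ro_flags);
	return flags & ~DEMO_FLAG_MASK;
}

/* Fixed pattern: unknown ro-compat bits taken from the upper half. */
static uint32_t demo_unknown_ro(uint64_t on_disk)
{
	uint32_t flags, ro_flags;

	demo_split_flags(on_disk, &flags, &ro_flags);
	return ro_flags & ~DEMO_RO_FLAG_MASK;
}
```

With these values, demo_unknown_bits_signed(1ULL << 33) returns 0, while demo_unknown_ro(1ULL << 33) reports bit 1 of the ro half. Note that check_inode_item() only rejects unknown ro-compat bits when the filesystem is not mounted read-only.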
2021-06-30 13:01:48 -07:00
	u32 flags;
	u32 ro_flags;
2019-12-09 18:54:33 +08:00
	ret = check_inode_key(leaf, key, slot);
2020-11-04 16:12:45 +01:00
	if (unlikely(ret < 0))
2019-12-09 18:54:33 +08:00
		return ret;
2019-03-13 14:31:35 +08:00
2022-01-21 17:33:34 +08:00
	if (unlikely(item_size != sizeof(*iitem))) {
		generic_err(leaf, slot, "invalid item size: has %u expect %zu",
			    item_size, sizeof(*iitem));
		return -EUCLEAN;
	}
2019-03-13 14:31:35 +08:00
	iitem = btrfs_item_ptr(leaf, slot, struct btrfs_inode_item);
	/* Here we use super block generation + 1 to handle log tree */
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_inode_generation(leaf, iitem) > super_gen + 1)) {
2019-12-09 18:54:32 +08:00
		inode_item_err(leaf, slot,
2019-03-13 14:31:35 +08:00
			"invalid inode generation: has %llu expect (0, %llu]",
			       btrfs_inode_generation(leaf, iitem),
			       super_gen + 1);
		return -EUCLEAN;
	}
	/* Note for ROOT_TREE_DIR_ITEM, mkfs could set its transid 0 */
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_inode_transid(leaf, iitem) > super_gen + 1)) {
2019-12-09 18:54:32 +08:00
		inode_item_err(leaf, slot,
2020-08-25 21:42:51 +08:00
			"invalid inode transid: has %llu expect [0, %llu]",
2019-03-13 14:31:35 +08:00
			       btrfs_inode_transid(leaf, iitem), super_gen + 1);
		return -EUCLEAN;
	}
	/*
	 * For size and nbytes it's better not to be too strict, as for a dir
	 * item its size/nbytes can easily be wrong, but that doesn't affect
	 * anything in the fs. So here we skip the check.
	 */
	mode = btrfs_inode_mode(leaf, iitem);
2020-11-04 16:12:45 +01:00
	if (unlikely(mode & ~valid_mask)) {
2019-12-09 18:54:32 +08:00
		inode_item_err(leaf, slot,
2019-03-13 14:31:35 +08:00
			       "unknown mode bit detected: 0x%x",
			       mode & ~valid_mask);
		return -EUCLEAN;
	}
	/*
2019-10-01 19:44:42 +02:00
	 * S_IFMT is not bit mapped so we can't completely rely on
	 * is_power_of_2/has_single_bit_set, but it can save us from checking
	 * FIFO/CHR/DIR/REG. Only needs to check BLK, LNK and SOCK.
2019-03-13 14:31:35 +08:00
	 */
2019-10-01 19:44:42 +02:00
	if (!has_single_bit_set(mode & S_IFMT)) {
2020-11-04 16:12:45 +01:00
		if (unlikely(!S_ISLNK(mode) && !S_ISBLK(mode) && !S_ISSOCK(mode))) {
2019-12-09 18:54:32 +08:00
			inode_item_err(leaf, slot,
2019-03-13 14:31:35 +08:00
			"invalid mode: has 0%o expect valid S_IF* bit(s)",
				       mode & S_IFMT);
			return -EUCLEAN;
		}
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(S_ISDIR(mode) && btrfs_inode_nlink(leaf, iitem) > 1)) {
2019-12-09 18:54:32 +08:00
		inode_item_err(leaf, slot,
2019-03-13 14:31:35 +08:00
		       "invalid nlink: has %u expect no more than 1 for dir",
			btrfs_inode_nlink(leaf, iitem));
		return -EUCLEAN;
	}
2021-06-30 13:01:48 -07:00
	btrfs_inode_split_flags(btrfs_inode_flags(leaf, iitem), &flags, &ro_flags);
	if (unlikely(flags & ~BTRFS_INODE_FLAG_MASK)) {
2019-12-09 18:54:32 +08:00
		inode_item_err(leaf, slot,
2021-06-30 13:01:48 -07:00
			       "unknown incompat flags detected: 0x%x", flags);
		return -EUCLEAN;
	}
	if (unlikely(!sb_rdonly(fs_info->sb) &&
		     (ro_flags & ~BTRFS_INODE_RO_FLAG_MASK))) {
		inode_item_err(leaf, slot,
			"unknown ro-compat flags detected on writeable mount: 0x%x",
			ro_flags);
2019-03-13 14:31:35 +08:00
		return -EUCLEAN;
	}
	return 0;
}
2019-07-16 17:00:34 +08:00
static int check_root_item(struct extent_buffer *leaf, struct btrfs_key *key,
			   int slot)
{
	struct btrfs_fs_info *fs_info = leaf->fs_info;
2020-09-22 10:37:01 +08:00
	struct btrfs_root_item ri = { 0 };
2019-07-16 17:00:34 +08:00
	const u64 valid_root_flags = BTRFS_ROOT_SUBVOL_RDONLY |
				     BTRFS_ROOT_SUBVOL_DEAD;
2019-12-09 18:54:34 +08:00
	int ret;
2019-07-16 17:00:34 +08:00
2019-12-09 18:54:34 +08:00
	ret = check_root_key(leaf, key, slot);
2020-11-04 16:12:45 +01:00
	if (unlikely(ret < 0))
2019-12-09 18:54:34 +08:00
		return ret;
2019-07-16 17:00:34 +08:00
2021-10-21 14:58:35 -04:00
	if (unlikely(btrfs_item_size(leaf, slot) != sizeof(ri) &&
		     btrfs_item_size(leaf, slot) !=
2020-11-04 16:12:45 +01:00
		     btrfs_legacy_root_item_size())) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
2020-09-22 10:37:01 +08:00
			    "invalid root item size, have %u expect %zu or %u",
2021-10-21 14:58:35 -04:00
			    btrfs_item_size(leaf, slot), sizeof(ri),
2020-09-22 10:37:01 +08:00
			    btrfs_legacy_root_item_size());
2020-11-12 17:55:06 -08:00
		return -EUCLEAN;
2019-07-16 17:00:34 +08:00
	}
2020-09-22 10:37:01 +08:00
	/*
	 * For legacy root item, the members starting at generation_v2 will be
	 * all filled with 0.
	 * And since we allow generation_v2 as 0, it will still pass the check.
	 */
2019-07-16 17:00:34 +08:00
	read_extent_buffer(leaf, &ri, btrfs_item_ptr_offset(leaf, slot),
2021-10-21 14:58:35 -04:00
			   btrfs_item_size(leaf, slot));
2019-07-16 17:00:34 +08:00
	/* Generation related */
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_generation(&ri) >
		     btrfs_super_generation(fs_info->super_copy) + 1)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
			"invalid root generation, have %llu expect (0, %llu]",
			    btrfs_root_generation(&ri),
			    btrfs_super_generation(fs_info->super_copy) + 1);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_generation_v2(&ri) >
		     btrfs_super_generation(fs_info->super_copy) + 1)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
		"invalid root v2 generation, have %llu expect (0, %llu]",
			    btrfs_root_generation_v2(&ri),
			    btrfs_super_generation(fs_info->super_copy) + 1);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_last_snapshot(&ri) >
		     btrfs_super_generation(fs_info->super_copy) + 1)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
		"invalid root last_snapshot, have %llu expect (0, %llu]",
			    btrfs_root_last_snapshot(&ri),
			    btrfs_super_generation(fs_info->super_copy) + 1);
		return -EUCLEAN;
	}
	/* Alignment and level check */
2020-11-04 16:12:45 +01:00
	if (unlikely(!IS_ALIGNED(btrfs_root_bytenr(&ri), fs_info->sectorsize))) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
		"invalid root bytenr, have %llu expect to be aligned to %u",
			    btrfs_root_bytenr(&ri), fs_info->sectorsize);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_level(&ri) >= BTRFS_MAX_LEVEL)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
			    "invalid root level, have %u expect [0, %u]",
			    btrfs_root_level(&ri), BTRFS_MAX_LEVEL - 1);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_drop_level(&ri) >= BTRFS_MAX_LEVEL)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
			    "invalid root level, have %u expect [0, %u]",
2020-09-15 21:44:52 +02:00
			    btrfs_root_drop_level(&ri), BTRFS_MAX_LEVEL - 1);
2019-07-16 17:00:34 +08:00
		return -EUCLEAN;
	}
	/* Flags check */
2020-11-04 16:12:45 +01:00
	if (unlikely(btrfs_root_flags(&ri) & ~valid_root_flags)) {
2019-07-16 17:00:34 +08:00
		generic_err(leaf, slot,
			    "invalid root flags, have 0x%llx expect mask 0x%llx",
			    btrfs_root_flags(&ri), valid_root_flags);
		return -EUCLEAN;
	}
	return 0;
}
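The legacy root item comment in check_root_item() works because ri is zero-initialized and read_extent_buffer() copies only btrfs_item_size() bytes into it. The sketch below models that in userspace with a hypothetical, heavily simplified struct demo_root_item (the real struct btrfs_root_item has many more fields); here the "legacy" size simply ends right before generation_v2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout; legacy items stop before generation_v2. */
struct demo_root_item {
	uint64_t generation;
	uint64_t bytenr;
	uint64_t generation_v2;	/* absent from legacy items */
};

#define DEMO_LEGACY_SIZE offsetof(struct demo_root_item, generation_v2)

static bool demo_check_root_item(const void *item, size_t item_size,
				 uint64_t super_gen)
{
	struct demo_root_item ri = { 0 };

	/* Accept only the current size or the legacy size. */
	if (item_size != sizeof(ri) && item_size != DEMO_LEGACY_SIZE)
		return false;
	/* Copy only what is on disk; the tail of a legacy item stays 0. */
	memcpy(&ri, item, item_size);
	if (ri.generation > super_gen + 1)
		return false;
	/* 0 is accepted here, so a legacy item passes unchanged. */
	if (ri.generation_v2 > super_gen + 1)
		return false;
	return true;
}
```

A legacy-sized item therefore passes even if whatever follows it in memory would look like a bogus generation_v2, because those bytes are never copied.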
2019-08-09 09:24:22 +08:00
__printf(3, 4)
__cold
static void extent_err(const struct extent_buffer *eb, int slot,
		       const char *fmt, ...)
{
	struct btrfs_key key;
	struct va_format vaf;
	va_list args;
	u64 bytenr;
	u64 len;
	btrfs_item_key_to_cpu(eb, &key, slot);
	bytenr = key.objectid;
2019-08-09 09:24:23 +08:00
	if (key.type == BTRFS_METADATA_ITEM_KEY ||
	    key.type == BTRFS_TREE_BLOCK_REF_KEY ||
	    key.type == BTRFS_SHARED_BLOCK_REF_KEY)
2019-08-09 09:24:22 +08:00
		len = eb->fs_info->nodesize;
	else
		len = key.offset;
	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
2024-01-27 10:18:36 +10:30
	dump_page(folio_page(eb->folios[0], 0), "eb page dump");
2019-08-09 09:24:22 +08:00
	btrfs_crit(eb->fs_info,
		"corrupt %s: block=%llu slot=%d extent bytenr=%llu len=%llu %pV",
		btrfs_header_level(eb) == 0 ? "leaf" : "node",
		eb->start, slot, bytenr, len, &vaf);
	va_end(args);
}
static int check_extent_item(struct extent_buffer *leaf,
2022-08-03 14:28:47 -04:00
			     struct btrfs_key *key, int slot,
			     struct btrfs_key *prev_key)
2019-08-09 09:24:22 +08:00
{
	struct btrfs_fs_info *fs_info = leaf->fs_info;
	struct btrfs_extent_item *ei;
	bool is_tree_block = false;
	unsigned long ptr;	/* Current pointer inside inline refs */
	unsigned long end;	/* Extent item end */
2021-10-21 14:58:35 -04:00
	const u32 item_size = btrfs_item_size(leaf, slot);
btrfs: tree-checker: add type and sequence check for inline backrefs
[BUG]
There is a bug report that ntfs2btrfs had a bug that could lead to a
transaction abort, flipping the filesystem to read-only.
[CAUSE]
For inline backref items, the kernel has a strict requirement on their
order; they must follow these rules:
- All btrfs_extent_inline_ref::type values should be in ascending order
- Within the same type, the items should follow a descending order by
their sequence number
For the EXTENT_DATA_REF type, the sequence number is the result of
hash_extent_data_ref().
For other types, their sequence numbers are
btrfs_extent_inline_ref::offset.
Thus if any code does not follow the above rules, the resulting
inline backrefs can prevent the kernel from locating the needed inline
backref and lead to a transaction abort.
[FIX]
Ntfs2btrfs has already fixed the problem, and btrfs-progs has added the
ability to detect such problems.
For the kernel, let's be noisier and more specific about the order, so
that the next time the kernel hits such a problem it rejects the item in
the first place, without leading to a transaction abort.
Link: https://github.com/kdave/btrfs-progs/pull/622
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
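The ordering rule described in this commit can be checked in a single pass over the inline refs by remembering the last type and last sequence number seen, which is what the last_type (starting at 0) and last_seq (starting at U64_MAX) variables in check_extent_item() suggest. This is a userspace sketch under assumptions: struct demo_inline_ref is a hypothetical flattened view with the sequence number already computed (offset, or hash_extent_data_ref() for EXTENT_DATA_REF), and it tolerates equal sequence numbers within a type, rejecting only an increase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened inline ref: key type plus precomputed sequence. */
struct demo_inline_ref {
	uint8_t type;
	uint64_t seq;
};

static bool demo_inline_refs_ordered(const struct demo_inline_ref *refs,
				     size_t nr)
{
	uint8_t last_type = 0;
	uint64_t last_seq = UINT64_MAX;

	for (size_t i = 0; i < nr; i++) {
		/* Types must never go backwards. */
		if (refs[i].type < last_type)
			return false;
		/* A new type restarts the descending sequence. */
		if (refs[i].type > last_type)
			last_seq = UINT64_MAX;
		/* Within one type, the sequence must not increase. */
		if (refs[i].seq > last_seq)
			return false;
		last_type = refs[i].type;
		last_seq = refs[i].seq;
	}
	return true;
}
```

Using the key type values 176 (TREE_BLOCK_REF), 178 (EXTENT_DATA_REF), 182 (SHARED_BLOCK_REF) and 184 (SHARED_DATA_REF), a sequence such as {176, 300}, {176, 100}, {178, 500} is accepted, while {176, 100}, {176, 300} is rejected.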
2023-10-24 12:41:11 +10:30
	u8 last_type = 0;
	u64 last_seq = U64_MAX;
2019-08-09 09:24:22 +08:00
	u64 flags;
	u64 generation;
	u64 total_refs;		/* Total refs in btrfs_extent_item */
	u64 inline_refs = 0;	/* found total inline refs */
2020-11-04 16:12:45 +01:00
	if (unlikely(key->type == BTRFS_METADATA_ITEM_KEY &&
		     !btrfs_fs_incompat(fs_info, SKINNY_METADATA))) {
2019-08-09 09:24:22 +08:00
		generic_err(leaf, slot,
"invalid key type, METADATA_ITEM type invalid when SKINNY_METADATA feature disabled");
		return -EUCLEAN;
	}
	/* key->objectid is the bytenr for both key types */
2020-11-04 16:12:45 +01:00
	if (unlikely(!IS_ALIGNED(key->objectid, fs_info->sectorsize))) {
2019-08-09 09:24:22 +08:00
		generic_err(leaf, slot,
		"invalid key objectid, have %llu expect to be aligned to %u",
			   key->objectid, fs_info->sectorsize);
		return -EUCLEAN;
	}
	/* key->offset is tree level for METADATA_ITEM_KEY */
2020-11-04 16:12:45 +01:00
	if (unlikely(key->type == BTRFS_METADATA_ITEM_KEY &&
		     key->offset >= BTRFS_MAX_LEVEL)) {
2019-08-09 09:24:22 +08:00
		extent_err(leaf, slot,
			   "invalid tree level, have %llu expect [0, %u]",
			   key->offset, BTRFS_MAX_LEVEL - 1);
		return -EUCLEAN;
	}
	/*
	 * EXTENT/METADATA_ITEM consists of:
	 * 1) One btrfs_extent_item
	 *    Records the total refs, type and generation of the extent.
	 *
	 * 2) One btrfs_tree_block_info (for EXTENT_ITEM and tree backref only)
	 *    Records the first key and level of the tree block.
	 *
	 * 3) Zero or more btrfs_extent_inline_ref(s)
	 *    Each inline ref has one btrfs_extent_inline_ref showing:
	 *    3.1) The ref type, one of the 4
	 *         TREE_BLOCK_REF	Tree block only
	 *         SHARED_BLOCK_REF	Tree block only
	 *         EXTENT_DATA_REF	Data only
	 *         SHARED_DATA_REF	Data only
	 *    3.2) Ref type specific data
	 *         Either using btrfs_extent_inline_ref::offset, or specific
	 *         data structure.
2023-10-24 12:41:11 +10:30
	 *
	 * All above inline items should follow the order:
	 *
	 * - All btrfs_extent_inline_ref::type should be in an ascending
	 *   order
	 *
	 * - Within the same type, the items should follow a descending
	 *   order by their sequence number. The sequence number is
	 *   determined by:
	 *   * btrfs_extent_inline_ref::offset for all types other than
	 *     EXTENT_DATA_REF
	 *   * hash_extent_data_ref() for EXTENT_DATA_REF
2019-08-09 09:24:22 +08:00
	 */
2020-11-04 16:12:45 +01:00
	if (unlikely(item_size < sizeof(*ei))) {
2019-08-09 09:24:22 +08:00
		extent_err(leaf, slot,
			   "invalid item size, have %u expect [%zu, %u)",
			   item_size, sizeof(*ei),
			   BTRFS_LEAF_DATA_SIZE(fs_info));
		return -EUCLEAN;
	}
	end = item_size + btrfs_item_ptr_offset(leaf, slot);
	/* Checks against extent_item */
	ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
	flags = btrfs_extent_flags(leaf, ei);
	total_refs = btrfs_extent_refs(leaf, ei);
	generation = btrfs_extent_generation(leaf, ei);
2020-11-04 16:12:45 +01:00
	if (unlikely(generation >
		     btrfs_super_generation(fs_info->super_copy) + 1)) {
2019-08-09 09:24:22 +08:00
		extent_err(leaf, slot,
			   "invalid generation, have %llu expect (0, %llu]",
			   generation,
			   btrfs_super_generation(fs_info->super_copy) + 1);
		return -EUCLEAN;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(!has_single_bit_set(flags & (BTRFS_EXTENT_FLAG_DATA |
						  BTRFS_EXTENT_FLAG_TREE_BLOCK)))) {
2019-08-09 09:24:22 +08:00
		extent_err(leaf, slot,
		"invalid extent flag, have 0x%llx expect 1 bit set in 0x%llx",
			flags, BTRFS_EXTENT_FLAG_DATA |
			BTRFS_EXTENT_FLAG_TREE_BLOCK);
		return -EUCLEAN;
	}
	is_tree_block = !!(flags & BTRFS_EXTENT_FLAG_TREE_BLOCK);
	if (is_tree_block) {
2020-11-04 16:12:45 +01:00
		if (unlikely(key->type == BTRFS_EXTENT_ITEM_KEY &&
			     key->offset != fs_info->nodesize)) {
2019-08-09 09:24:22 +08:00
			extent_err(leaf, slot,
				   "invalid extent length, have %llu expect %u",
				   key->offset, fs_info->nodesize);
			return -EUCLEAN;
		}
	} else {
2020-11-04 16:12:45 +01:00
		if (unlikely(key->type != BTRFS_EXTENT_ITEM_KEY)) {
2019-08-09 09:24:22 +08:00
			extent_err(leaf, slot,
			"invalid key type, have %u expect %u for data backref",
				   key->type, BTRFS_EXTENT_ITEM_KEY);
			return -EUCLEAN;
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(!IS_ALIGNED(key->offset, fs_info->sectorsize))) {
2019-08-09 09:24:22 +08:00
			extent_err(leaf, slot,
			"invalid extent length, have %llu expect aligned to %u",
				   key->offset, fs_info->sectorsize);
			return -EUCLEAN;
		}
2021-03-12 15:25:26 -05:00
		if (unlikely(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) {
			extent_err(leaf, slot,
			"invalid extent flag, data has full backref set");
			return -EUCLEAN;
		}
2019-08-09 09:24:22 +08:00
	}
	ptr = (unsigned long)(struct btrfs_extent_item *)(ei + 1);
	/* Check the special case of btrfs_tree_block_info */
	if (is_tree_block && key->type != BTRFS_METADATA_ITEM_KEY) {
		struct btrfs_tree_block_info *info;
		info = (struct btrfs_tree_block_info *)ptr;
2020-11-04 16:12:45 +01:00
		if (unlikely(btrfs_tree_block_level(leaf, info) >= BTRFS_MAX_LEVEL)) {
2019-08-09 09:24:22 +08:00
			extent_err(leaf, slot,
			"invalid tree block info level, have %u expect [0, %u]",
				   btrfs_tree_block_level(leaf, info),
				   BTRFS_MAX_LEVEL - 1);
			return -EUCLEAN;
		}
		ptr = (unsigned long)(struct btrfs_tree_block_info *)(info + 1);
	}
	/* Check inline refs */
	while (ptr < end) {
		struct btrfs_extent_inline_ref *iref;
		struct btrfs_extent_data_ref *dref;
		struct btrfs_shared_data_ref *sref;
2023-10-24 12:41:11 +10:30
		u64 seq;
		u64 dref_offset;
		u64 inline_offset;
		u8 inline_type;

		if (unlikely(ptr + sizeof(*iref) > end)) {
			extent_err(leaf, slot,
"inline ref item overflows extent item, ptr %lu iref size %zu end %lu",
				   ptr, sizeof(*iref), end);
			return -EUCLEAN;
		}
		iref = (struct btrfs_extent_inline_ref *)ptr;
		inline_type = btrfs_extent_inline_ref_type(leaf, iref);
		inline_offset = btrfs_extent_inline_ref_offset(leaf, iref);
		seq = inline_offset;
		if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) {
			extent_err(leaf, slot,
"inline ref item overflows extent item, ptr %lu iref size %u end %lu",
				   ptr, btrfs_extent_inline_ref_size(inline_type), end);
			return -EUCLEAN;
		}

		switch (inline_type) {
		/* inline_offset is subvolid of the owner, no need to check */
		case BTRFS_TREE_BLOCK_REF_KEY:
			inline_refs++;
			break;
		/* Contains parent bytenr */
		case BTRFS_SHARED_BLOCK_REF_KEY:
			if (unlikely(!IS_ALIGNED(inline_offset,
						 fs_info->sectorsize))) {
				extent_err(leaf, slot,
		"invalid tree parent bytenr, have %llu expect aligned to %u",
					   inline_offset, fs_info->sectorsize);
				return -EUCLEAN;
			}
			inline_refs++;
			break;
		/*
		 * Contains owner subvolid, owner key objectid, adjusted offset.
		 * The only obvious corruption can happen in that offset.
		 */
		case BTRFS_EXTENT_DATA_REF_KEY:
			dref = (struct btrfs_extent_data_ref *)(&iref->offset);
			dref_offset = btrfs_extent_data_ref_offset(leaf, dref);
			seq = hash_extent_data_ref(
					btrfs_extent_data_ref_root(leaf, dref),
					btrfs_extent_data_ref_objectid(leaf, dref),
					btrfs_extent_data_ref_offset(leaf, dref));
			if (unlikely(!IS_ALIGNED(dref_offset,
						 fs_info->sectorsize))) {
				extent_err(leaf, slot,
		"invalid data ref offset, have %llu expect aligned to %u",
					   dref_offset, fs_info->sectorsize);
				return -EUCLEAN;
			}
			inline_refs += btrfs_extent_data_ref_count(leaf, dref);
			break;
		/* Contains parent bytenr and ref count */
		case BTRFS_SHARED_DATA_REF_KEY:
			sref = (struct btrfs_shared_data_ref *)(iref + 1);
			if (unlikely(!IS_ALIGNED(inline_offset,
						 fs_info->sectorsize))) {
				extent_err(leaf, slot,
		"invalid data parent bytenr, have %llu expect aligned to %u",
					   inline_offset, fs_info->sectorsize);
				return -EUCLEAN;
			}
			inline_refs += btrfs_shared_data_ref_count(leaf, sref);
			break;
		case BTRFS_EXTENT_OWNER_REF_KEY:
			WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA));
			break;
		default:
			extent_err(leaf, slot, "unknown inline ref type: %u",
				   inline_type);
			return -EUCLEAN;
		}
		if (inline_type < last_type) {
			extent_err(leaf, slot,
				   "inline ref out-of-order: has type %u, prev type %u",
				   inline_type, last_type);
			return -EUCLEAN;
		}
		/* Type changed, allow the sequence to start from U64_MAX again. */
		if (inline_type > last_type)
			last_seq = U64_MAX;
		if (seq > last_seq) {
			extent_err(leaf, slot,
"inline ref out-of-order: has type %u offset %llu seq 0x%llx, prev type %u seq 0x%llx",
				   inline_type, inline_offset, seq,
				   last_type, last_seq);
			return -EUCLEAN;
		}
		last_type = inline_type;
		last_seq = seq;
		ptr += btrfs_extent_inline_ref_size(inline_type);
	}
	/* No padding is allowed */
	if (unlikely(ptr != end)) {
		extent_err(leaf, slot,
			   "invalid extent item size, padding bytes found");
		return -EUCLEAN;
	}

	/* Finally, check the inline refs against total refs */
	if (unlikely(inline_refs > total_refs)) {
		extent_err(leaf, slot,
			"invalid extent refs, have %llu expect >= inline %llu",
			   total_refs, inline_refs);
		return -EUCLEAN;
	}

	if ((prev_key->type == BTRFS_EXTENT_ITEM_KEY) ||
	    (prev_key->type == BTRFS_METADATA_ITEM_KEY)) {
		u64 prev_end = prev_key->objectid;

		if (prev_key->type == BTRFS_METADATA_ITEM_KEY)
			prev_end += fs_info->nodesize;
		else
			prev_end += prev_key->offset;

		if (unlikely(prev_end > key->objectid)) {
			extent_err(leaf, slot,
	"previous extent [%llu %u %llu] overlaps current extent [%llu %u %llu]",
				   prev_key->objectid, prev_key->type,
				   prev_key->offset, key->objectid, key->type,
				   key->offset);
			return -EUCLEAN;
		}
	}

	return 0;
}
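The inline-ref loop above enforces an ordering rule: inline ref types must ascend, and within one type the sequence value (the inline offset, or `hash_extent_data_ref()` for EXTENT_DATA_REF) must not increase. It then rejects an extent whose start lies before the end of the previous extent key. The following user-space sketch mirrors both checks; `struct inline_ref`, `check_inline_ref_order()` and `extent_overlaps_prev()` are hypothetical names, and the starting state (type 0, sequence `U64_MAX`) is an assumption inferred from the reset on type change, since the declarations sit outside this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded inline backref: its type and its ordering sequence. */
struct inline_ref {
	uint8_t type;
	uint64_t seq;	/* inline offset, or the data ref hash for EXTENT_DATA_REF */
};

/*
 * Return -1 if refs[] is correctly ordered, else the index of the first
 * offending entry. Types must ascend; within one type, seq must not grow.
 */
static int check_inline_ref_order(const struct inline_ref *refs, int nr)
{
	uint8_t last_type = 0;		/* assumed initial state */
	uint64_t last_seq = UINT64_MAX;

	for (int i = 0; i < nr; i++) {
		if (refs[i].type < last_type)
			return i;
		/* A new type restarts the descending sequence. */
		if (refs[i].type > last_type)
			last_seq = UINT64_MAX;
		if (refs[i].seq > last_seq)
			return i;
		last_type = refs[i].type;
		last_seq = refs[i].seq;
	}
	return -1;
}

/*
 * prev_len is nodesize for a METADATA_ITEM key and key.offset for an
 * EXTENT_ITEM key; the previous extent must end at or before cur_objectid.
 */
static bool extent_overlaps_prev(uint64_t prev_objectid, uint64_t prev_len,
				 uint64_t cur_objectid)
{
	return prev_objectid + prev_len > cur_objectid;
}
```

Equal sequence values within one type are accepted, matching the `seq > last_seq` comparison in the kernel loop. With a 16 KiB node at bytenr 1048576, the next extent may start at 1064960 but not at 1060864.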

static int check_simple_keyed_refs(struct extent_buffer *leaf,
				   struct btrfs_key *key, int slot)
{
	u32 expect_item_size = 0;

	if (key->type == BTRFS_SHARED_DATA_REF_KEY)
		expect_item_size = sizeof(struct btrfs_shared_data_ref);

	if (unlikely(btrfs_item_size(leaf, slot) != expect_item_size)) {
		generic_err(leaf, slot,
		"invalid item size, have %u expect %u for key type %u",
			    btrfs_item_size(leaf, slot),
			    expect_item_size, key->type);
		return -EUCLEAN;
	}
	if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
		generic_err(leaf, slot,
"invalid key objectid for shared block ref, have %llu expect aligned to %u",
			    key->objectid, leaf->fs_info->sectorsize);
		return -EUCLEAN;
	}
	if (unlikely(key->type != BTRFS_TREE_BLOCK_REF_KEY &&
		     !IS_ALIGNED(key->offset, leaf->fs_info->sectorsize))) {
		extent_err(leaf, slot,
		"invalid tree parent bytenr, have %llu expect aligned to %u",
			   key->offset, leaf->fs_info->sectorsize);
		return -EUCLEAN;
	}
	return 0;
}
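check_simple_keyed_refs() reduces to three rules: the item payload is empty except for SHARED_DATA_REF, which carries one `struct btrfs_shared_data_ref`; the key objectid (the extent bytenr) must be sector aligned; and the key offset must be sector aligned unless the key is a TREE_BLOCK_REF, whose offset is a root id rather than a bytenr. A minimal user-space sketch, assuming a 4-byte shared data ref (a single 32-bit count) and a power-of-two sector size; the enum, macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three key types handled here. */
enum ref_key_type { TREE_BLOCK_REF, SHARED_DATA_REF, SHARED_BLOCK_REF };

/* Power-of-two alignment test, same idea as the kernel's IS_ALIGNED(). */
#define ALIGNED_TO(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/* Assumed payload size of a shared data ref: one 32-bit count. */
#define SHARED_DATA_REF_SIZE 4u

static bool simple_keyed_ref_ok(enum ref_key_type type, uint64_t objectid,
				uint64_t offset, uint32_t item_size,
				uint32_t sectorsize)
{
	uint32_t expect = (type == SHARED_DATA_REF) ? SHARED_DATA_REF_SIZE : 0;

	if (item_size != expect)
		return false;
	/* objectid is the extent bytenr */
	if (!ALIGNED_TO(objectid, sectorsize))
		return false;
	/* For TREE_BLOCK_REF the offset is a root id, not a bytenr */
	if (type != TREE_BLOCK_REF && !ALIGNED_TO(offset, sectorsize))
		return false;
	return true;
}
```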

static int check_extent_data_ref(struct extent_buffer *leaf,
				 struct btrfs_key *key, int slot)
{
	struct btrfs_extent_data_ref *dref;
	unsigned long ptr = btrfs_item_ptr_offset(leaf, slot);
	const unsigned long end = ptr + btrfs_item_size(leaf, slot);

	if (unlikely(btrfs_item_size(leaf, slot) % sizeof(*dref) != 0)) {
		generic_err(leaf, slot,
	"invalid item size, have %u expect aligned to %zu for key type %u",
			    btrfs_item_size(leaf, slot),
			    sizeof(*dref), key->type);
		return -EUCLEAN;
	}
	if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
		generic_err(leaf, slot,
"invalid key objectid for shared block ref, have %llu expect aligned to %u",
			    key->objectid, leaf->fs_info->sectorsize);
		return -EUCLEAN;
	}
	for (; ptr < end; ptr += sizeof(*dref)) {
		u64 offset;

		/*
		 * We cannot check the extent_data_ref hash due to possible
		 * overflow from the leaf due to hash collisions.
		 */
		dref = (struct btrfs_extent_data_ref *)ptr;
		offset = btrfs_extent_data_ref_offset(leaf, dref);
		if (unlikely(!IS_ALIGNED(offset, leaf->fs_info->sectorsize))) {
			extent_err(leaf, slot,
	"invalid extent data backref offset, have %llu expect aligned to %u",
				   offset, leaf->fs_info->sectorsize);
			return -EUCLEAN;
		}
	}
	return 0;
}
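check_extent_data_ref() treats the item as a packed array of `struct btrfs_extent_data_ref`: the item size must be an exact multiple of the record size, and every record's offset must be sector aligned, while the hash is deliberately left unchecked. The sketch below assumes a 28-byte on-disk record (three 64-bit fields plus a 32-bit count) and operates on records already decoded into memory; `struct data_ref` and `data_ref_item_ok()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded copy of one extent data ref record. */
struct data_ref {
	uint64_t root;
	uint64_t objectid;
	uint64_t offset;
	uint32_t count;
};

/* Assumed packed on-disk record size: three u64 fields plus one u32. */
#define DATA_REF_DISK_SIZE 28u

/*
 * item_size is the raw on-disk size; refs[] must hold item_size /
 * DATA_REF_DISK_SIZE decoded records. sectorsize is a power of two.
 */
static bool data_ref_item_ok(const struct data_ref *refs, uint32_t item_size,
			     uint32_t sectorsize)
{
	/* The item must be a whole number of records. */
	if (item_size % DATA_REF_DISK_SIZE != 0)
		return false;
	for (uint32_t i = 0; i < item_size / DATA_REF_DISK_SIZE; i++) {
		/* The adjusted file offset must be sector aligned. */
		if (refs[i].offset & (uint64_t)(sectorsize - 1))
			return false;
	}
	return true;
}
```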

#define inode_ref_err(eb, slot, fmt, args...)			\
	inode_item_err(eb, slot, fmt, ##args)

static int check_inode_ref(struct extent_buffer *leaf,
			   struct btrfs_key *key, struct btrfs_key *prev_key,
			   int slot)
{
	struct btrfs_inode_ref *iref;
	unsigned long ptr;
	unsigned long end;

	if (unlikely(!check_prev_ino(leaf, key, slot, prev_key)))
		return -EUCLEAN;

	/* namelen can't be 0, so item_size == sizeof() is also invalid */
	if (unlikely(btrfs_item_size(leaf, slot) <= sizeof(*iref))) {
		inode_ref_err(leaf, slot,
			"invalid item size, have %u expect (%zu, %u)",
			btrfs_item_size(leaf, slot),
			sizeof(*iref), BTRFS_LEAF_DATA_SIZE(leaf->fs_info));
		return -EUCLEAN;
	}

	ptr = btrfs_item_ptr_offset(leaf, slot);
	end = ptr + btrfs_item_size(leaf, slot);
	while (ptr < end) {
		u16 namelen;

		if (unlikely(ptr + sizeof(*iref) > end)) {
			inode_ref_err(leaf, slot,
			"inode ref overflow, ptr %lu end %lu inode_ref_size %zu",
				ptr, end, sizeof(*iref));
			return -EUCLEAN;
		}

		iref = (struct btrfs_inode_ref *)ptr;
		namelen = btrfs_inode_ref_name_len(leaf, iref);
		if (unlikely(ptr + sizeof(*iref) + namelen > end)) {
			inode_ref_err(leaf, slot,
				"inode ref overflow, ptr %lu end %lu namelen %u",
				ptr, end, namelen);
			return -EUCLEAN;
		}

		/*
		 * NOTE: In theory we should record all found index numbers
		 * to find any duplicated indexes, but that will be too time
		 * consuming for inodes with too many hard links.
		 */
		ptr += sizeof(*iref) + namelen;
	}
	return 0;
}
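check_inode_ref() walks back-to-back variable-length records, each a fixed header followed by `namelen` name bytes, and checks first the header and then the name against the item end before advancing. A byte-level sketch of that walk, assuming a 10-byte packed header (a little-endian u64 index followed by a little-endian u16 name length); `get_le16()` and `inode_refs_ok()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packed header: __le64 index followed by __le16 name_len. */
#define INODE_REF_HDR_SIZE 10u

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return true if buf[0..len) is a valid run of (header, name) records. */
static bool inode_refs_ok(const uint8_t *buf, size_t len)
{
	size_t pos = 0;

	/* A name can't be empty, so a lone header is too small. */
	if (len <= INODE_REF_HDR_SIZE)
		return false;

	while (pos < len) {
		uint16_t namelen;

		/* The fixed header must fit before name_len is read. */
		if (len - pos < INODE_REF_HDR_SIZE)
			return false;
		namelen = get_le16(buf + pos + 8);
		/* The name must fit as well. */
		if (len - pos - INODE_REF_HDR_SIZE < namelen)
			return false;
		pos += INODE_REF_HDR_SIZE + namelen;
	}
	return true;
}
```

Because every step proves that header plus name fit before advancing, the loop can only exit with `pos == len`, so no separate trailing-bytes check is needed. Comparing remaining lengths instead of computing `pos + size > len` also keeps the user-space version free of addition overflow.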

static int check_raid_stripe_extent(const struct extent_buffer *leaf,
				    const struct btrfs_key *key, int slot)
{
	struct btrfs_stripe_extent *stripe_extent =
		btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);

	if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
		generic_err(leaf, slot,
"invalid key objectid for raid stripe extent, have %llu expect aligned to %u",
			    key->objectid, leaf->fs_info->sectorsize);
		return -EUCLEAN;
	}

	if (unlikely(!btrfs_fs_incompat(leaf->fs_info, RAID_STRIPE_TREE))) {
		generic_err(leaf, slot,
	"RAID_STRIPE_EXTENT present but RAID_STRIPE_TREE incompat bit unset");
		return -EUCLEAN;
	}

	switch (btrfs_stripe_extent_encoding(leaf, stripe_extent)) {
	case BTRFS_STRIPE_RAID0:
	case BTRFS_STRIPE_RAID1:
	case BTRFS_STRIPE_DUP:
	case BTRFS_STRIPE_RAID10:
	case BTRFS_STRIPE_RAID5:
	case BTRFS_STRIPE_RAID6:
	case BTRFS_STRIPE_RAID1C3:
	case BTRFS_STRIPE_RAID1C4:
		break;
	default:
		generic_err(leaf, slot, "invalid raid stripe encoding %u",
			    btrfs_stripe_extent_encoding(leaf, stripe_extent));
		return -EUCLEAN;
	}

	return 0;
}
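The encoding check in check_raid_stripe_extent() is a default-reject whitelist: any on-disk value not explicitly listed fails validation. Unlike a range comparison, the switch stays correct even if the accepted values are not contiguous. A sketch of the pattern, using hypothetical enum values that may not match the real BTRFS_STRIPE_* constants.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical encoding values for illustration only; they may not match
 * the real BTRFS_STRIPE_* constants.
 */
enum stripe_encoding {
	STRIPE_RAID0 = 1,
	STRIPE_RAID1,
	STRIPE_DUP,
	STRIPE_RAID10,
	STRIPE_RAID5,
	STRIPE_RAID6,
	STRIPE_RAID1C3,
	STRIPE_RAID1C4,
};

/* Accept only encodings that are listed explicitly; reject everything else. */
static bool stripe_encoding_known(uint8_t raw)
{
	switch (raw) {
	case STRIPE_RAID0:
	case STRIPE_RAID1:
	case STRIPE_DUP:
	case STRIPE_RAID10:
	case STRIPE_RAID5:
	case STRIPE_RAID6:
	case STRIPE_RAID1C3:
	case STRIPE_RAID1C4:
		return true;
	default:
		return false;
	}
}
```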

/*
 * Common point to switch the item-specific validation.
 */
static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
						    struct btrfs_key *key,
						    int slot,
						    struct btrfs_key *prev_key)
{
	int ret = 0;
	struct btrfs_chunk *chunk;

	switch (key->type) {
	case BTRFS_EXTENT_DATA_KEY:
		ret = check_extent_data_item(leaf, key, slot, prev_key);
		break;
	case BTRFS_EXTENT_CSUM_KEY:
		ret = check_csum_item(leaf, key, slot, prev_key);
		break;
	case BTRFS_DIR_ITEM_KEY:
	case BTRFS_DIR_INDEX_KEY:
	case BTRFS_XATTR_ITEM_KEY:
		ret = check_dir_item(leaf, key, prev_key, slot);
		break;
	case BTRFS_INODE_REF_KEY:
		ret = check_inode_ref(leaf, key, prev_key, slot);
		break;
	case BTRFS_BLOCK_GROUP_ITEM_KEY:
		ret = check_block_group_item(leaf, key, slot);
		break;
	case BTRFS_CHUNK_ITEM_KEY:
		chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
		ret = check_leaf_chunk_item(leaf, chunk, key, slot);
		break;
	case BTRFS_DEV_ITEM_KEY:
		ret = check_dev_item(leaf, key, slot);
		break;
	case BTRFS_INODE_ITEM_KEY:
		ret = check_inode_item(leaf, key, slot);
		break;
	case BTRFS_ROOT_ITEM_KEY:
		ret = check_root_item(leaf, key, slot);
		break;
	case BTRFS_EXTENT_ITEM_KEY:
	case BTRFS_METADATA_ITEM_KEY:
		ret = check_extent_item(leaf, key, slot, prev_key);
		break;
	case BTRFS_TREE_BLOCK_REF_KEY:
	case BTRFS_SHARED_DATA_REF_KEY:
	case BTRFS_SHARED_BLOCK_REF_KEY:
		ret = check_simple_keyed_refs(leaf, key, slot);
		break;
	case BTRFS_EXTENT_DATA_REF_KEY:
		ret = check_extent_data_ref(leaf, key, slot);
		break;
	case BTRFS_RAID_STRIPE_KEY:
		ret = check_raid_stripe_extent(leaf, key, slot);
		break;
	}

	if (ret)
		return BTRFS_TREE_BLOCK_INVALID_ITEM;
	return BTRFS_TREE_BLOCK_CLEAN;
}
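The per-slot loop in __btrfs_check_leaf() relies on the leaf layout: item headers grow forward from the start of the data area while item data is packed backward from its end. So slot 0's data must end exactly at BTRFS_LEAF_DATA_SIZE, each later slot's data must end where the previous slot's data begins, and no slot's data may start before its own header ends. Keys must also be strictly increasing, starting from a zeroed key. This user-space sketch replays those checks on a decoded slot array; `struct slot`, the 25-byte item header size (assumed to be a 17-byte disk key plus two 32-bit fields) and all function names are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical decoded view of one leaf slot. */
struct slot {
	uint64_t objectid;
	uint8_t type;
	uint64_t key_offset;
	uint32_t data_offset;	/* relative to the start of the leaf data area */
	uint32_t data_size;
};

/* Assumed item header size: a 17-byte disk key plus two u32 fields. */
#define ITEM_HDR_SIZE 25u

/* Lexicographic (objectid, type, offset) comparison. */
static int key_cmp(const struct slot *a, const struct slot *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->key_offset != b->key_offset)
		return a->key_offset < b->key_offset ? -1 : 1;
	return 0;
}

/* Return -1 if the layout is sane, else the index of the first bad slot. */
static int check_leaf_layout(const struct slot *s, uint32_t nr,
			     uint32_t leaf_data_size)
{
	struct slot prev = { 0 };	/* no valid key is (0, 0, 0) */

	for (uint32_t i = 0; i < nr; i++) {
		uint64_t end = (uint64_t)s[i].data_offset + s[i].data_size;
		uint64_t expect = i == 0 ? leaf_data_size : s[i - 1].data_offset;

		/* Keys must be strictly increasing. */
		if (key_cmp(&prev, &s[i]) >= 0)
			return (int)i;
		/* Data is packed back to back from the end of the leaf. */
		if (end != expect)
			return (int)i;
		/* Defensive: nothing may end past the data area. */
		if (end > leaf_data_size)
			return (int)i;
		/* Headers grow forward and must not reach this slot's data. */
		if (s[i].data_offset < (uint64_t)(i + 1) * ITEM_HDR_SIZE)
			return (int)i;
		prev = s[i];
	}
	return -1;
}
```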

enum btrfs_tree_block_status __btrfs_check_leaf(struct extent_buffer *leaf)
{
	struct btrfs_fs_info *fs_info = leaf->fs_info;
	/* No valid key type is 0, so all keys should be larger than this key */
	struct btrfs_key prev_key = {0, 0, 0};
	struct btrfs_key key;
	u32 nritems = btrfs_header_nritems(leaf);
	int slot;

	if (unlikely(btrfs_header_level(leaf) != 0)) {
		generic_err(leaf, 0,
			"invalid level for leaf, have %d expect 0",
			btrfs_header_level(leaf));
		return BTRFS_TREE_BLOCK_INVALID_LEVEL;
	}

	if (unlikely(!btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_WRITTEN))) {
		generic_err(leaf, 0, "invalid flag for leaf, WRITTEN not set");
		return BTRFS_TREE_BLOCK_WRITTEN_NOT_SET;
	}

	/*
	 * Extent buffers from a relocation tree have an owner field that
	 * corresponds to the subvolume tree they are based on. So just from an
	 * extent buffer alone we can not find out what is the id of the
	 * corresponding subvolume tree, so we can not figure out if the extent
	 * buffer corresponds to the root of the relocation tree or not. So
	 * skip this check for relocation trees.
	 */
	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
		u64 owner = btrfs_header_owner(leaf);

		/* These trees must never be empty */
		if (unlikely(owner == BTRFS_ROOT_TREE_OBJECTID ||
			     owner == BTRFS_CHUNK_TREE_OBJECTID ||
			     owner == BTRFS_DEV_TREE_OBJECTID ||
			     owner == BTRFS_FS_TREE_OBJECTID ||
			     owner == BTRFS_DATA_RELOC_TREE_OBJECTID)) {
			generic_err(leaf, 0,
			"invalid root, root %llu must never be empty",
				    owner);
			return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
		}

		/* Unknown tree */
		if (unlikely(owner == 0)) {
			generic_err(leaf, 0,
				"invalid owner, root 0 is not defined");
			return BTRFS_TREE_BLOCK_INVALID_OWNER;
		}

		/* EXTENT_TREE_V2 can have empty extent trees. */
		if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
			return BTRFS_TREE_BLOCK_CLEAN;

		if (unlikely(owner == BTRFS_EXTENT_TREE_OBJECTID)) {
			generic_err(leaf, 0,
			"invalid root, root %llu must never be empty",
				    owner);
			return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
		}

		return BTRFS_TREE_BLOCK_CLEAN;
	}

	if (unlikely(nritems == 0))
		return BTRFS_TREE_BLOCK_CLEAN;

	/*
	 * Check the following things to make sure this is a good leaf, and
	 * leaf users won't need to bother with similar sanity checks:
	 *
	 * 1) key ordering
	 * 2) item offset and size
	 *    No overlap, no hole, all inside the leaf.
	 * 3) item content
	 *    If possible, do comprehensive sanity check.
	 *    NOTE: All checks must only rely on the item data itself.
	 */
	for (slot = 0; slot < nritems; slot++) {
		u32 item_end_expected;
		u64 item_data_end;
		enum btrfs_tree_block_status ret;

		btrfs_item_key_to_cpu(leaf, &key, slot);

		/* Make sure the keys are in the right order */
		if (unlikely(btrfs_comp_cpu_keys(&prev_key, &key) >= 0)) {
			generic_err(leaf, slot,
	"bad key order, prev (%llu %u %llu) current (%llu %u %llu)",
				prev_key.objectid, prev_key.type,
				prev_key.offset, key.objectid, key.type,
				key.offset);
			return BTRFS_TREE_BLOCK_BAD_KEY_ORDER;
		}

		item_data_end = (u64)btrfs_item_offset(leaf, slot) +
				btrfs_item_size(leaf, slot);
		/*
		 * Make sure the offset and ends are right, remember that the
		 * item data starts at the end of the leaf and grows towards the
		 * front.
		 */
		if (slot == 0)
			item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info);
		else
			item_end_expected = btrfs_item_offset(leaf,
								 slot - 1);
		if (unlikely(item_data_end != item_end_expected)) {
			generic_err(leaf, slot,
				"unexpected item end, have %llu expect %u",
				item_data_end, item_end_expected);
			return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
		}

		/*
		 * Check to make sure that we don't point outside of the leaf,
		 * just in case all the items are consistent with each other, but
		 * all point outside of the leaf.
		 */
		if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
			generic_err(leaf, slot,
		"slot end outside of leaf, have %llu expect range [0, %u]",
				item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
			return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
		}

		/* Also check if the item pointer overlaps with btrfs item. */
		if (unlikely(btrfs_item_ptr_offset(leaf, slot) <
			     btrfs_item_nr_offset(leaf, slot) + sizeof(struct btrfs_item))) {
			generic_err(leaf, slot,
		"slot overlaps with its data, item end %lu data start %lu",
				btrfs_item_nr_offset(leaf, slot) +
				sizeof(struct btrfs_item),
				btrfs_item_ptr_offset(leaf, slot));
			return BTRFS_TREE_BLOCK_INVALID_OFFSETS;
		}
[2.621] ? __pfx_mutex_unlock+0x10/0x10
[2.621] btrfs_match_dir_item_name+0x101/0x1a0
[2.621] btrfs_lookup_dir_item+0x1f3/0x280
[2.621] ? __pfx_btrfs_lookup_dir_item+0x10/0x10
[2.621] btrfs_get_tree+0xd25/0x1910
Reported-by: lei lu <llfamsec@gmail.com>
CC: stable@vger.kernel.org # 6.7+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copy more details from report ]
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-29 09:03:35 -04:00
		/* Check if the item size and content meet other criteria. */
		ret = check_leaf_item(leaf, &key, slot, &prev_key);
		if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
			return ret;
2017-10-09 01:51:02 +00:00
		prev_key.objectid = key.objectid;
		prev_key.type = key.type;
		prev_key.offset = key.offset;
	}
2023-04-29 16:07:15 -04:00
	return BTRFS_TREE_BLOCK_CLEAN;
2017-10-09 01:51:02 +00:00
}
2023-04-29 16:07:15 -04:00
int btrfs_check_leaf(struct extent_buffer *leaf)
2017-11-08 08:54:24 +08:00
{
2023-04-29 16:07:15 -04:00
	enum btrfs_tree_block_status ret;
2017-11-08 08:54:24 +08:00

2023-04-29 16:07:15 -04:00
	ret = __btrfs_check_leaf(leaf);
	if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
		return -EUCLEAN;
2017-10-09 01:51:02 +00:00
	return 0;
2017-11-08 08:54:24 +08:00
}
2023-04-29 16:07:12 -04:00
ALLOW_ERROR_INJECTION(btrfs_check_leaf, ERRNO);
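The two bounds checks in the leaf item loop above can be modelled outside the kernel. The sketch below is illustrative only: `struct toy_item`, `TOY_ITEM_SIZE`, `enum toy_status` and `toy_check_item_offsets()` are hypothetical names, offsets are measured from the start of the leaf data area instead of going through `btrfs_item_ptr_offset()` and `btrfs_item_nr_offset()`, and the 25-byte item header is an assumption standing in for `sizeof(struct btrfs_item)`. Checks that sit earlier in the loop, outside this excerpt, are not modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct btrfs_item); assumed 25 bytes. */
#define TOY_ITEM_SIZE 25u

/* Hypothetical status codes loosely mirroring enum btrfs_tree_block_status. */
enum toy_status {
	TOY_BLOCK_CLEAN = 0,
	TOY_BLOCK_INVALID_OFFSETS,
};

/* Hypothetical item header: data offset and size, relative to the data area. */
struct toy_item {
	uint32_t offset;
	uint32_t size;
};

/*
 * Check every item against the two bounds used in the loop above:
 * 1. the item data must end inside the leaf data area, and
 * 2. the item data must not start before the end of its own item header,
 *    since headers grow from the front and data grows from the back.
 */
static enum toy_status toy_check_item_offsets(const struct toy_item *items,
					      uint32_t nritems,
					      uint32_t leaf_data_size)
{
	for (uint32_t slot = 0; slot < nritems; slot++) {
		/* 64-bit math so offset + size cannot wrap around. */
		uint64_t data_end = (uint64_t)items[slot].offset + items[slot].size;
		uint64_t header_end = (uint64_t)(slot + 1) * TOY_ITEM_SIZE;

		if (data_end > leaf_data_size)
			return TOY_BLOCK_INVALID_OFFSETS;
		if (items[slot].offset < header_end)
			return TOY_BLOCK_INVALID_OFFSETS;
	}
	return TOY_BLOCK_CLEAN;
}
```

Because item headers fill the data area from the front while item data fills it from the back, the second test catches a data region that has crept into the header array, which is the condition reported as "slot overlaps with its data".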
2017-11-08 08:54:24 +08:00
2023-04-29 16:07:16 -04:00
enum btrfs_tree_block_status __btrfs_check_node(struct extent_buffer *node)
2017-10-09 01:51:02 +00:00
{
2019-03-20 16:25:00 +01:00
	struct btrfs_fs_info *fs_info = node->fs_info;
2017-10-09 01:51:02 +00:00
	unsigned long nr = btrfs_header_nritems(node);
	struct btrfs_key key, next_key;
	int slot;
2018-09-28 07:59:34 +08:00
	int level = btrfs_header_level(node);
2017-10-09 01:51:02 +00:00
	u64 bytenr;
2024-04-29 09:03:35 -04:00
	if (unlikely(!btrfs_header_flag(node, BTRFS_HEADER_FLAG_WRITTEN))) {
		generic_err(node, 0, "invalid flag for node, WRITTEN not set");
		return BTRFS_TREE_BLOCK_WRITTEN_NOT_SET;
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(level <= 0 || level >= BTRFS_MAX_LEVEL)) {
2019-03-20 15:31:28 +01:00
		generic_err(node, 0,
2018-09-28 07:59:34 +08:00
			"invalid level for node, have %d expect [1, %d]",
			level, BTRFS_MAX_LEVEL - 1);
2023-04-29 16:07:16 -04:00
		return BTRFS_TREE_BLOCK_INVALID_LEVEL;
2018-09-28 07:59:34 +08:00
	}
2020-11-04 16:12:45 +01:00
	if (unlikely(nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(fs_info))) {
2018-01-25 14:56:18 +08:00
		btrfs_crit(fs_info,
2017-10-09 01:51:03 +00:00
"corrupt node: root=%llu block=%llu, nritems too %s, have %lu expect range [1,%u]",
2018-01-25 14:56:18 +08:00
			   btrfs_header_owner(node), node->start,
2017-10-09 01:51:03 +00:00
			   nr == 0 ? "small" : "large", nr,
2018-01-25 14:56:18 +08:00
			   BTRFS_NODEPTRS_PER_BLOCK(fs_info));
2023-04-29 16:07:16 -04:00
		return BTRFS_TREE_BLOCK_INVALID_NRITEMS;
2017-10-09 01:51:02 +00:00
	}
	for (slot = 0; slot < nr - 1; slot++) {
		bytenr = btrfs_node_blockptr(node, slot);
		btrfs_node_key_to_cpu(node, &key, slot);
		btrfs_node_key_to_cpu(node, &next_key, slot + 1);
2020-11-04 16:12:45 +01:00
		if (unlikely(!bytenr)) {
2019-03-20 15:31:28 +01:00
			generic_err(node, slot,
2017-10-09 01:51:03 +00:00
				"invalid NULL node pointer");
2023-04-29 16:07:16 -04:00
			return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR;
2017-10-09 01:51:03 +00:00
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(!IS_ALIGNED(bytenr, fs_info->sectorsize))) {
2019-03-20 15:31:28 +01:00
			generic_err(node, slot,
2017-10-09 01:51:03 +00:00
				"unaligned pointer, have %llu should be aligned to %u",
2018-01-25 14:56:18 +08:00
				bytenr, fs_info->sectorsize);
2023-04-29 16:07:16 -04:00
			return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR;
2017-10-09 01:51:02 +00:00
		}
2020-11-04 16:12:45 +01:00
		if (unlikely(btrfs_comp_cpu_keys(&key, &next_key) >= 0)) {
2019-03-20 15:31:28 +01:00
			generic_err(node, slot,
2017-10-09 01:51:03 +00:00
	"bad key order, current (%llu %u %llu) next (%llu %u %llu)",
				key.objectid, key.type, key.offset,
				next_key.objectid, next_key.type,
				next_key.offset);
2023-04-29 16:07:16 -04:00
			return BTRFS_TREE_BLOCK_BAD_KEY_ORDER;
2017-10-09 01:51:02 +00:00
		}
	}
2023-04-29 16:07:16 -04:00
	return BTRFS_TREE_BLOCK_CLEAN;
}

int btrfs_check_node(struct extent_buffer *node)
{
	enum btrfs_tree_block_status ret;

	ret = __btrfs_check_node(node);
	if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN))
		return -EUCLEAN;
	return 0;
2017-10-09 01:51:02 +00:00
}
2019-04-24 15:22:53 +08:00
ALLOW_ERROR_INJECTION(btrfs_check_node, ERRNO);
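The node checks in `__btrfs_check_node()` reduce to a handful of header and per-slot tests. The sketch below restates them over a hypothetical in-memory `struct toy_node`; the `TOY_MAX_LEVEL` and `TOY_HEADER_FLAG_WRITTEN` values are assumptions meant to mirror `BTRFS_MAX_LEVEL` and `BTRFS_HEADER_FLAG_WRITTEN`, and `max_ptrs` stands in for `BTRFS_NODEPTRS_PER_BLOCK(fs_info)`. One deliberate difference: the kernel loop stops at `nr - 1` because it also compares each key with its successor, so it does not validate the last block pointer, whereas this sketch checks every pointer.

```c
#include <stdint.h>

/* Assumed values; verify against the kernel headers before relying on them. */
#define TOY_MAX_LEVEL		8
#define TOY_HEADER_FLAG_WRITTEN	(1ULL << 0)

/* Hypothetical status codes loosely mirroring enum btrfs_tree_block_status. */
enum toy_node_status {
	TOY_NODE_CLEAN = 0,
	TOY_NODE_WRITTEN_NOT_SET,
	TOY_NODE_INVALID_LEVEL,
	TOY_NODE_INVALID_NRITEMS,
	TOY_NODE_INVALID_BLOCKPTR,
	TOY_NODE_BAD_KEY_ORDER,
};

/* Hypothetical CPU-order key: (objectid, type, offset). */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Hypothetical node: header fields plus one (key, child pointer) per slot. */
struct toy_node {
	uint64_t flags;
	int level;
	uint32_t nritems;
	const struct toy_key *keys;
	const uint64_t *blockptrs;
};

/* Lexicographic (objectid, type, offset) compare, returning -1, 0 or 1. */
static int toy_comp_keys(const struct toy_key *a, const struct toy_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static enum toy_node_status toy_check_node(const struct toy_node *node,
					   uint32_t max_ptrs,
					   uint32_t sectorsize)
{
	if (!(node->flags & TOY_HEADER_FLAG_WRITTEN))
		return TOY_NODE_WRITTEN_NOT_SET;
	/* Level 0 is a leaf, so a node must be in [1, TOY_MAX_LEVEL - 1]. */
	if (node->level <= 0 || node->level >= TOY_MAX_LEVEL)
		return TOY_NODE_INVALID_LEVEL;
	if (node->nritems == 0 || node->nritems > max_ptrs)
		return TOY_NODE_INVALID_NRITEMS;

	for (uint32_t slot = 0; slot < node->nritems; slot++) {
		uint64_t bytenr = node->blockptrs[slot];

		/* sectorsize is assumed to be a power of two, as IS_ALIGNED() needs. */
		if (bytenr == 0 || (bytenr & (sectorsize - 1)) != 0)
			return TOY_NODE_INVALID_BLOCKPTR;
		/* Keys must be strictly ascending; equal neighbours are rejected too. */
		if (slot + 1 < node->nritems &&
		    toy_comp_keys(&node->keys[slot], &node->keys[slot + 1]) >= 0)
			return TOY_NODE_BAD_KEY_ORDER;
	}
	return TOY_NODE_CLEAN;
}
```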
btrfs: tree-checker: check extent buffer owner against owner rootid
Btrfs doesn't check whether the tree block respects the root owner.
This means, if a tree block referred by a parent in extent tree, but has
owner of 5, btrfs can still continue reading the tree block, as long as
it doesn't trigger other sanity checks.
Normally this is fine, but combined with the empty tree check in
check_leaf(), if we hit an empty extent tree, but the root node has
csum tree owner, we can let such extent buffer to sneak in.
Shrink the hole by:
- Do extra eb owner check at tree read time
- Make sure the root owner extent buffer exactly matches the root id.
Unfortunately we can't yet completely patch the hole, there are several
call sites can't pass all info we need:
- For reloc/log trees
Their owner is key::offset, not key::objectid.
We need the full root key to do that accurate check.
For now, we just skip the ownership check for those trees.
- For add_data_references() of relocation
That call site doesn't have any parent/ownership info, as all the
bytenrs are all from btrfs_find_all_leafs().
- For direct backref items walk
Direct backref items records the parent bytenr directly, thus unlike
indirect backref item, we don't do a full tree search.
Thus in that case, we don't have full parent owner to check.
For the later two cases, they all pass 0 as @owner_root, thus we can
skip those cases if @owner_root is 0.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-16 08:05:58 +08:00
int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner)
{
	const bool is_subvol = is_fstree(root_owner);
	const u64 eb_owner = btrfs_header_owner(eb);

	/*
	 * Skip dummy fs, as selftests don't create unique ebs for each dummy
	 * root.
	 */
2024-04-18 00:47:13 +02:00
	if (btrfs_is_testing(eb->fs_info))
2022-03-16 08:05:58 +08:00
		return 0;

	/*
	 * There are several call sites (backref walking, qgroup, and data
	 * reloc) passing 0 as @root_owner, as they are not holding the
	 * tree root. In that case, we cannot do a reliable ownership check,
	 * so just exit.
	 */
	if (root_owner == 0)
		return 0;
	/*
	 * These trees use key.offset as their owner; our callers don't have
	 * the extra capacity to pass key.offset here, so we just skip them.
	 */
	if (root_owner == BTRFS_TREE_LOG_OBJECTID ||
	    root_owner == BTRFS_TREE_RELOC_OBJECTID)
		return 0;

	if (!is_subvol) {
		/* For non-subvolume trees, the eb owner should match root owner */
		if (unlikely(root_owner != eb_owner)) {
			btrfs_crit(eb->fs_info,
"corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect %llu",
				btrfs_header_level(eb) == 0 ? "leaf" : "node",
				root_owner, btrfs_header_bytenr(eb), eb_owner,
				root_owner);
			return -EUCLEAN;
		}
		return 0;
	}

	/*
	 * For subvolume trees, owners can mismatch, but they should all belong
	 * to subvolume trees.
	 */
	if (unlikely(is_subvol != is_fstree(eb_owner))) {
		btrfs_crit(eb->fs_info,
"corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]",
			btrfs_header_level(eb) == 0 ? "leaf" : "node",
			root_owner, btrfs_header_bytenr(eb), eb_owner,
			BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID);
		return -EUCLEAN;
	}
	return 0;
}
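The ownership rules in `btrfs_check_eb_owner()` can be summarised as a pure function of two ids. This is a simplified sketch: `toy_is_fstree()` is a hypothetical approximation of the kernel's `is_fstree()` (the real helper is not shown here and may apply extra conditions), the objectid constants are assumed to match `btrfs_tree.h`, and the selftest skip for dummy filesystems is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, used only if <errno.h> lacks it */
#endif

/* Objectid values assumed to match the btrfs on-disk definitions. */
#define TOY_FS_TREE_OBJECTID	5ULL
#define TOY_TREE_LOG_OBJECTID	((uint64_t)-6)
#define TOY_TREE_RELOC_OBJECTID	((uint64_t)-8)
#define TOY_FIRST_FREE_OBJECTID	256ULL
#define TOY_LAST_FREE_OBJECTID	((uint64_t)-256)

/* Hypothetical approximation of is_fstree(): top-level fs tree or subvolume id. */
static bool toy_is_fstree(uint64_t rootid)
{
	return rootid == TOY_FS_TREE_OBJECTID ||
	       (rootid >= TOY_FIRST_FREE_OBJECTID &&
		rootid <= TOY_LAST_FREE_OBJECTID);
}

/* Returns 0 if @eb_owner is acceptable for a block read via @root_owner. */
static int toy_check_eb_owner(uint64_t eb_owner, uint64_t root_owner)
{
	/* Callers without a tree root pass 0; no reliable check is possible. */
	if (root_owner == 0)
		return 0;
	/* Log and reloc trees keep their owner in key.offset, so skip them. */
	if (root_owner == TOY_TREE_LOG_OBJECTID ||
	    root_owner == TOY_TREE_RELOC_OBJECTID)
		return 0;
	/* Non-subvolume trees: the block must carry exactly this root's id. */
	if (!toy_is_fstree(root_owner))
		return root_owner == eb_owner ? 0 : -EUCLEAN;
	/* Subvolume trees: owners may differ, but must still be subvolume ids. */
	return toy_is_fstree(eb_owner) ? 0 : -EUCLEAN;
}
```

Under these assumptions, `toy_check_eb_owner(7, 2)` models the scenario from the commit message above, an extent tree (id 2) whose block claims the csum tree (id 7) as its owner, and it returns `-EUCLEAN`.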
2023-04-29 16:07:17 -04:00
int btrfs_verify_level_key(struct extent_buffer *eb, int level,
			   struct btrfs_key *first_key, u64 parent_transid)
{
	struct btrfs_fs_info *fs_info = eb->fs_info;
	int found_level;
	struct btrfs_key found_key;
	int ret;

	found_level = btrfs_header_level(eb);
	if (found_level != level) {
		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
		     KERN_ERR "BTRFS: tree level check failed\n");
		btrfs_err(fs_info,
"tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
			  eb->start, level, found_level);
		return -EIO;
	}

	if (!first_key)
		return 0;

	/*
	 * For live tree blocks (new tree blocks in the current transaction),
	 * we need a proper lock context to avoid races, which is impossible
	 * here. So we only check tree blocks which were read from disk, whose
	 * generation <= btrfs_get_last_trans_committed(fs_info).
	 */
2023-10-04 11:38:51 +01:00
	if (btrfs_header_generation(eb) > btrfs_get_last_trans_committed(fs_info))
2023-04-29 16:07:17 -04:00
		return 0;

	/* We have @first_key, so this @eb must have at least one item */
	if (btrfs_header_nritems(eb) == 0) {
		btrfs_err(fs_info,
		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
			  eb->start);
		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
		return -EUCLEAN;
	}

	if (found_level)
		btrfs_node_key_to_cpu(eb, &found_key, 0);
	else
		btrfs_item_key_to_cpu(eb, &found_key, 0);
	ret = btrfs_comp_cpu_keys(first_key, &found_key);

	if (ret) {
		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
		     KERN_ERR "BTRFS: tree first key check failed\n");
		btrfs_err(fs_info,
"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
			  eb->start, parent_transid, first_key->objectid,
			  first_key->type, first_key->offset,
			  found_key.objectid, found_key.type,
			  found_key.offset);
	}
	return ret;
}
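The order of checks in `btrfs_verify_level_key()` matters: a level mismatch is fatal, a missing `first_key` or a block newer than the last committed transaction skips the key check, and only then is an empty block rejected. The sketch below mirrors that order over a hypothetical `struct toy_block` that exposes just the header fields and slot-0 key the function reads; `last_trans_committed` is passed in as a plain argument in place of `btrfs_get_last_trans_committed()`, and all `toy_` names are illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, used only if <errno.h> lacks it */
#endif

/* Hypothetical CPU-order key: (objectid, type, offset). */
struct toy_cpu_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Hypothetical view of the tree block fields the check reads. */
struct toy_block {
	int level;
	uint64_t generation;
	uint32_t nritems;
	struct toy_cpu_key slot0_key;	/* meaningful only when nritems > 0 */
};

/* Lexicographic key compare returning -1, 0 or 1. */
static int toy_cpu_key_cmp(const struct toy_cpu_key *a,
			   const struct toy_cpu_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static int toy_verify_level_key(const struct toy_block *eb, int level,
				const struct toy_cpu_key *first_key,
				uint64_t last_trans_committed)
{
	/* A level mismatch is always an error. */
	if (eb->level != level)
		return -EIO;
	/* Without an expected first key there is nothing more to compare. */
	if (!first_key)
		return 0;
	/* Blocks newer than the last committed transaction are not checked. */
	if (eb->generation > last_trans_committed)
		return 0;
	/* An expected first key implies at least one item. */
	if (eb->nritems == 0)
		return -EUCLEAN;
	/* -1 or 1 on mismatch, mirroring the raw value the kernel returns. */
	return toy_cpu_key_cmp(first_key, &eb->slot0_key);
}
```

As in the kernel function, a first-key mismatch returns the raw key comparison result rather than an errno, so any nonzero value has to be treated as failure.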