IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
btrfs scrub - make fixups sync, don't reuse fixup bios
Fixups are already sync for csum failures, this patch makes them sync
for EIO case as well.
Fixups are now sharing pages with the parent sbio - instead of
allocating a separate page to do a fixup we grab the page from the sbio
buffer.
Fixup bios are no longer reused.
struct fixup is no longer needed, instead pass [sbio pointer, index].
Originally this was added to look at the possibility of sharing the code
between drive swap and scrub, but it actually fixes a serious bug in
scrub code where errors that could be corrected were ignored and
reported as uncorrectable.
btrfs scrub - restore bios properly after media errors
The current code reallocates a bio after a media error. This is a
temporary measure introduced in v3 after a serious problem related to
bio reuse was found in v2 of scrub patchset.
Basically we did not reset bv_offset and bv_len fields of the bio_vec
structure. They are changed in case I/O error happens, for example, at
offset 512 or 1024 into the page. Also bi_flags field wasn't properly
setup before reusing the bio.
Signed-off-by: Arne Jansen <sensille@gmx.net>
adds ioctls necessary to start and cancel scrubs, to get current
progress and to get info about devices to be scrubbed.
Note that the scrub is done per-device and that the ioctl only
returns after the scrub for this devices is finished or has been
canceled.
Signed-off-by: Arne Jansen <sensille@gmx.net>
This adds an initial implementation for scrub. It works quite
straightforward. The usermode issues an ioctl for each device in the
fs. For each device, it enumerates the allocated device chunks. For
each chunk, the contained extents are enumerated and the data checksums
fetched. The extents are read sequentially and the checksums verified.
If an error occurs (checksum or EIO), a good copy is searched for. If
one is found, the bad copy will be rewritten.
All enumerations happen from the commit roots. During a transaction
commit, the scrubs get paused and afterwards continue from the new
roots.
This commit is based on the series originally posted to linux-btrfs
with some improvements that resulted from comments from David Sterba,
Ilya Dryomov and Jan Schmidt.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Remove code which has been #if0-ed out for a very long time and does not
seem to be related to current codebase anymore.
Signed-off-by: David Sterba <dsterba@suse.cz>
Remove static and global declarations and/or definitions. Reduces size
of btrfs.ko by ~3.4kB.
text data bss dec hex filename
402081 7464 200 409745 64091 btrfs.ko.base
398620 7144 200 405964 631cc btrfs.ko.remove-all
Signed-off-by: David Sterba <dsterba@suse.cz>
parameter tree root it's not used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
Signed-off-by: David Sterba <dsterba@suse.cz>
all callers pass GFP_NOFS, but the GFP mask argument is not used in the
function; GFP_ATOMIC is passed to radix tree initialization and it's the
only correct one, since we're using the preload/insert mechanism of
radix tree.
Let's drop the gfp mask from btrfs function, this will not change
behaviour.
Signed-off-by: David Sterba <dsterba@suse.cz>
The superblock's ->s_fs_info is properly set in btrfs_fill_super, after
a call to open_ctree, which derefereces it before check. Although
tree_root is set via btrfs_set_super, let's be defensive and leave the
check in place.
Signed-off-by: David Sterba <dsterba@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: cleanup error handling in inode.c
Btrfs: put the right bio if we have an error
Btrfs: free bitmaps properly when evicting the cache
Btrfs: Free free_space item properly in btrfs_trim_block_group()
btrfs: add missing spin_unlock to a rare exit path
Btrfs: check return value of kmalloc()
btrfs: fix wrong allocating flag when reading page
Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()
The error processing of several places is changed like setting the
error number only at the error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
the orig_bio, not bio.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If our space cache is wrong, we do the right thing and free up everything that
we loaded, however we don't reset the total_bitmaps counter or the thresholds or
anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap()
if it's a bitmap, this will keep us from panicing when we check to make sure we
don't have too many bitmaps. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Since commit dc89e9824464e91fa0b06267864ceabe3186fd8b, we've changed
to use a specific slab for alocation of free_space items.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The check on the return value of kmalloc() is added to some places.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
the space cache use extent_readpages() to read free space information,
so we can not use GFP_KERNEL flag to allocate memory, or it may lead
to deadlock.
Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It is necessary to unlock mutex_lock before it return an error when
btrfs_alloc_path() fails.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This is similar to block group caching.
We dedicate a special inode in fs tree to save free ino cache.
At the very first time we create/delete a file after mount, the free ino
cache will be loaded from disk into memory. When the fs tree is commited,
the cache will be written back to disk.
To keep compatibility, we check the root generation against the generation
of the special inode when loading the cache, so the loading will fail
if the btrfs filesystem was mounted in an older kernel before.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
There's a potential problem in 32bit system when we exhaust 32bit inode
numbers and start to allocate big inode numbers, because btrfs uses
inode->i_ino in many places.
So here we always use BTRFS_I(inode)->location.objectid, which is an
u64 variable.
There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
and inode->i_ino will be used in those cases.
Another reason to make this change is I'm going to use a special inode
to save free ino cache, and the inode number must be > (u64)-256.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Extract out block group specific code from lookup_free_space_inode(),
create_free_space_inode(), load_free_space_cache() and
btrfs_write_out_cache(), so the code can be used to read/write
free ino cache.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
So we can re-use the code to cache free inode numbers.
The change is quite straightforward. Two new structures are introduced.
- struct btrfs_free_space_ctl
We move those variables that are used for caching free space from
struct btrfs_block_group_cache to this new struct.
- struct btrfs_free_space_op
We do block group specific work (e.g. calculation of extents threshold)
through functions registered in this struct.
And then we can remove references to struct btrfs_block_group_cache.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
The Btrfs submit bio threads have a small number of
threads responsible for pushing down bios we've collected
for a large number of devices.
Since we do all the bios for a single device at once,
we want to make sure we unplug and send down the bios
for each device as we're done processing them.
The new plugging API removed the btrfs code to
unplug while processing bios, this adds it back with
the new API.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
Btrfs: fix free space cache leak
Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
Btrfs end_bio_extent_readpage should look for locked bits
Btrfs: don't force chunk allocation in find_free_extent
Btrfs: Check validity before setting an acl
Btrfs: Fix incorrect inode nlink in btrfs_link()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
Btrfs: make uncache_state unconditional
btrfs: using cached extent_state in set/unlock combinations
Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
Btrfs: fix subvolume mount by name problem when default mount subvolume is set
fix user annotation in ioctl.c
Btrfs: check for duplicate iov_base's when doing dio reads
btrfs: properly handle overlapping areas in memmove_extent_buffer
Btrfs: fix memory leaks in btrfs_new_inode()
Btrfs: check for duplicate iov_base's when doing dio reads
Btrfs: reuse the extent_map we found when calling btrfs_get_extent
Btrfs: do not use async submit for small DIO io's
Btrfs: don't split dio bios if we don't have to
...
The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.
One loop was missed though, so it ended up leaking pages. This fixes
it to use our page array instead of find_get_page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Everytime we try to allocate disk space we try and see if we can pre-emptively
allocate a chunk, but in the common case we don't allocate anything, so there is
no sense in taking the chunk_mutex at all. So instead if we are allocating a
chunk, mark it in the space_info so we don't get two people trying to allocate
at the same time. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
A recent commit caches the extent state in end_bio_extent_readpage,
but the search it does should look for locked extents. This
fixes things to make it more effective.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
find_free_extent likes to allocate in contiguous clusters,
which makes writeback faster, especially on SSD storage. As
the FS fragments, these clusters become harder to find and we have
to decide between allocating a new chunk to make more clusters
or giving up on the cluster to allocate from the free space
we have.
Right now it creates too many chunks, and you can end up with
a whole FS that is mostly empty metadata chunks. This commit
changes the allocation code to be more strict and only
allocate new chunks when we've made good use of the chunks we
already have.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Call posix_acl_valid() to check if an acl is valid or not.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Link count of the inode is not decreased if btrfs_set_inode_index()
fails.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Singed-off-by: Li Zefan <lizf@cn.fujitsu.com>
btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.
This also simplifies how we walk the btree path.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.
This also simplifies how we walk the btree path.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>