2007-06-12 09:07:21 -04:00
/*
* Copyright ( C ) 2007 Oracle . All rights reserved .
*
* This program is free software ; you can redistribute it and / or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation .
*
* This program is distributed in the hope that it will be useful ,
* but WITHOUT ANY WARRANTY ; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE . See the GNU
* General Public License for more details .
*
* You should have received a copy of the GNU General Public
* License along with this program ; if not , write to the
* Free Software Foundation , Inc . , 59 Temple Place - Suite 330 ,
* Boston , MA 021110 - 1307 , USA .
*/
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree and always
returns (highest+1) as the inode number when we create a file. Inode numbers
are therefore never reclaimed when we delete files, and we'll run out of
inode numbers as we keep creating and deleting files on 32-bit machines.
This fixes it, and it works similarly to how we cache free space in block
groups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K of RAM
for extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted at runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
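The gap-finding step described in the commit message above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct free_range` and `collect_free_ranges()` are hypothetical names, and a sorted array stands in for the inode items that the kernel thread reads from the commit root.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: one run of free inode numbers. */
struct free_range {
	uint64_t start;
	uint64_t count;
};

/*
 * Walk a sorted array of in-use inode numbers and record the gaps between
 * neighbours, plus the tail gap below `highest` (exclusive). Returns the
 * number of ranges written to `out`, at most `max_out`.
 */
size_t collect_free_ranges(const uint64_t *used, size_t n_used,
			   uint64_t highest,
			   struct free_range *out, size_t max_out)
{
	uint64_t last = UINT64_MAX;	/* sentinel: no inode seen yet */
	size_t n_out = 0;
	size_t i;

	for (i = 0; i < n_used; i++) {
		if (used[i] >= highest)
			break;
		/* A hole exists when this inode does not directly follow the last. */
		if (last != UINT64_MAX && last + 1 != used[i] && n_out < max_out) {
			out[n_out].start = last + 1;
			out[n_out].count = used[i] - last - 1;
			n_out++;
		}
		last = used[i];
	}

	/* Tail gap between the last scanned inode and `highest`. */
	if (last != UINT64_MAX && last < highest - 1 && n_out < max_out) {
		out[n_out].start = last + 1;
		out[n_out].count = highest - last - 1;
		n_out++;
	}
	return n_out;
}
```

For used inode numbers 256, 257, 260 and 265 with `highest` set to 270, the sketch reports three free runs: two numbers starting at 258, four starting at 261, and four starting at 266.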
2011-04-20 10:06:11 +08:00
# include <linux/delay.h>
# include <linux/kthread.h>
# include <linux/pagemap.h>
2007-03-20 14:38:32 -04:00
# include "ctree.h"
# include "disk-io.h"
2011-04-20 10:06:11 +08:00
# include "free-space-cache.h"
# include "inode-map.h"
2007-03-20 14:38:32 -04:00
# include "transaction.h"
2011-04-20 10:06:11 +08:00
static int caching_kthread ( void * data )
{
struct btrfs_root * root = data ;
struct btrfs_fs_info * fs_info = root - > fs_info ;
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_key key ;
struct btrfs_path * path ;
struct extent_buffer * leaf ;
u64 last = ( u64 ) - 1 ;
int slot ;
int ret ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return 0 ;
2011-04-20 10:06:11 +08:00
path = btrfs_alloc_path ( ) ;
if ( ! path )
return - ENOMEM ;
/* Since the commit root is read-only, we can safely skip locking. */
path - > skip_locking = 1 ;
path - > search_commit_root = 1 ;
path - > reada = 2 ;
key . objectid = BTRFS_FIRST_FREE_OBJECTID ;
key . offset = 0 ;
key . type = BTRFS_INODE_ITEM_KEY ;
again :
/* need to make sure the commit_root doesn't disappear */
mutex_lock ( & root - > fs_commit_mutex ) ;
ret = btrfs_search_slot ( NULL , root , & key , path , 0 , 0 ) ;
if ( ret < 0 )
goto out ;
while ( 1 ) {
2011-05-31 18:07:27 +02:00
if ( btrfs_fs_closing ( fs_info ) )
2011-04-20 10:06:11 +08:00
goto out ;
leaf = path - > nodes [ 0 ] ;
slot = path - > slots [ 0 ] ;
2011-05-26 06:38:30 +00:00
if ( slot > = btrfs_header_nritems ( leaf ) ) {
2011-04-20 10:06:11 +08:00
ret = btrfs_next_leaf ( root , path ) ;
if ( ret < 0 )
goto out ;
else if ( ret > 0 )
break ;
if ( need_resched ( ) | |
btrfs_transaction_in_commit ( fs_info ) ) {
leaf = path - > nodes [ 0 ] ;
if ( btrfs_header_nritems ( leaf ) = = 0 ) {
WARN_ON ( 1 ) ;
break ;
}
/*
* Save the key so we can advance forward
* in the next search.
*/
btrfs_item_key_to_cpu ( leaf , & key , 0 ) ;
2011-05-22 12:33:42 -04:00
btrfs_release_path ( path ) ;
2011-04-20 10:06:11 +08:00
root - > cache_progress = last ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
schedule_timeout ( 1 ) ;
goto again ;
} else
continue ;
}
btrfs_item_key_to_cpu ( leaf , & key , slot ) ;
if ( key . type ! = BTRFS_INODE_ITEM_KEY )
goto next ;
2011-05-26 06:38:30 +00:00
if ( key . objectid > = root - > highest_objectid )
2011-04-20 10:06:11 +08:00
break ;
if ( last ! = ( u64 ) - 1 & & last + 1 ! = key . objectid ) {
__btrfs_add_free_space ( ctl , last + 1 ,
key . objectid - last - 1 ) ;
wake_up ( & root - > cache_wait ) ;
}
last = key . objectid ;
next :
path - > slots [ 0 ] + + ;
}
2011-05-26 06:38:30 +00:00
if ( last < root - > highest_objectid - 1 ) {
2011-04-20 10:06:11 +08:00
__btrfs_add_free_space ( ctl , last + 1 ,
2011-05-26 06:38:30 +00:00
root - > highest_objectid - last - 1 ) ;
2011-04-20 10:06:11 +08:00
}
spin_lock ( & root - > cache_lock ) ;
root - > cached = BTRFS_CACHE_FINISHED ;
spin_unlock ( & root - > cache_lock ) ;
root - > cache_progress = ( u64 ) - 1 ;
btrfs_unpin_free_ino ( root ) ;
out :
wake_up ( & root - > cache_wait ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
btrfs_free_path ( path ) ;
return ret ;
}
static void start_caching ( struct btrfs_root * root )
{
2011-05-26 06:38:30 +00:00
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
2011-04-20 10:06:11 +08:00
struct task_struct * tsk ;
2011-04-20 10:33:24 +08:00
int ret ;
2011-05-26 06:38:30 +00:00
u64 objectid ;
2011-04-20 10:06:11 +08:00
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
2011-04-20 10:06:11 +08:00
spin_lock ( & root - > cache_lock ) ;
if ( root - > cached ! = BTRFS_CACHE_NO ) {
spin_unlock ( & root - > cache_lock ) ;
return ;
}
root - > cached = BTRFS_CACHE_STARTED ;
spin_unlock ( & root - > cache_lock ) ;
2011-04-20 10:33:24 +08:00
ret = load_free_ino_cache ( root - > fs_info , root ) ;
if ( ret = = 1 ) {
spin_lock ( & root - > cache_lock ) ;
root - > cached = BTRFS_CACHE_FINISHED ;
spin_unlock ( & root - > cache_lock ) ;
return ;
}
2011-05-26 06:38:30 +00:00
/*
* It can be quite time-consuming to fill the cache by searching
* through the extent tree, and this can keep the ino allocation path
* waiting. Therefore at start we quickly find out the highest
* inode number, and we know we can use inode numbers which fall in
* [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID].
*/
ret = btrfs_find_free_objectid ( root , & objectid ) ;
if ( ! ret & & objectid < = BTRFS_LAST_FREE_OBJECTID ) {
__btrfs_add_free_space ( ctl , objectid ,
BTRFS_LAST_FREE_OBJECTID - objectid + 1 ) ;
}
2011-04-20 10:06:11 +08:00
tsk = kthread_run ( caching_kthread , root , " btrfs-ino-cache-%llu \n " ,
root - > root_key . objectid ) ;
2012-03-12 16:03:00 +01:00
BUG_ON ( IS_ERR ( tsk ) ) ; /* -ENOMEM */
2011-04-20 10:06:11 +08:00
}
int btrfs_find_free_ino ( struct btrfs_root * root , u64 * objectid )
{
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return btrfs_find_free_objectid ( root , objectid ) ;
2011-04-20 10:06:11 +08:00
again :
* objectid = btrfs_find_ino_for_alloc ( root ) ;
if ( * objectid ! = 0 )
return 0 ;
start_caching ( root ) ;
wait_event ( root - > cache_wait ,
root - > cached = = BTRFS_CACHE_FINISHED | |
root - > free_ino_ctl - > free_space > 0 ) ;
if ( root - > cached = = BTRFS_CACHE_FINISHED & &
root - > free_ino_ctl - > free_space = = 0 )
return - ENOSPC ;
else
goto again ;
}
void btrfs_return_ino ( struct btrfs_root * root , u64 objectid )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_free_space_ctl * pinned = root - > free_ino_pinned ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
2011-04-20 10:06:11 +08:00
again :
if ( root - > cached = = BTRFS_CACHE_FINISHED ) {
__btrfs_add_free_space ( ctl , objectid , 1 ) ;
} else {
/*
* If we are in the process of caching free ino chunks,
* to avoid adding the same inode number to the free_ino
* tree twice due to cross transaction, we'll leave it
* in the pinned tree until a transaction is committed
* or the caching work is done.
*/
mutex_lock ( & root - > fs_commit_mutex ) ;
spin_lock ( & root - > cache_lock ) ;
if ( root - > cached = = BTRFS_CACHE_FINISHED ) {
spin_unlock ( & root - > cache_lock ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
goto again ;
}
spin_unlock ( & root - > cache_lock ) ;
start_caching ( root ) ;
2011-05-26 06:38:30 +00:00
if ( objectid < = root - > cache_progress | |
objectid > root - > highest_objectid )
2011-04-20 10:06:11 +08:00
__btrfs_add_free_space ( ctl , objectid , 1 ) ;
else
__btrfs_add_free_space ( pinned , objectid , 1 ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
}
}
/*
* When a transaction is committed, we'll move those inode numbers which
* are smaller than root->cache_progress from the pinned tree to the
* free_ino tree, and others will just be dropped, because the commit
* root we were searching has changed.
*
* Must be called with root->fs_commit_mutex held.
*/
void btrfs_unpin_free_ino ( struct btrfs_root * root )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct rb_root * rbroot = & root - > free_ino_pinned - > free_space_offset ;
struct btrfs_free_space * info ;
struct rb_node * n ;
u64 count ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
2011-04-20 10:06:11 +08:00
while ( 1 ) {
n = rb_first ( rbroot ) ;
if ( ! n )
break ;
info = rb_entry ( n , struct btrfs_free_space , offset_index ) ;
2012-03-12 16:03:00 +01:00
BUG_ON ( info - > bitmap ) ; /* Logic error */
2011-04-20 10:06:11 +08:00
if ( info - > offset > root - > cache_progress )
goto free ;
else if ( info - > offset + info - > bytes > root - > cache_progress )
count = root - > cache_progress - info - > offset + 1 ;
else
count = info - > bytes ;
__btrfs_add_free_space ( ctl , info - > offset , count ) ;
free :
rb_erase ( & info - > offset_index , rbroot ) ;
kfree ( info ) ;
}
}
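The clipping rule used in the loop above can be isolated as a small pure function. This is a user-space sketch, not kernel code: `unpin_count()` is a hypothetical helper, and `progress` stands in for `root->cache_progress`.

```c
#include <stdint.h>

/*
 * For a pinned run of inode numbers [offset, offset + bytes), return how
 * many of them are at or below `progress`, meaning the caching scan has
 * already passed them and they can be moved to the free tree. A return
 * value of 0 means the whole run is dropped.
 */
uint64_t unpin_count(uint64_t offset, uint64_t bytes, uint64_t progress)
{
	if (offset > progress)
		return 0;			/* run lies entirely ahead of the scan */
	if (offset + bytes > progress)
		return progress - offset + 1;	/* run straddles the scan position */
	return bytes;				/* run lies entirely behind the scan */
}
```

With a pinned run of ten numbers starting at 100, a progress of 104 moves the five numbers 100 through 104, and a progress of 50 moves none.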
# define INIT_THRESHOLD (((1024 * 32) / 2) / sizeof(struct btrfs_free_space))
# define INODES_PER_BITMAP (PAGE_CACHE_SIZE * 8)
/*
* The goal is to keep the memory used by the free_ino tree from
* exceeding the memory we would use with bitmaps only.
*/
static void recalculate_thresholds ( struct btrfs_free_space_ctl * ctl )
{
struct btrfs_free_space * info ;
struct rb_node * n ;
int max_ino ;
int max_bitmaps ;
n = rb_last ( & ctl - > free_space_offset ) ;
if ( ! n ) {
ctl - > extents_thresh = INIT_THRESHOLD ;
return ;
}
info = rb_entry ( n , struct btrfs_free_space , offset_index ) ;
/*
* Find the maximum inode number in the filesystem. Note that we
* ignore the fact that this can be a bitmap, because we are
* not doing a precise calculation.
*/
max_ino = info - > bytes - 1 ;
max_bitmaps = ALIGN ( max_ino , INODES_PER_BITMAP ) / INODES_PER_BITMAP ;
if ( max_bitmaps < = ctl - > total_bitmaps ) {
ctl - > extents_thresh = 0 ;
return ;
}
ctl - > extents_thresh = ( max_bitmaps - ctl - > total_bitmaps ) *
PAGE_CACHE_SIZE / sizeof ( * info ) ;
}
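The threshold arithmetic in recalculate_thresholds() can be tried out in user space with the sketch below. It is an approximation under stated assumptions: a 4096-byte page replaces PAGE_CACHE_SIZE, the entry size is passed in by the caller instead of using sizeof(struct btrfs_free_space), and the `SKETCH_*` macros and both function names are hypothetical.

```c
#include <stddef.h>

/* Assumed page size for this sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u
/* One bitmap page tracks one inode number per bit. */
#define SKETCH_INODES_PER_BITMAP (SKETCH_PAGE_SIZE * 8u)

/* Number of bitmap pages needed to cover inode numbers up to max_ino (rounded up). */
size_t bitmaps_needed(size_t max_ino)
{
	return (max_ino + SKETCH_INODES_PER_BITMAP - 1) / SKETCH_INODES_PER_BITMAP;
}

/*
 * Extent entries allowed before falling back to bitmaps: the memory that
 * the not-yet-allocated bitmap pages would cost, divided by the size of
 * one extent entry. Returns 0 once the bitmap budget is already used up.
 */
size_t extents_threshold(size_t max_ino, size_t total_bitmaps,
			 size_t entry_size)
{
	size_t max_bitmaps = bitmaps_needed(max_ino);

	if (max_bitmaps <= total_bitmaps)
		return 0;
	return (max_bitmaps - total_bitmaps) * SKETCH_PAGE_SIZE / entry_size;
}
```

With these assumed values, a maximum inode number of 100000 needs four bitmap pages. If one bitmap is already in use and an entry is taken to be 64 bytes, the threshold comes out at 192 extent entries.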
/*
* We don't fall back to a bitmap if we are below the extents threshold
* or if this chunk of inode numbers is a big one.
*/
static bool use_bitmap ( struct btrfs_free_space_ctl * ctl ,
struct btrfs_free_space * info )
{
if ( ctl - > free_extents < ctl - > extents_thresh | |
info - > bytes > INODES_PER_BITMAP / 10 )
return false ;
return true ;
}
static struct btrfs_free_space_op free_ino_op = {
. recalc_thresholds = recalculate_thresholds ,
. use_bitmap = use_bitmap ,
} ;
static void pinned_recalc_thresholds ( struct btrfs_free_space_ctl * ctl )
{
}
static bool pinned_use_bitmap ( struct btrfs_free_space_ctl * ctl ,
struct btrfs_free_space * info )
{
/*
* We always use extents for two reasons :
*
* - The pinned tree is only used during the process of caching
* work .
* - Make code simpler . See btrfs_unpin_free_ino ( ) .
*/
return false ;
}
static struct btrfs_free_space_op pinned_free_ino_op = {
. recalc_thresholds = pinned_recalc_thresholds ,
. use_bitmap = pinned_use_bitmap ,
} ;
void btrfs_init_free_ino_ctl ( struct btrfs_root * root )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_free_space_ctl * pinned = root - > free_ino_pinned ;
spin_lock_init ( & ctl - > tree_lock ) ;
ctl - > unit = 1 ;
ctl - > start = 0 ;
ctl - > private = NULL ;
ctl - > op = & free_ino_op ;
/*
* Initially we allow 16K of RAM to be used to cache chunks of
* inode numbers before we resort to bitmaps. This is somewhat
* arbitrary, but it will be adjusted at runtime.
*/
ctl - > extents_thresh = INIT_THRESHOLD ;
spin_lock_init ( & pinned - > tree_lock ) ;
pinned - > unit = 1 ;
pinned - > start = 0 ;
pinned - > private = NULL ;
pinned - > extents_thresh = 0 ;
pinned - > op = & pinned_free_ino_op ;
}
2011-04-20 10:33:24 +08:00
int btrfs_save_ino_cache ( struct btrfs_root * root ,
struct btrfs_trans_handle * trans )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_path * path ;
struct inode * inode ;
2011-11-10 20:45:04 -05:00
struct btrfs_block_rsv * rsv ;
u64 num_bytes ;
2011-04-20 10:33:24 +08:00
u64 alloc_hint = 0 ;
int ret ;
int prealloc ;
bool retry = false ;
2011-06-01 09:42:49 +00:00
/* only the fs tree and subvols/snapshots need an ino cache */
if ( root - > root_key . objectid ! = BTRFS_FS_TREE_OBJECTID & &
( root - > root_key . objectid < BTRFS_FIRST_FREE_OBJECTID | |
root - > root_key . objectid > BTRFS_LAST_FREE_OBJECTID ) )
return 0 ;
2011-05-31 19:33:33 +00:00
/* Don't save inode cache if we are deleting this root */
if ( btrfs_root_refs ( & root - > root_item ) = = 0 & &
root ! = root - > fs_info - > tree_root )
return 0 ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return 0 ;
2011-04-20 10:33:24 +08:00
path = btrfs_alloc_path ( ) ;
if ( ! path )
return - ENOMEM ;
2011-06-03 09:36:29 -04:00
2011-11-10 20:45:04 -05:00
rsv = trans - > block_rsv ;
trans - > block_rsv = & root - > fs_info - > trans_block_rsv ;
num_bytes = trans - > bytes_reserved ;
/*
* 1 item for inode item insertion if needed
2013-05-13 13:55:09 +00:00
* 4 items for inode item update (in the worst case)
* 1 item for slack space if we need to truncate
2011-11-10 20:45:04 -05:00
	 * 1 item for free space object
	 * 3 items for pre-allocation
	 */
2013-05-13 13:55:09 +00:00
	trans->bytes_reserved = btrfs_calc_trans_metadata_size(root, 10);
Btrfs: improve the noflush reservation
In some places(such as: evicting inode), we just can not flush the reserved
space of delalloc, flushing the delayed directory index and delayed inode
is OK, but we don't try to flush those things and just go back when there is
no enough space to be reserved. This patch fixes this problem.
We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we can in the transaction, we should not flush anything, or the deadlock
would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc
would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
and we will flush all things.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-10-16 11:33:38 +00:00
	ret = btrfs_block_rsv_add(root, trans->block_rsv,
				  trans->bytes_reserved,
				  BTRFS_RESERVE_NO_FLUSH);
2011-11-10 20:45:04 -05:00
	if (ret)
		goto out;
2012-02-24 10:39:05 -05:00
	trace_btrfs_space_reservation(root->fs_info, "ino_cache",
2012-03-29 09:57:44 -04:00
				      trans->transid, trans->bytes_reserved, 1);
2011-04-20 10:33:24 +08:00
again:
	inode = lookup_free_ino_inode(root, path);
2012-03-12 16:03:00 +01:00
	if (IS_ERR(inode) && (PTR_ERR(inode) != -ENOENT || retry)) {
2011-04-20 10:33:24 +08:00
		ret = PTR_ERR(inode);
2011-11-10 20:45:04 -05:00
		goto out_release;
2011-04-20 10:33:24 +08:00
	}
	if (IS_ERR(inode)) {
2012-03-12 16:03:00 +01:00
		BUG_ON(retry); /* Logic error */
2011-04-20 10:33:24 +08:00
		retry = true;
		ret = create_free_ino_inode(root, trans, path);
		if (ret)
2011-11-10 20:45:04 -05:00
			goto out_release;
2011-04-20 10:33:24 +08:00
		goto again;
	}
	BTRFS_I(inode)->generation = 0;
	ret = btrfs_update_inode(trans, root, inode);
2012-03-12 16:03:00 +01:00
	if (ret) {
		btrfs_abort_transaction(trans, root, ret);
		goto out_put;
	}
2011-04-20 10:33:24 +08:00
	if (i_size_read(inode) > 0) {
		ret = btrfs_truncate_free_space_cache(root, trans, path, inode);
2012-03-12 16:03:00 +01:00
		if (ret) {
2013-05-13 13:55:08 +00:00
			if (ret != -ENOSPC)
				btrfs_abort_transaction(trans, root, ret);
2011-04-20 10:33:24 +08:00
			goto out_put;
2012-03-12 16:03:00 +01:00
		}
2011-04-20 10:33:24 +08:00
	}
	spin_lock(&root->cache_lock);
	if (root->cached != BTRFS_CACHE_FINISHED) {
		ret = -1;
		spin_unlock(&root->cache_lock);
		goto out_put;
	}
	spin_unlock(&root->cache_lock);
	spin_lock(&ctl->tree_lock);
	prealloc = sizeof(struct btrfs_free_space) * ctl->free_extents;
	prealloc = ALIGN(prealloc, PAGE_CACHE_SIZE);
	prealloc += ctl->total_bitmaps * PAGE_CACHE_SIZE;
	spin_unlock(&ctl->tree_lock);
	/* Just to make sure we have enough space */
	prealloc += 8 * PAGE_CACHE_SIZE;
2011-08-30 10:19:10 -04:00
	ret = btrfs_delalloc_reserve_space(inode, prealloc);
2011-04-20 10:33:24 +08:00
	if (ret)
		goto out_put;
	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
					      prealloc, prealloc, &alloc_hint);
2011-08-30 10:19:10 -04:00
	if (ret) {
		btrfs_delalloc_release_space(inode, prealloc);
2011-04-20 10:33:24 +08:00
		goto out_put;
2011-08-30 10:19:10 -04:00
	}
2011-04-20 10:33:24 +08:00
	btrfs_free_reserved_data_space(inode, prealloc);
2011-11-10 20:45:04 -05:00
	ret = btrfs_write_out_ino_cache(root, trans, path);
2011-04-20 10:33:24 +08:00
out_put:
	iput(inode);
2011-11-10 20:45:04 -05:00
out_release:
2012-02-24 10:39:05 -05:00
	trace_btrfs_space_reservation(root->fs_info, "ino_cache",
2012-03-29 09:57:44 -04:00
				      trans->transid, trans->bytes_reserved, 0);
2011-11-10 20:45:04 -05:00
	btrfs_block_rsv_release(root, trans->block_rsv, trans->bytes_reserved);
2011-04-20 10:33:24 +08:00
out:
2011-11-10 20:45:04 -05:00
	trans->block_rsv = rsv;
	trans->bytes_reserved = num_bytes;
2011-04-20 10:33:24 +08:00
	btrfs_free_path(path);
	return ret;
}
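The preallocation size computed in btrfs_save_ino_cache() can be illustrated in isolation. The following is a minimal userspace sketch, not the kernel code: the page size, the per-entry size, and every `sketch_`/`SKETCH_` name are assumptions made for illustration (in the kernel the entry size is `sizeof(struct btrfs_free_space)` and the page size is `PAGE_CACHE_SIZE`).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel uses PAGE_CACHE_SIZE
 * and sizeof(struct btrfs_free_space) instead. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_ENTRY_SIZE 64u

/* Round v up to the next multiple of a, where a is a power of two. */
static uint64_t sketch_align_up(uint64_t v, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

/*
 * Bytes to preallocate for the on-disk ino cache:
 *   - one entry per extent, rounded up to whole pages
 *   - one page per bitmap
 *   - eight extra pages of slack
 */
static uint64_t sketch_ino_cache_prealloc(uint64_t free_extents,
					  uint64_t total_bitmaps)
{
	uint64_t bytes = (uint64_t)SKETCH_ENTRY_SIZE * free_extents;

	bytes = sketch_align_up(bytes, SKETCH_PAGE_SIZE);
	bytes += total_bitmaps * SKETCH_PAGE_SIZE;
	bytes += 8 * SKETCH_PAGE_SIZE;
	return bytes;
}
```

Under these assumed sizes, 100 extents and 2 bitmaps give 2 pages of entries, 2 pages of bitmaps, and 8 pages of slack.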
2011-04-20 10:06:11 +08:00
static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
2007-04-05 13:35:25 -04:00
{
	struct btrfs_path *path;
	int ret;
2007-10-15 16:14:19 -04:00
	struct extent_buffer *l;
2007-04-05 13:35:25 -04:00
	struct btrfs_key search_key;
2007-10-15 16:14:19 -04:00
	struct btrfs_key found_key;
2007-04-05 13:35:25 -04:00
	int slot;
	path = btrfs_alloc_path();
2011-03-23 08:14:16 +00:00
	if (!path)
		return -ENOMEM;
2007-04-05 13:35:25 -04:00
2008-09-05 16:43:53 -04:00
	search_key.objectid = BTRFS_LAST_FREE_OBJECTID;
	search_key.type = -1;
2007-04-05 13:35:25 -04:00
	search_key.offset = (u64)-1;
	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
	if (ret < 0)
		goto error;
2012-03-12 16:03:00 +01:00
	BUG_ON(ret == 0); /* Corruption */
2007-04-05 13:35:25 -04:00
	if (path->slots[0] > 0) {
		slot = path->slots[0] - 1;
2007-10-15 16:14:19 -04:00
		l = path->nodes[0];
		btrfs_item_key_to_cpu(l, &found_key, slot);
2009-09-21 15:56:00 -04:00
		*objectid = max_t(u64, found_key.objectid,
				  BTRFS_FIRST_FREE_OBJECTID - 1);
2007-04-05 13:35:25 -04:00
	} else {
2009-09-21 15:56:00 -04:00
		*objectid = BTRFS_FIRST_FREE_OBJECTID - 1;
2007-04-05 13:35:25 -04:00
	}
	ret = 0;
error:
	btrfs_free_path(path);
	return ret;
}
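The search pattern in btrfs_find_highest_objectid() is: look up a key that sorts after every valid item, then step back one slot to find the largest existing objectid. Here is a minimal userspace sketch of that idea over a sorted array instead of a b-tree. The `sketch_`/`SKETCH_` names are hypothetical, and the two constants are assumptions that mirror what the btrfs headers are believed to define (256 and -256 as an unsigned 64-bit value); they are not taken from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror BTRFS_FIRST_FREE_OBJECTID / BTRFS_LAST_FREE_OBJECTID. */
#define SKETCH_FIRST_FREE_OBJECTID 256ULL
#define SKETCH_LAST_FREE_OBJECTID  ((uint64_t)-256)

/*
 * sorted_ids holds objectids in ascending order (a stand-in for the
 * keys of a tree leaf). Returns the highest objectid that is not above
 * SKETCH_LAST_FREE_OBJECTID, clamped below at FIRST_FREE - 1.
 */
static uint64_t sketch_highest_objectid(const uint64_t *sorted_ids, size_t n)
{
	size_t lo = 0, hi = n;
	uint64_t found;

	assert(sorted_ids != NULL || n == 0);

	/* Find the slot where a key just past LAST_FREE would be inserted. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sorted_ids[mid] <= SKETCH_LAST_FREE_OBJECTID)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Nothing before the insertion slot: no usable objectid exists. */
	if (lo == 0)
		return SKETCH_FIRST_FREE_OBJECTID - 1;

	/* Step back one slot, like path->slots[0] - 1 in the kernel code. */
	found = sorted_ids[lo - 1];
	return found > SKETCH_FIRST_FREE_OBJECTID - 1 ?
	       found : SKETCH_FIRST_FREE_OBJECTID - 1;
}
```

The clamp to FIRST_FREE - 1 means that a caller which pre-increments the result hands out FIRST_FREE as the first inode number on an otherwise empty tree.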
2011-04-20 10:06:11 +08:00
int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
2007-03-20 14:38:32 -04:00
{
	int ret;
2008-06-25 16:01:30 -04:00
	mutex_lock(&root->objectid_mutex);
2007-03-20 14:38:32 -04:00
2009-09-21 15:56:00 -04:00
	if (unlikely(root->highest_objectid < BTRFS_FIRST_FREE_OBJECTID)) {
2011-04-20 10:06:11 +08:00
		ret = btrfs_find_highest_objectid(root,
						  &root->highest_objectid);
2009-09-21 15:56:00 -04:00
		if (ret)
			goto out;
	}
2008-09-26 10:05:38 -04:00
2009-09-21 15:56:00 -04:00
	if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
		ret = -ENOSPC;
		goto out;
2007-03-20 14:38:32 -04:00
	}
2009-09-21 15:56:00 -04:00
	*objectid = ++root->highest_objectid;
	ret = 0;
out:
2008-06-25 16:01:30 -04:00
	mutex_unlock(&root->objectid_mutex);
2007-03-20 14:38:32 -04:00
	return ret;
}