2007-06-12 09:07:21 -04:00
/*
* Copyright ( C ) 2007 Oracle . All rights reserved .
*
* This program is free software ; you can redistribute it and / or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation .
*
* This program is distributed in the hope that it will be useful ,
* but WITHOUT ANY WARRANTY ; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE . See the GNU
* General Public License for more details .
*
* You should have received a copy of the GNU General Public
* License along with this program ; if not , write to the
* Free Software Foundation , Inc . , 59 Temple Place - Suite 330 ,
* Boston , MA 021110 - 1307 , USA .
*/
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
# include <linux/delay.h>
# include <linux/kthread.h>
# include <linux/pagemap.h>
2007-03-20 14:38:32 -04:00
# include "ctree.h"
# include "disk-io.h"
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
# include "free-space-cache.h"
# include "inode-map.h"
2007-03-20 14:38:32 -04:00
# include "transaction.h"
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
static int caching_kthread ( void * data )
{
struct btrfs_root * root = data ;
struct btrfs_fs_info * fs_info = root - > fs_info ;
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_key key ;
struct btrfs_path * path ;
struct extent_buffer * leaf ;
u64 last = ( u64 ) - 1 ;
int slot ;
int ret ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return 0 ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
path = btrfs_alloc_path ( ) ;
if ( ! path )
return - ENOMEM ;
/* Since the commit root is read-only, we can safely skip locking. */
path - > skip_locking = 1 ;
path - > search_commit_root = 1 ;
path - > reada = 2 ;
key . objectid = BTRFS_FIRST_FREE_OBJECTID ;
key . offset = 0 ;
key . type = BTRFS_INODE_ITEM_KEY ;
again :
/* need to make sure the commit_root doesn't disappear */
mutex_lock ( & root - > fs_commit_mutex ) ;
ret = btrfs_search_slot ( NULL , root , & key , path , 0 , 0 ) ;
if ( ret < 0 )
goto out ;
while ( 1 ) {
2011-05-31 18:07:27 +02:00
if ( btrfs_fs_closing ( fs_info ) )
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
goto out ;
leaf = path - > nodes [ 0 ] ;
slot = path - > slots [ 0 ] ;
2011-05-26 06:38:30 +00:00
if ( slot > = btrfs_header_nritems ( leaf ) ) {
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
ret = btrfs_next_leaf ( root , path ) ;
if ( ret < 0 )
goto out ;
else if ( ret > 0 )
break ;
if ( need_resched ( ) | |
btrfs_transaction_in_commit ( fs_info ) ) {
leaf = path - > nodes [ 0 ] ;
if ( btrfs_header_nritems ( leaf ) = = 0 ) {
WARN_ON ( 1 ) ;
break ;
}
/*
* Save the key so we can advances forward
* in the next search .
*/
btrfs_item_key_to_cpu ( leaf , & key , 0 ) ;
2011-05-22 12:33:42 -04:00
btrfs_release_path ( path ) ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
root - > cache_progress = last ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
schedule_timeout ( 1 ) ;
goto again ;
} else
continue ;
}
btrfs_item_key_to_cpu ( leaf , & key , slot ) ;
if ( key . type ! = BTRFS_INODE_ITEM_KEY )
goto next ;
2011-05-26 06:38:30 +00:00
if ( key . objectid > = root - > highest_objectid )
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
break ;
if ( last ! = ( u64 ) - 1 & & last + 1 ! = key . objectid ) {
__btrfs_add_free_space ( ctl , last + 1 ,
key . objectid - last - 1 ) ;
wake_up ( & root - > cache_wait ) ;
}
last = key . objectid ;
next :
path - > slots [ 0 ] + + ;
}
2011-05-26 06:38:30 +00:00
if ( last < root - > highest_objectid - 1 ) {
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
__btrfs_add_free_space ( ctl , last + 1 ,
2011-05-26 06:38:30 +00:00
root - > highest_objectid - last - 1 ) ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
}
spin_lock ( & root - > cache_lock ) ;
root - > cached = BTRFS_CACHE_FINISHED ;
spin_unlock ( & root - > cache_lock ) ;
root - > cache_progress = ( u64 ) - 1 ;
btrfs_unpin_free_ino ( root ) ;
out :
wake_up ( & root - > cache_wait ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
btrfs_free_path ( path ) ;
return ret ;
}
static void start_caching ( struct btrfs_root * root )
{
2011-05-26 06:38:30 +00:00
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
struct task_struct * tsk ;
2011-04-20 10:33:24 +08:00
int ret ;
2011-05-26 06:38:30 +00:00
u64 objectid ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
spin_lock ( & root - > cache_lock ) ;
if ( root - > cached ! = BTRFS_CACHE_NO ) {
spin_unlock ( & root - > cache_lock ) ;
return ;
}
root - > cached = BTRFS_CACHE_STARTED ;
spin_unlock ( & root - > cache_lock ) ;
2011-04-20 10:33:24 +08:00
ret = load_free_ino_cache ( root - > fs_info , root ) ;
if ( ret = = 1 ) {
spin_lock ( & root - > cache_lock ) ;
root - > cached = BTRFS_CACHE_FINISHED ;
spin_unlock ( & root - > cache_lock ) ;
return ;
}
2011-05-26 06:38:30 +00:00
/*
* It can be quite time - consuming to fill the cache by searching
* through the extent tree , and this can keep ino allocation path
* waiting . Therefore at start we quickly find out the highest
* inode number and we know we can use inode numbers which fall in
* [ highest_ino + 1 , BTRFS_LAST_FREE_OBJECTID ] .
*/
ret = btrfs_find_free_objectid ( root , & objectid ) ;
if ( ! ret & & objectid < = BTRFS_LAST_FREE_OBJECTID ) {
__btrfs_add_free_space ( ctl , objectid ,
BTRFS_LAST_FREE_OBJECTID - objectid + 1 ) ;
}
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
tsk = kthread_run ( caching_kthread , root , " btrfs-ino-cache-%llu \n " ,
root - > root_key . objectid ) ;
BUG_ON ( IS_ERR ( tsk ) ) ;
}
int btrfs_find_free_ino ( struct btrfs_root * root , u64 * objectid )
{
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return btrfs_find_free_objectid ( root , objectid ) ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
again :
* objectid = btrfs_find_ino_for_alloc ( root ) ;
if ( * objectid ! = 0 )
return 0 ;
start_caching ( root ) ;
wait_event ( root - > cache_wait ,
root - > cached = = BTRFS_CACHE_FINISHED | |
root - > free_ino_ctl - > free_space > 0 ) ;
if ( root - > cached = = BTRFS_CACHE_FINISHED & &
root - > free_ino_ctl - > free_space = = 0 )
return - ENOSPC ;
else
goto again ;
}
void btrfs_return_ino ( struct btrfs_root * root , u64 objectid )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_free_space_ctl * pinned = root - > free_ino_pinned ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
again :
if ( root - > cached = = BTRFS_CACHE_FINISHED ) {
__btrfs_add_free_space ( ctl , objectid , 1 ) ;
} else {
/*
* If we are in the process of caching free ino chunks ,
* to avoid adding the same inode number to the free_ino
* tree twice due to cross transaction , we ' ll leave it
* in the pinned tree until a transaction is committed
* or the caching work is done .
*/
mutex_lock ( & root - > fs_commit_mutex ) ;
spin_lock ( & root - > cache_lock ) ;
if ( root - > cached = = BTRFS_CACHE_FINISHED ) {
spin_unlock ( & root - > cache_lock ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
goto again ;
}
spin_unlock ( & root - > cache_lock ) ;
start_caching ( root ) ;
2011-05-26 06:38:30 +00:00
if ( objectid < = root - > cache_progress | |
objectid > root - > highest_objectid )
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
__btrfs_add_free_space ( ctl , objectid , 1 ) ;
else
__btrfs_add_free_space ( pinned , objectid , 1 ) ;
mutex_unlock ( & root - > fs_commit_mutex ) ;
}
}
/*
* When a transaction is committed , we ' ll move those inode numbers which
* are smaller than root - > cache_progress from pinned tree to free_ino tree ,
* and others will just be dropped , because the commit root we were
* searching has changed .
*
* Must be called with root - > fs_commit_mutex held
*/
void btrfs_unpin_free_ino ( struct btrfs_root * root )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct rb_root * rbroot = & root - > free_ino_pinned - > free_space_offset ;
struct btrfs_free_space * info ;
struct rb_node * n ;
u64 count ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return ;
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
while ( 1 ) {
n = rb_first ( rbroot ) ;
if ( ! n )
break ;
info = rb_entry ( n , struct btrfs_free_space , offset_index ) ;
BUG_ON ( info - > bitmap ) ;
if ( info - > offset > root - > cache_progress )
goto free ;
else if ( info - > offset + info - > bytes > root - > cache_progress )
count = root - > cache_progress - info - > offset + 1 ;
else
count = info - > bytes ;
__btrfs_add_free_space ( ctl , info - > offset , count ) ;
free :
rb_erase ( & info - > offset_index , rbroot ) ;
kfree ( info ) ;
}
}
# define INIT_THRESHOLD (((1024 * 32) / 2) / sizeof(struct btrfs_free_space))
# define INODES_PER_BITMAP (PAGE_CACHE_SIZE * 8)
/*
* The goal is to keep the memory used by the free_ino tree won ' t
* exceed the memory if we use bitmaps only .
*/
static void recalculate_thresholds ( struct btrfs_free_space_ctl * ctl )
{
struct btrfs_free_space * info ;
struct rb_node * n ;
int max_ino ;
int max_bitmaps ;
n = rb_last ( & ctl - > free_space_offset ) ;
if ( ! n ) {
ctl - > extents_thresh = INIT_THRESHOLD ;
return ;
}
info = rb_entry ( n , struct btrfs_free_space , offset_index ) ;
/*
* Find the maximum inode number in the filesystem . Note we
* ignore the fact that this can be a bitmap , because we are
* not doing precise calculation .
*/
max_ino = info - > bytes - 1 ;
max_bitmaps = ALIGN ( max_ino , INODES_PER_BITMAP ) / INODES_PER_BITMAP ;
if ( max_bitmaps < = ctl - > total_bitmaps ) {
ctl - > extents_thresh = 0 ;
return ;
}
ctl - > extents_thresh = ( max_bitmaps - ctl - > total_bitmaps ) *
PAGE_CACHE_SIZE / sizeof ( * info ) ;
}
/*
* We don ' t fall back to bitmap , if we are below the extents threshold
* or this chunk of inode numbers is a big one .
*/
static bool use_bitmap ( struct btrfs_free_space_ctl * ctl ,
struct btrfs_free_space * info )
{
if ( ctl - > free_extents < ctl - > extents_thresh | |
info - > bytes > INODES_PER_BITMAP / 10 )
return false ;
return true ;
}
static struct btrfs_free_space_op free_ino_op = {
. recalc_thresholds = recalculate_thresholds ,
. use_bitmap = use_bitmap ,
} ;
static void pinned_recalc_thresholds ( struct btrfs_free_space_ctl * ctl )
{
}
static bool pinned_use_bitmap ( struct btrfs_free_space_ctl * ctl ,
struct btrfs_free_space * info )
{
/*
* We always use extents for two reasons :
*
* - The pinned tree is only used during the process of caching
* work .
* - Make code simpler . See btrfs_unpin_free_ino ( ) .
*/
return false ;
}
static struct btrfs_free_space_op pinned_free_ino_op = {
. recalc_thresholds = pinned_recalc_thresholds ,
. use_bitmap = pinned_use_bitmap ,
} ;
void btrfs_init_free_ino_ctl ( struct btrfs_root * root )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_free_space_ctl * pinned = root - > free_ino_pinned ;
spin_lock_init ( & ctl - > tree_lock ) ;
ctl - > unit = 1 ;
ctl - > start = 0 ;
ctl - > private = NULL ;
ctl - > op = & free_ino_op ;
/*
* Initially we allow to use 16 K of ram to cache chunks of
* inode numbers before we resort to bitmaps . This is somewhat
* arbitrary , but it will be adjusted in runtime .
*/
ctl - > extents_thresh = INIT_THRESHOLD ;
spin_lock_init ( & pinned - > tree_lock ) ;
pinned - > unit = 1 ;
pinned - > start = 0 ;
pinned - > private = NULL ;
pinned - > extents_thresh = 0 ;
pinned - > op = & pinned_free_ino_op ;
}
2011-04-20 10:33:24 +08:00
int btrfs_save_ino_cache ( struct btrfs_root * root ,
struct btrfs_trans_handle * trans )
{
struct btrfs_free_space_ctl * ctl = root - > free_ino_ctl ;
struct btrfs_path * path ;
struct inode * inode ;
2011-11-10 20:45:04 -05:00
struct btrfs_block_rsv * rsv ;
u64 num_bytes ;
2011-04-20 10:33:24 +08:00
u64 alloc_hint = 0 ;
int ret ;
int prealloc ;
bool retry = false ;
2011-06-01 09:42:49 +00:00
/* only fs tree and subvol/snap needs ino cache */
if ( root - > root_key . objectid ! = BTRFS_FS_TREE_OBJECTID & &
( root - > root_key . objectid < BTRFS_FIRST_FREE_OBJECTID | |
root - > root_key . objectid > BTRFS_LAST_FREE_OBJECTID ) )
return 0 ;
2011-05-31 19:33:33 +00:00
/* Don't save inode cache if we are deleting this root */
if ( btrfs_root_refs ( & root - > root_item ) = = 0 & &
root ! = root - > fs_info - > tree_root )
return 0 ;
2011-06-03 09:36:29 -04:00
if ( ! btrfs_test_opt ( root , INODE_MAP_CACHE ) )
return 0 ;
2011-04-20 10:33:24 +08:00
path = btrfs_alloc_path ( ) ;
if ( ! path )
return - ENOMEM ;
2011-06-03 09:36:29 -04:00
2011-11-10 20:45:04 -05:00
rsv = trans - > block_rsv ;
trans - > block_rsv = & root - > fs_info - > trans_block_rsv ;
num_bytes = trans - > bytes_reserved ;
/*
* 1 item for inode item insertion if need
* 3 items for inode item update ( in the worst case )
* 1 item for free space object
* 3 items for pre - allocation
*/
trans - > bytes_reserved = btrfs_calc_trans_metadata_size ( root , 8 ) ;
ret = btrfs_block_rsv_add_noflush ( root , trans - > block_rsv ,
trans - > bytes_reserved ) ;
if ( ret )
goto out ;
2011-04-20 10:33:24 +08:00
again :
inode = lookup_free_ino_inode ( root , path ) ;
if ( IS_ERR ( inode ) & & PTR_ERR ( inode ) ! = - ENOENT ) {
ret = PTR_ERR ( inode ) ;
2011-11-10 20:45:04 -05:00
goto out_release ;
2011-04-20 10:33:24 +08:00
}
if ( IS_ERR ( inode ) ) {
BUG_ON ( retry ) ;
retry = true ;
ret = create_free_ino_inode ( root , trans , path ) ;
if ( ret )
2011-11-10 20:45:04 -05:00
goto out_release ;
2011-04-20 10:33:24 +08:00
goto again ;
}
BTRFS_I ( inode ) - > generation = 0 ;
ret = btrfs_update_inode ( trans , root , inode ) ;
WARN_ON ( ret ) ;
if ( i_size_read ( inode ) > 0 ) {
ret = btrfs_truncate_free_space_cache ( root , trans , path , inode ) ;
if ( ret )
goto out_put ;
}
spin_lock ( & root - > cache_lock ) ;
if ( root - > cached ! = BTRFS_CACHE_FINISHED ) {
ret = - 1 ;
spin_unlock ( & root - > cache_lock ) ;
goto out_put ;
}
spin_unlock ( & root - > cache_lock ) ;
spin_lock ( & ctl - > tree_lock ) ;
prealloc = sizeof ( struct btrfs_free_space ) * ctl - > free_extents ;
prealloc = ALIGN ( prealloc , PAGE_CACHE_SIZE ) ;
prealloc + = ctl - > total_bitmaps * PAGE_CACHE_SIZE ;
spin_unlock ( & ctl - > tree_lock ) ;
/* Just to make sure we have enough space */
prealloc + = 8 * PAGE_CACHE_SIZE ;
2011-08-30 10:19:10 -04:00
ret = btrfs_delalloc_reserve_space ( inode , prealloc ) ;
2011-04-20 10:33:24 +08:00
if ( ret )
goto out_put ;
ret = btrfs_prealloc_file_range_trans ( inode , trans , 0 , 0 , prealloc ,
prealloc , prealloc , & alloc_hint ) ;
2011-08-30 10:19:10 -04:00
if ( ret ) {
btrfs_delalloc_release_space ( inode , prealloc ) ;
2011-04-20 10:33:24 +08:00
goto out_put ;
2011-08-30 10:19:10 -04:00
}
2011-04-20 10:33:24 +08:00
btrfs_free_reserved_data_space ( inode , prealloc ) ;
2011-11-10 20:45:04 -05:00
ret = btrfs_write_out_ino_cache ( root , trans , path ) ;
2011-04-20 10:33:24 +08:00
out_put :
iput ( inode ) ;
2011-11-10 20:45:04 -05:00
out_release :
btrfs_block_rsv_release ( root , trans - > block_rsv , trans - > bytes_reserved ) ;
2011-04-20 10:33:24 +08:00
out :
2011-11-10 20:45:04 -05:00
trans - > block_rsv = rsv ;
trans - > bytes_reserved = num_bytes ;
2011-04-20 10:33:24 +08:00
btrfs_free_path ( path ) ;
return ret ;
}
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
static int btrfs_find_highest_objectid ( struct btrfs_root * root , u64 * objectid )
2007-04-05 13:35:25 -04:00
{
struct btrfs_path * path ;
int ret ;
2007-10-15 16:14:19 -04:00
struct extent_buffer * l ;
2007-04-05 13:35:25 -04:00
struct btrfs_key search_key ;
2007-10-15 16:14:19 -04:00
struct btrfs_key found_key ;
2007-04-05 13:35:25 -04:00
int slot ;
path = btrfs_alloc_path ( ) ;
2011-03-23 08:14:16 +00:00
if ( ! path )
return - ENOMEM ;
2007-04-05 13:35:25 -04:00
2008-09-05 16:43:53 -04:00
search_key . objectid = BTRFS_LAST_FREE_OBJECTID ;
search_key . type = - 1 ;
2007-04-05 13:35:25 -04:00
search_key . offset = ( u64 ) - 1 ;
ret = btrfs_search_slot ( NULL , root , & search_key , path , 0 , 0 ) ;
if ( ret < 0 )
goto error ;
BUG_ON ( ret = = 0 ) ;
if ( path - > slots [ 0 ] > 0 ) {
slot = path - > slots [ 0 ] - 1 ;
2007-10-15 16:14:19 -04:00
l = path - > nodes [ 0 ] ;
btrfs_item_key_to_cpu ( l , & found_key , slot ) ;
2009-09-21 15:56:00 -04:00
* objectid = max_t ( u64 , found_key . objectid ,
BTRFS_FIRST_FREE_OBJECTID - 1 ) ;
2007-04-05 13:35:25 -04:00
} else {
2009-09-21 15:56:00 -04:00
* objectid = BTRFS_FIRST_FREE_OBJECTID - 1 ;
2007-04-05 13:35:25 -04:00
}
ret = 0 ;
error :
btrfs_free_path ( path ) ;
return ret ;
}
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
int btrfs_find_free_objectid ( struct btrfs_root * root , u64 * objectid )
2007-03-20 14:38:32 -04:00
{
int ret ;
2008-06-25 16:01:30 -04:00
mutex_lock ( & root - > objectid_mutex ) ;
2007-03-20 14:38:32 -04:00
2009-09-21 15:56:00 -04:00
if ( unlikely ( root - > highest_objectid < BTRFS_FIRST_FREE_OBJECTID ) ) {
Btrfs: Cache free inode numbers in memory
Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, so we'll run out of inode numbers
as we keep create/delete files in 32bits machines.
This fixes it, and it works similarly to how we cache free space in block
cgroups.
We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.
Because we are searching the commit root, we have to carefully handle the
cross-transaction case.
The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted in runtime.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
ret = btrfs_find_highest_objectid ( root ,
& root - > highest_objectid ) ;
2009-09-21 15:56:00 -04:00
if ( ret )
goto out ;
}
2008-09-26 10:05:38 -04:00
2009-09-21 15:56:00 -04:00
if ( unlikely ( root - > highest_objectid > = BTRFS_LAST_FREE_OBJECTID ) ) {
ret = - ENOSPC ;
goto out ;
2007-03-20 14:38:32 -04:00
}
2009-09-21 15:56:00 -04:00
* objectid = + + root - > highest_objectid ;
ret = 0 ;
out :
2008-06-25 16:01:30 -04:00
mutex_unlock ( & root - > objectid_mutex ) ;
2007-03-20 14:38:32 -04:00
return ret ;
}