doc: documentation of inode and dentry management
Change-Id: Ica510752d011596e8ecff5ea13c4b2bbf76ba186 BUG: 1145475 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/8815 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
This commit is contained in:
parent
b2ae71dbcf
commit
2855dff243
226
doc/hacker-guide/en-US/markdown/inode.md
Normal file
226
doc/hacker-guide/en-US/markdown/inode.md
Normal file
@ -0,0 +1,226 @@
|
||||
#Inode and dentry management in GlusterFS:
|
||||
|
||||
##Background
|
||||
Filesystems internally refer to files and directories via inodes. Inodes
|
||||
are unique identifiers of the entities stored in a filesystem. Whenever an
|
||||
application has to operate on a file/directory (read/modify), the filesystem
|
||||
maps that file/directory to the right inode and start referring to that inode
|
||||
whenever an operation has to be performed on the file/directory.
|
||||
|
||||
In GlusterFS a new inode gets created whenever a new file/directory is created
|
||||
OR when a successful lookup is done on a file/directory for the first time.
|
||||
Inodes in GlusterFS are maintained by the inode table which gets initiated when
|
||||
the filesystem daemon is started (both for the brick process as well as the
|
||||
mount process). Below are some important data structures for inode management.
|
||||
|
||||
## Data-structure (inode-table)
|
||||
```
|
||||
struct _inode_table {
|
||||
pthread_mutex_t lock;
|
||||
size_t hashsize; /* bucket size of inode hash and dentry hash */
|
||||
char *name; /* name of the inode table, just for gf_log() */
|
||||
inode_t *root; /* root directory inode, with inode
|
||||
number and gfid 1 */
|
||||
xlator_t *xl; /* xlator to be called to do purge and
|
||||
the xlator which maintains the inode table*/
|
||||
uint32_t lru_limit; /* maximum LRU cache size */
|
||||
struct list_head *inode_hash; /* buckets for inode hash table */
|
||||
struct list_head *name_hash; /* buckets for dentry hash table */
|
||||
struct list_head active; /* list of inodes currently active (in an fop) */
|
||||
uint32_t active_size; /* count of inodes in active list */
|
||||
struct list_head lru; /* list of inodes recently used.
|
||||
lru.next most recent */
|
||||
uint32_t lru_size; /* count of inodes in lru list */
|
||||
struct list_head purge; /* list of inodes to be purged soon */
|
||||
uint32_t purge_size; /* count of inodes in purge list */
|
||||
|
||||
struct mem_pool *inode_pool; /* memory pool for inodes */
|
||||
struct mem_pool *dentry_pool; /* memory pool for dentrys */
|
||||
struct mem_pool *fd_mem_pool; /* memory pool for fd_t */
|
||||
int ctxcount; /* number of slots in inode->ctx */
|
||||
};
|
||||
```
|
||||
|
||||
#Life-cycle
|
||||
```
|
||||
|
||||
inode_table_new (size_t lru_limit, xlator_t *xl)
|
||||
|
||||
This is a function which allocates a new inode table. Usually the top xlators in
|
||||
the graph such as protocol/server (for bricks), fuse and nfs (for fuse and nfs
|
||||
mounts) and libgfapi do inode managements. Hence they are the ones which will
|
||||
allocate a new inode table by calling the above function.
|
||||
|
||||
Each xlator graph in glusterfs maintains an inode table. So in fuse clients,
|
||||
whenever there is a graph change due to add brick/remove brick or
|
||||
addition/removal of some other xlators, a new graph is created which creates a
|
||||
new inode table.
|
||||
|
||||
Thus an allocated inode table is destroyed only when the filesystem daemon is
|
||||
killed or unmounted.
|
||||
|
||||
```
|
||||
|
||||
#what it contains.
|
||||
```
|
||||
|
||||
Inode table in glusterfs mainly contains a hash table for maintaining inodes.
|
||||
In general a file/directory is considered to be existing if there is a
|
||||
corresponding inode present in the inode table. If a inode for a file/directory
|
||||
cannot be found in the inode table, glusterfs tries to resolve it by sending a
|
||||
lookup on the entry for which the inode is needed. If lookup is successful, then
|
||||
a new inode correponding to the entry is added to the hash table present in the
|
||||
inode table. Thus an inode present in the hash-table means, its an existing
|
||||
file/directory within the filesystem. The inode table also contains the hash
|
||||
size of the hash table (as of now it is hard coded to 14057. The hash value of
|
||||
a inode is calculated using its gfid).
|
||||
|
||||
Apart from the hash table, inode table also maintains 3 important list of inodes
|
||||
1) Active list:
|
||||
Active list contains all the active inodes (i.e inodes which are currently part
|
||||
of some fop).
|
||||
2) Lru list:
|
||||
Least recently used inodes list. A limit can be set for the size of the lru
|
||||
list. For bricks it is 16384 and for clients it is infinity.
|
||||
3) Purge list:
|
||||
List of all the inodes which have to be purged (i.e inodes which have to be
|
||||
deleted from the inode table due to unlink/rmdir/forget).
|
||||
|
||||
And at last it also contains the mem-pool for allocating inodes, dentries so
|
||||
that frequent malloc/calloc and free of the data structures can be avoided.
|
||||
```
|
||||
|
||||
#Data structure (inode)
|
||||
```
|
||||
struct _inode {
|
||||
inode_table_t *table; /* the table this inode belongs to */
|
||||
uuid_t gfid; /* unique identifier of the inode */
|
||||
gf_lock_t lock;
|
||||
uint64_t nlookup;
|
||||
uint32_t fd_count; /* Open fd count */
|
||||
uint32_t ref; /* reference count on this inode */
|
||||
ia_type_t ia_type; /* what kind of file */
|
||||
struct list_head fd_list; /* list of open files on this inode */
|
||||
struct list_head dentry_list; /* list of directory entries for this inode */
|
||||
struct list_head hash; /* hash table pointers */
|
||||
struct list_head list; /* active/lru/purge */
|
||||
|
||||
struct _inode_ctx *_ctx; /* place holder for keeping the
|
||||
information about the inode by different xlators */
|
||||
};
|
||||
|
||||
As said above, inodes are internal way of identifying the files/directories. A
|
||||
inode uniquely represents a file/directory. A new inode is created whenever a
|
||||
create/mkdir/symlink/mknod operations are performed. Apart from that a new inode
|
||||
is created upon the successful fresh lookup of a file/directory. Say the
|
||||
filesystem contained some file "a" within root and the filesystem was
|
||||
unmounted. Now when glusterfs is mounted and some operation is perfomed on "/a",
|
||||
glusterfs tries to get the inode for the entry "a" with parent inode as
|
||||
root. But, since glusterfs just came up, it will not be able to find the inode
|
||||
for "a" and will send a lookup on "/a". If the lookup operation succeeds (i.e.
|
||||
the root of glusterfs contains an entry called "a"), then a new inode for "/a"
|
||||
is created and added to the inode table.
|
||||
|
||||
Depending upon the situation, an inode can be in one of the 3 lists maintained
|
||||
by the inode table. If some fop is happening on the inode, then the inode will
|
||||
be present in the active inodes list maintained by the inode table. Active
|
||||
inodes are those inodes whose refcount is greater than zero. Whenever some
|
||||
operation comes on a file/directory, and the resolver tries to find the inode
|
||||
for it, it increments the refcount of the inode before returning the inode. The
|
||||
refcount of an inode can be incremented by calling the below function
|
||||
|
||||
inode_ref (inode_t *inode)
|
||||
|
||||
Any xlator which wants to operate on a inode as part of some fop (or wants the
|
||||
inode in the callback), should hold a ref on the inode.
|
||||
Once the fop is completed before sending the reply of the fop to the above
|
||||
layers , the inode has to be unrefed. When the refcount of an inode becomes
|
||||
zero, it is removed from the active inodes list and put into LRU list maintained
|
||||
by the inode table. Thus in short if some fop is happening on a file/directory,
|
||||
the corresponding inode will be in the active list or it will be in the LRU
|
||||
list.
|
||||
```
|
||||
|
||||
#Life Cycle
|
||||
|
||||
A new inode is created whenever a new file/directory/symlink is created OR a
|
||||
successful lookup of an existing entry is done. The xlators which does inode
|
||||
management (as of now protocol/server, fuse, nfs, gfapi) will perform inode_link
|
||||
operation upon successful lookup or successful creation of a new entry.
|
||||
|
||||
inode_link (inode_t *inode, inode_t *parent, const char *name,
|
||||
struct iatt *buf);
|
||||
|
||||
inode_link actually adds the inode to the inode table (to be precise it adds
|
||||
the inode to the hash table maintained by the inode table. The hash value is
|
||||
calculated based on the gfid). Copies the gfid to the inode (the gfid is
|
||||
present in the iatt structure). Creates a dentry with the new name.
|
||||
|
||||
A inode is removed from the inode table and eventually destroyed when unlink
|
||||
or rmdir operation is performed on a file/directory, or the the lru limit of
|
||||
the inode table has been exceeded.
|
||||
|
||||
#Data structure (dentry)
|
||||
```
|
||||
|
||||
struct _dentry {
|
||||
struct list_head inode_list; /* list of dentries of inode */
|
||||
struct list_head hash; /* hash table pointers */
|
||||
inode_t *inode; /* inode of this directory entry */
|
||||
char *name; /* name of the directory entry */
|
||||
inode_t *parent; /* directory of the entry */
|
||||
};
|
||||
|
||||
A dentry is the presence of an entry for a file/directory within its parent
|
||||
directory. A dentry usually points to the inode to which it belongs to. In
|
||||
glusterfs a dentry contains the following fields.
|
||||
1) a hook using which it can add itself to the list of
|
||||
the dentries maintained by the inode to which it points to.
|
||||
2) A hash table pointer.
|
||||
3) Pointer to the inode to which it belongs to.
|
||||
4) Name of the dentry
|
||||
5) Pointer to the inode of the parent directory in which the dentry is present
|
||||
|
||||
A new dentry is created when a new file/directory/symlink is created or a hard
|
||||
link to an existing file is created.
|
||||
|
||||
__dentry_create (inode_t *inode, inode_t *parent, const char *name);
|
||||
|
||||
A dentry holds a refcount on the parent
|
||||
directory so that the parent inode is never removed from the active inode's list
|
||||
and put to the lru list (If the lru limit of the lru list is exceeded, there is
|
||||
a chance of parent inode being destroyed. To avoid it, the dentries hold a
|
||||
reference to the parent inode). A dentry is removed whenevern a unlink/rmdir
|
||||
is perfomed on a file/directory. Or when the lru limit has been exceeded, the
|
||||
oldest inodes are purged out of the inode table, during which all the dentries
|
||||
of the inode are removed.
|
||||
|
||||
Whenever a unlink/rmdir comes on a file/directory, the corresponding inode
|
||||
should be removed from the inode table. So upon unlink/rmdir, the inode will
|
||||
be moved to the purge list maintained by the inode table and from there it is
|
||||
destroyed. To be more specific, if a inode has to be destroyed, its refcount
|
||||
and nlookup count both should become 0. For refcount to become 0, the inode
|
||||
should not be part of any fop (there should not be any open fds). Or if the
|
||||
inode belongs to a directory, then there should not be any fop happening on the
|
||||
directory and it should not contain any dentries within it. For nlookup count to
|
||||
become zero, a forget has to be sent on the inode with nlookup count set to 0 as
|
||||
an argument. For fuse clients, forget is sent by the kernel itself whenever a
|
||||
unlink/rmdir is performed. But for brick processes, upon unlink/rmdir, the
|
||||
protocol/server itself has to do inode_forget. Whenever the inode has to be
|
||||
deleted due to file removal or lru limit being exceeded the inode is retired
|
||||
(i.e. all the dentries of the inode are deleted and the inode is moved to the
|
||||
purge list maintained by the inode table), the nlookup count is set to 0 via
|
||||
inode_forget api. The inode table, then prunes all the inodes from the purge
|
||||
list by destroying the inode contexts maintained by each xlator.
|
||||
|
||||
unlinking of the dentry is done via inode_unlink;
|
||||
|
||||
void
|
||||
inode_unlink (inode_t *inode, inode_t *parent, const char *name);
|
||||
|
||||
If the inode has multiple hard links, then the unlink operation performed by
|
||||
the application results just in the removal of the dentry with the name provided
|
||||
by the application. For the inode to be removed, all the dentries of the inode
|
||||
should be unlinked.
|
||||
```
|
||||
|
Loading…
x
Reference in New Issue
Block a user