Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 221 (stat prefetch implementation) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=221
155 lines
6.8 KiB
Plaintext
155 lines
6.8 KiB
Plaintext
what is stat-prefetch?
|
|
======================
|
|
It is a translator which caches the dentries read in readdir. This dentry
|
|
list is stored in the context of fd. Later when lookup happens on
|
|
[parent-inode, basename (path)] combination, this list is searched for the
|
|
basename. The dentry thus searched is used to fill up the stat corresponding
|
|
to path being looked upon, thereby short-cutting lookup calls. This cache is
|
|
preserved till closedir is called on the fd. The purpose of this translator
|
|
is to optimize operations like 'ls -l', where a readdir is followed by
|
|
lookup (stat) calls on each directory entry.
|
|
|
|
1. stat-prefetch harnesses the efficiency of short lookup calls
|
|
(saves network roundtrip time for lookup calls from being accounted to
|
|
the stat call).
|
|
2. To maintain the correctness, it does lookup-behind - lookup is winded to
|
|
underlying translators after it is unwound to upper translators.
|
|
lookup-behind is necessary as inode gets populated in server inode table
|
|
only in lookup-cbk and also because various translators store their
|
|
contexts in inode contexts during lookup calls.
|
|
|
|
fops to be implemented:
|
|
=======================
|
|
* lookup
|
|
1. check the dentry cache stored in context of fds opened by the same process
|
|
on parent inode for basename. If found unwind with cached stat, else wind
|
|
the lookup call to underlying translators.
|
|
2. stat is stored in the context of inode if the path being looked upon
|
|
happens to be directory. This stat will be used to fill postparent stat
|
|
when lookup happens on any of the directory contents.
|
|
|
|
* readdir
|
|
1. cache the direntries returned in readdir_cbk in the context of fd.
|
|
2. if the readdir is happening on non-expected offsets (means a seekdir/rewinddir
|
|
has happened), cache has to be flushed.
|
|
3. delete the entry corresponding to basename of path on which fd is opened
|
|
from cache stored in parent.
|
|
|
|
* chmod/fchmod
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since these calls change st_mode and st_ctime of
|
|
stat.
|
|
|
|
* chown/fchown
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since these calls change st_uid/st_gid and
|
|
st_ctime of stat.
|
|
|
|
* truncate/ftruncate
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since these calls change st_size/st_mtime of stat.
|
|
|
|
* utimens
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since this call changes st_atime/st_mtime of stat.
|
|
|
|
* readlink
|
|
delete the entry corresponding to basename from cache stored in context of fds
|
|
opened on parent inode, since this call changes st_atime of stat.
|
|
|
|
* unlink
|
|
1. delete the entry corresponding to basename from cache stored in context of
|
|
fds, opened on parent directory containing the file being unlinked.
|
|
2. delete the entry corresponding to basename of parent directory from cache
|
|
of grand-parent.
|
|
|
|
* rmdir
|
|
1. delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode.
|
|
2. remove the entire cache from all fds opened on inode corresponding to
|
|
directory being removed.
|
|
3. delete the entry correspondig to basename of parent from cache stored in
|
|
grand-parent.
|
|
|
|
* readv
|
|
delete the entry corresponding to basename from cache stored in context of fds
|
|
opened on parent inode, since readv changes st_atime of file.
|
|
|
|
* writev
|
|
delete the entry corresponding to basename from cache stored in context of fds
|
|
opened on parent inode, since writev can possibly change st_size and definitely
|
|
changes st_mtime of file.
|
|
|
|
* fsync
|
|
there is a confusion here as to whether fsync updates mtime/ctimes. Disk based
|
|
filesystems (atleast ext2) just writes the times stored in inode to disk
|
|
during fsync and not the time at which fsync is being done. But in glusterfs,
|
|
a translator like write-behind actually sends writes during fsync which will
|
|
change mtime/ctime. Hence stat-prefetch implements fsync to delete the entry
|
|
corresponding to basename from cache stored in context of fds opened on parent
|
|
inode.
|
|
|
|
* rename
|
|
1. remove entry corresponding to oldname from cache stored in fd contexts of
|
|
oldparent.
|
|
2. remove entry corresponding to newname from cache stored in fd contexts of
|
|
newparent.
|
|
3. remove entry corresponding to oldparent from cache stored in
|
|
old-grand-parent, since removing oldname changes st_mtime and st_ctime
|
|
of oldparent stat.
|
|
4. remove entry corresponding to newparent from cache stored in
|
|
new-grand-parent, since adding newname changes st_mtime and st_ctime
|
|
of newparent stat.
|
|
5. if oldname happens to be a directory, remove entire cache from all fds
|
|
opened on it.
|
|
|
|
* create/mknod/mkdir/symlink/link
|
|
delete entry corresponding to basename of parent directory in which these
|
|
operations are happening, from cache stored in context of fds opened on
|
|
grand-parent, since adding a new entry to a directory changes st_mtime
|
|
and st_ctime of parent directory.
|
|
|
|
* setxattr/removexattr
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since setxattr changes st_ctime of file.
|
|
|
|
* setdents
|
|
1. remove entry corresponding to basename of path on which fd is opened from
|
|
cache stored in context of fds opened on parent.
|
|
2. for each of the entry in the direntry list, delete from cache stored in
|
|
context of fd, the entry corresponding to basename of path being passed.
|
|
|
|
* getdents
|
|
1. remove entry corresponding to basename of path on which fd is opened from
|
|
cache stored in parent, since getdents changes st_atime.
|
|
2. remove entries corresponding to symbolic links from cache, since readlink
|
|
would've changed st_atime.
|
|
|
|
* checksum
|
|
delete the entry corresponding to basename from cache stored in context of
|
|
fds opened on parent inode, since st_atime is changed during this call.
|
|
|
|
* xattrop/fxattrop
|
|
delete the entry corresponding to basename from cache stored in context of fds
|
|
opened on parent inode, since these calls modify st_ctime of file.
|
|
|
|
callbacks to be implemented:
|
|
============================
|
|
* releasedir
|
|
free the context stored in fd.
|
|
|
|
* forget
|
|
dree the stat if the inode corresponds to a directory.
|
|
|
|
limitations:
|
|
============
|
|
* since a readdir does not return extended attributes of file, if need_xattr is
|
|
set, short-cutting of lookup does not happen and lookup is passed to
|
|
underlying translators.
|
|
|
|
* posix_readdir does not check whether the dentries are spanning across multiple
|
|
mount points. Hence it is not transforming inode numbers in stat buffers if
|
|
posix is configured to allow export directory spanning on multiple mountpoints.
|
|
This is a bug which needs to be fixed. posix_readdir should treat dentries the
|
|
same way as if lookup is happening on dentries.
|