Merge branch 'stable/cleancache.v13' into linux-next
* stable/cleancache.v13: mm: cleancache: Use __read_mostly as appropiate. mm: cleancache: report statistics via debugfs instead of sysfs. mm: zcache/tmem/cleancache: s/flush/invalidate/ mm: cleancache: s/flush/invalidate/
This commit is contained in:
@ -1,11 +0,0 @@
|
||||
What: /sys/kernel/mm/cleancache/
|
||||
Date: April 2011
|
||||
Contact: Dan Magenheimer <dan.magenheimer@oracle.com>
|
||||
Description:
|
||||
/sys/kernel/mm/cleancache/ contains a number of files which
|
||||
record a count of various cleancache operations
|
||||
(sum across all filesystems):
|
||||
succ_gets
|
||||
failed_gets
|
||||
puts
|
||||
flushes
|
@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a
|
||||
the pool id, a file key, and a page index into the file. (The combination
|
||||
of a pool id, a file key, and an index is sometimes called a "handle".)
|
||||
A "get_page" will copy the page, if found, from cleancache into kernel memory.
|
||||
A "flush_page" will ensure the page no longer is present in cleancache;
|
||||
a "flush_inode" will flush all pages associated with the specified file;
|
||||
and, when a filesystem is unmounted, a "flush_fs" will flush all pages in
|
||||
all files specified by the given pool id and also surrender the pool id.
|
||||
An "invalidate_page" will ensure the page no longer is present in cleancache;
|
||||
an "invalidate_inode" will invalidate all pages associated with the specified
|
||||
file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate
|
||||
all pages in all files specified by the given pool id and also surrender
|
||||
the pool id.
|
||||
|
||||
An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache
|
||||
to treat the pool as shared using a 128-bit UUID as a key. On systems
|
||||
@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a
|
||||
cleancache implementation can simply disable shared_init by always
|
||||
returning a negative value.
|
||||
|
||||
If a get_page is successful on a non-shared pool, the page is flushed (thus
|
||||
making cleancache an "exclusive" cache). On a shared pool, the page
|
||||
is NOT flushed on a successful get_page so that it remains accessible to
|
||||
If a get_page is successful on a non-shared pool, the page is invalidated
|
||||
(thus making cleancache an "exclusive" cache). On a shared pool, the page
|
||||
is NOT invalidated on a successful get_page so that it remains accessible to
|
||||
other sharers. The kernel is responsible for ensuring coherency between
|
||||
cleancache (shared or not), the page cache, and the filesystem, using
|
||||
cleancache flush operations as required.
|
||||
cleancache invalidate operations as required.
|
||||
|
||||
Note that cleancache must enforce put-put-get coherency and get-get
|
||||
coherency. For the former, if two puts are made to the same handle but
|
||||
@ -77,20 +78,20 @@ if a get for a given handle fails, subsequent gets for that handle will
|
||||
never succeed unless preceded by a successful put with that handle.
|
||||
|
||||
Last, cleancache provides no SMP serialization guarantees; if two
|
||||
different Linux threads are simultaneously putting and flushing a page
|
||||
different Linux threads are simultaneously putting and invalidating a page
|
||||
with the same handle, the results are indeterminate. Callers must
|
||||
lock the page to ensure serial behavior.
|
||||
|
||||
CLEANCACHE PERFORMANCE METRICS
|
||||
|
||||
Cleancache monitoring is done by sysfs files in the
|
||||
/sys/kernel/mm/cleancache directory. The effectiveness of cleancache
|
||||
If properly configured, monitoring of cleancache is done via debugfs in
|
||||
the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache
|
||||
can be measured (across all filesystems) with:
|
||||
|
||||
succ_gets - number of gets that were successful
|
||||
failed_gets - number of gets that failed
|
||||
puts - number of puts attempted (all "succeed")
|
||||
flushes - number of flushes attempted
|
||||
invalidates - number of invalidates attempted
|
||||
|
||||
A backend implementatation may provide additional metrics.
|
||||
|
||||
@ -143,7 +144,7 @@ systems.
|
||||
|
||||
The core hooks for cleancache in VFS are in most cases a single line
|
||||
and the minimum set are placed precisely where needed to maintain
|
||||
coherency (via cleancache_flush operations) between cleancache,
|
||||
coherency (via cleancache_invalidate operations) between cleancache,
|
||||
the page cache, and disk. All hooks compile into nothingness if
|
||||
cleancache is config'ed off and turn into a function-pointer-
|
||||
compare-to-NULL if config'ed on but no backend claims the ops
|
||||
@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for
|
||||
transcendent memory.
|
||||
|
||||
4) Why is non-shared cleancache "exclusive"? And where is the
|
||||
page "flushed" after a "get"? (Minchan Kim)
|
||||
page "invalidated" after a "get"? (Minchan Kim)
|
||||
|
||||
The main reason is to free up space in transcendent memory and
|
||||
to avoid unnecessary cleancache_flush calls. If you want inclusive,
|
||||
to avoid unnecessary cleancache_invalidate calls. If you want inclusive,
|
||||
the page can be "put" immediately following the "get". If
|
||||
put-after-get for inclusive becomes common, the interface could
|
||||
be easily extended to add a "get_no_flush" call.
|
||||
be easily extended to add a "get_no_invalidate" call.
|
||||
|
||||
The flush is done by the cleancache backend implementation.
|
||||
The invalidate is done by the cleancache backend implementation.
|
||||
|
||||
5) What's the performance impact?
|
||||
|
||||
@ -222,7 +223,7 @@ Some points for a filesystem to consider:
|
||||
as tmpfs should not enable cleancache)
|
||||
- To ensure coherency/correctness, the FS must ensure that all
|
||||
file removal or truncation operations either go through VFS or
|
||||
add hooks to do the equivalent cleancache "flush" operations
|
||||
add hooks to do the equivalent cleancache "invalidate" operations
|
||||
- To ensure coherency/correctness, either inode numbers must
|
||||
be unique across the lifetime of the on-disk file OR the
|
||||
FS must provide an "encode_fh" function.
|
||||
@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of
|
||||
inode/filehandle, the pool id could be eliminated. But, this
|
||||
won't work because cleancache retains pagecache data pages
|
||||
persistently even when the inode has been pruned from the
|
||||
inode unused list, and only flushes the data page if the file
|
||||
inode unused list, and only invalidates the data page if the file
|
||||
gets removed/truncated. So if cleancache used the inode kva,
|
||||
there would be potential coherency issues if/when the inode
|
||||
kva is reused for a different file. Alternately, if cleancache
|
||||
flushed the pages when the inode kva was freed, much of the value
|
||||
invalidated the pages when the inode kva was freed, much of the value
|
||||
of cleancache would be lost because the cache of pages in cleanache
|
||||
is potentially much larger than the kernel pagecache and is most
|
||||
useful if the pages survive inode cache removal.
|
||||
|
Reference in New Issue
Block a user