2019-05-19 15:07:45 +03:00
# SPDX-License-Identifier: GPL-2.0-only
2005-04-17 02:20:36 +04:00
#
# File system configuration
#
menu "File systems"
2012-03-06 23:16:17 +04:00
# Use unaligned word dcache accesses
config DCACHE_WORD_ACCESS
bool
vfs: Add configuration parser helpers
Because the new API passes in key,value parameters, match_token() cannot be
used with it. Instead, provide three new helpers to aid with parsing:
(1) fs_parse(). This takes a parameter and a simple static description of
all the parameters and maps the key name to an ID. It returns 1 on a
match, 0 on no match if unknowns should be ignored, and a negative
error code on a parse error.
The parameter description includes a list of key names to IDs, desired
parameter types and a list of enumeration name -> ID mappings.
[!] Note that for the moment I've required that the key->ID mapping
array be sorted and unterminated. The size of the
array is noted in the fsconfig_parser struct. This allows me to use
bsearch(), but I'm not sure any performance gain is worth the hassle
of requiring people to keep the array sorted.
The parameter type array is sized according to the number of parameter
IDs and is indexed directly. The optional enum mapping array is an
unterminated, unsorted list and the size goes into the fsconfig_parser
struct.
The function can do some additional things:
(a) If it's not ambiguous and no value is given, the prefix "no" on
a key name is permitted to indicate that the parameter should
be considered negatory.
(b) If the desired type is a single simple integer, it will perform
an appropriate conversion and store the result in a union in
the parse result.
(c) If the desired type is an enumeration, {key ID, name} will be
looked up in the enumeration list and the matching value will
be stored in the parse result union.
(d) Optionally generate an error if the key is unrecognised.
This is called something like:
enum rdt_param {
	Opt_cdp,
	Opt_cdpl2,
	Opt_mba_mbps,
	nr__rdt_params
};
const struct fs_parameter_spec rdt_param_specs[nr__rdt_params] = {
	[Opt_cdp]	= { fs_param_is_bool },
	[Opt_cdpl2]	= { fs_param_is_bool },
	[Opt_mba_mbps]	= { fs_param_is_bool },
};
const char *const rdt_param_keys[nr__rdt_params] = {
	[Opt_cdp]	= "cdp",
	[Opt_cdpl2]	= "cdpl2",
	[Opt_mba_mbps]	= "mba_mbps",
};
const struct fs_parameter_description rdt_parser = {
	.name		= "rdt",
	.nr_params	= nr__rdt_params,
	.keys		= rdt_param_keys,
	.specs		= rdt_param_specs,
	.no_source	= true,
};
int rdt_parse_param(struct fs_context *fc,
		    struct fs_parameter *param)
{
	struct fs_parse_result parse;
	struct rdt_fs_context *ctx = rdt_fc2context(fc);
	int ret;
	ret = fs_parse(fc, &rdt_parser, param, &parse);
	if (ret < 0)
		return ret;
	switch (parse.key) {
	case Opt_cdp:
		ctx->enable_cdpl3 = true;
		return 0;
	case Opt_cdpl2:
		ctx->enable_cdpl2 = true;
		return 0;
	case Opt_mba_mbps:
		ctx->enable_mba_mbps = true;
		return 0;
	}
	return -EINVAL;
}
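The "no" prefix rule in (a) can be illustrated in isolation. The following is a minimal userspace sketch, not the kernel's fs_parse(): the helper names demo_find_key() and demo_match_key() are hypothetical, and the disambiguation policy (an exact key match always wins, and negation applies only when no value was supplied) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of `key` in `keys`, or -1 if it is not listed. */
static int demo_find_key(const char *const *keys, size_t nr, const char *key)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(keys[i], key) == 0)
			return (int)i;
	return -1;
}

/*
 * Resolve `key` to an index. A leading "no" is treated as negation only
 * when the key is not itself a known name and no value was supplied.
 */
static int demo_match_key(const char *const *keys, size_t nr,
			  const char *key, bool has_value, bool *negated)
{
	int id = demo_find_key(keys, nr, key);

	*negated = false;
	if (id >= 0)
		return id;	/* exact match wins, so it is never ambiguous */
	if (has_value || strncmp(key, "no", 2) != 0 || key[2] == '\0')
		return -1;
	id = demo_find_key(keys, nr, key + 2);
	if (id >= 0)
		*negated = true;
	return id;
}
```

With the rdt keys above, "nocdpl2" given without a value would resolve to the index of "cdpl2" with the negated flag set, while "nocdp=1" would be rejected as unknown.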
(2) fs_lookup_param(). This takes a { dirfd, path, LOOKUP_EMPTY? } or
string value and performs an appropriate path lookup to convert it
into a path object, which it will then return.
If the desired type was a blockdev, the type of the looked up inode
will be checked to make sure it is one.
This can be used like:
enum foo_param {
	Opt_source,
	nr__foo_params
};
const struct fs_parameter_spec foo_param_specs[nr__foo_params] = {
	[Opt_source]	= { fs_param_is_blockdev },
};
const char *const foo_param_keys[nr__foo_params] = {
	[Opt_source]	= "source",
};
const struct constant_table foo_param_alt_keys[] = {
	{ "device",	Opt_source },
};
const struct fs_parameter_description foo_parser = {
	.name		= "foo",
	.nr_params	= nr__foo_params,
	.nr_alt_keys	= ARRAY_SIZE(foo_param_alt_keys),
	.keys		= foo_param_keys,
	.alt_keys	= foo_param_alt_keys,
	.specs		= foo_param_specs,
};
int foo_parse_param(struct fs_context *fc,
		    struct fs_parameter *param)
{
	struct fs_parse_result parse;
	struct foo_fs_context *ctx = foo_fc2context(fc);
	int ret;
	ret = fs_parse(fc, &foo_parser, param, &parse);
	if (ret < 0)
		return ret;
	switch (parse.key) {
	case Opt_source:
		return fs_lookup_param(fc, &foo_parser, param,
				       &parse, &ctx->source);
	default:
		return -EINVAL;
	}
}
(3) lookup_constant(). This takes a table of named constants and looks up
the given name within it. The table is expected to be sorted so
that bsearch() can be used upon it.
Possibly I should require the table be terminated and just use a
for-loop to scan it instead of using bsearch() to reduce hassle.
Tables look something like:
static const struct constant_table bool_names[] = {
{ "0", false },
{ "1", true },
{ "false", false },
{ "no", false },
{ "true", true },
{ "yes", true },
};
and a lookup is done with something like:
b = lookup_constant(bool_names, param->string, -1);
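The sorted-table lookup in (3) can be sketched in userspace with the C library's bsearch(). This is only an illustration: struct demo_constant and demo_lookup_constant() are hypothetical stand-ins for the kernel's constant_table and lookup_constant(), and the table size is passed explicitly here rather than derived by a macro.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the constant_table shown above. */
struct demo_constant {
	const char *name;
	int value;
};

/* bsearch() comparator: the key is a plain string, the element a table entry. */
static int demo_cmp(const void *key, const void *elem)
{
	const struct demo_constant *c = elem;

	return strcmp(key, c->name);
}

/*
 * Look `name` up in a table sorted in strcmp() order; return `not_found`
 * when there is no matching entry.
 */
static int demo_lookup_constant(const struct demo_constant *tbl, size_t nr,
				const char *name, int not_found)
{
	const struct demo_constant *c =
		bsearch(name, tbl, nr, sizeof(*tbl), demo_cmp);

	return c ? c->value : not_found;
}

/* Same entries as bool_names above, kept in strcmp() order. */
static const struct demo_constant demo_bool_names[] = {
	{ "0", 0 }, { "1", 1 }, { "false", 0 },
	{ "no", 0 }, { "true", 1 }, { "yes", 1 },
};
```

An unsorted table would make bsearch() miss entries silently, which is the hassle the note above weighs against a terminated table and a linear scan.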
Additionally, optional validation routines for the parameter description
are provided that can be enabled at compile time. A later patch will
invoke these when a filesystem is registered.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-02 02:07:24 +03:00
config VALIDATE_FS_PARSER
bool "Validate filesystem parameter description"
help
Enable this to perform validation of the parameter description for a
filesystem when it is registered.
2016-06-21 02:23:11 +03:00
config FS_IOMAP
bool
2023-08-01 20:22:01 +03:00
config BUFFER_HEAD
bool
2023-01-25 09:58:39 +03:00
# old blockdev_direct_IO implementation. Use iomap for new code instead
config LEGACY_DIRECT_IO
2023-08-01 20:22:01 +03:00
depends on BUFFER_HEAD
2023-01-25 09:58:39 +03:00
bool
2021-11-29 13:22:02 +03:00
if BLOCK
2008-10-20 22:28:45 +04:00
source "fs/ext2/Kconfig"
source "fs/ext4/Kconfig"
source "fs/jbd2/Kconfig"
2006-10-11 12:21:01 +04:00
2005-04-17 02:20:36 +04:00
config FS_MBCACHE
2006-10-11 12:20:56 +04:00
# Meta block cache for Extended Attributes (ext2/ext3/ext4)
2005-04-17 02:20:36 +04:00
tristate
2008-08-21 03:56:22 +04:00
default y if EXT2_FS=y && EXT2_FS_XATTR
2012-12-11 01:30:43 +04:00
default y if EXT4_FS=y
2015-06-18 17:52:29 +03:00
default m if EXT2_FS_XATTR || EXT4_FS
2005-04-17 02:20:36 +04:00
2009-01-22 10:22:31 +03:00
source "fs/reiserfs/Kconfig"
2009-01-22 10:24:27 +03:00
source "fs/jfs/Kconfig"
2005-04-17 02:20:36 +04:00
2009-06-17 02:33:56 +04:00
source "fs/xfs/Kconfig"
source "fs/gfs2/Kconfig"
source "fs/ocfs2/Kconfig"
source "fs/btrfs/Kconfig"
2009-08-08 11:09:46 +04:00
source "fs/nilfs2/Kconfig"
2015-03-04 04:06:55 +03:00
source "fs/f2fs/Kconfig"
2017-03-17 09:18:50 +03:00
source "fs/bcachefs/Kconfig"
fs: New zonefs file system
zonefs is a very simple file system exposing each zone of a zoned block
device as a file. Unlike a regular file system with zoned block device
support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices from the user. Files representing
sequential write zones of the device must be written sequentially
starting from the end of the file (append only writes).
As such, zonefs is in essence closer to a raw block device access
interface than to a full featured POSIX file system. The goal of zonefs
is to simplify the implementation of zoned block device support in
applications by replacing raw block device file accesses with a richer
file API, avoiding relying on direct block device file ioctls which may
be more obscure to developers. One example of this approach is the
implementation of LSM (log-structured merge) tree structures (such as
used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
to be stored in a zone file similarly to a regular file system rather
than as a range of sectors of a zoned device. The introduction of the
higher level construct "one file is one zone" can help reduce the
amount of changes needed in the application as well as introducing
support for different application programming languages.
Zonefs on-disk metadata is reduced to an immutable super block to
persistently store a magic number and optional feature flags and
values. On mount, zonefs uses blkdev_report_zones() to obtain the device
zone configuration and populates the mount point with a static file tree
solely based on this information. E.g. file sizes come from the device
zone type and write pointer offset managed by the device itself.
The zone files created on mount have the following characteristics.
1) Files representing zones of the same type are grouped together
under a common sub-directory:
* For conventional zones, the sub-directory "cnv" is used.
* For sequential write zones, the sub-directory "seq" is used.
These two directories are the only directories that exist in zonefs.
Users cannot create other directories and cannot rename nor delete
the "cnv" and "seq" sub-directories.
2) The name of zone files is the number of the file within the zone
type sub-directory, in order of increasing zone start sector.
3) The size of conventional zone files is fixed to the device zone size.
Conventional zone files cannot be truncated.
4) The size of sequential zone files represents the file's zone write
pointer position relative to the zone start sector. Truncating these
files is allowed only down to 0, in which case, the zone is reset to
rewind the zone write pointer position to the start of the zone, or
up to the zone size, in which case the file's zone is transitioned
to the FULL state (finish zone operation).
5) Read and write operations to files are not allowed beyond the
file zone size. Any access exceeding the zone size fails with
the -EFBIG error.
6) Creating, deleting, renaming or modifying any attribute of files and
sub-directories is not allowed.
7) There are no restrictions on the type of read and write operations
that can be issued to conventional zone files. Buffered, direct and
mmap read & write operations are accepted. For sequential zone files,
there are no restrictions on read operations, but all write
operations must be direct IO append writes. mmap write of sequential
files is not allowed.
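Rules 4, 5 and 7 above can be modelled as small predicates. The sketch below is a userspace illustration under stated assumptions, not zonefs code: all demo_* names are hypothetical, and only the -EFBIG error value is taken from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Rule 4: a sequential zone file may only be truncated to 0 (zone reset)
 * or to the zone size (zone finish).
 */
static bool demo_seq_truncate_allowed(uint64_t new_size, uint64_t zone_size)
{
	return new_size == 0 || new_size == zone_size;
}

/* Rule 5: an access must stay within the zone size, else -EFBIG. */
static int demo_check_access(uint64_t pos, uint64_t len, uint64_t zone_size)
{
	if (pos > zone_size || len > zone_size - pos)
		return -EFBIG;
	return 0;
}

/*
 * Rule 7: a write to a sequential zone file must be an append, i.e. it
 * must start at the current file size (the zone write pointer).
 */
static bool demo_seq_write_is_append(uint64_t pos, uint64_t file_size)
{
	return pos == file_size;
}
```

For a 256 MB zone, a 4 KB write at offset 0 of an empty sequential file passes all three checks, while a truncate to 4096 bytes does not.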
Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional
zones can be aggregated into a single larger file instead of the
default one file per zone.
* File ownership: The owner UID and GID of zone files are by default 0
(root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be
changed.
The mkzonefs tool is used to format zoned block devices for use with
zonefs. This tool is available on GitHub at:
git@github.com:damien-lemoal/zonefs-tools.git.
zonefs-tools also includes a test suite which can be run against any
zoned block device, including a null_blk block device created in zoned
mode.
Example: the following formats a 15TB host-managed SMR HDD with 256 MB
zones with the conventional zones aggregation feature enabled.
$ sudo mkzonefs -o aggr_cnv /dev/sdX
$ sudo mount -t zonefs /dev/sdX /mnt
$ ls -l /mnt/
total 0
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
The size of the zone files sub-directories indicates the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a
single file).
$ ls -l /mnt/cnv
total 137101312
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0
This aggregated conventional zone file can be used as a regular file.
$ sudo mkfs.ext4 /mnt/cnv/0
$ sudo mount -o loop /mnt/cnv/0 /data
The "seq" sub-directory grouping files for sequential write zones has
in this example 55356 zones.
$ ls -lv /mnt/seq
total 14511243264
-rw-r----- 1 root root 0 Nov 25 13:23 0
-rw-r----- 1 root root 0 Nov 25 13:23 1
-rw-r----- 1 root root 0 Nov 25 13:23 2
...
-rw-r----- 1 root root 0 Nov 25 13:23 55354
-rw-r----- 1 root root 0 Nov 25 13:23 55355
For sequential write zone files, the file size changes as data is
appended at the end of the file, similarly to any regular file system.
$ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
The written file can be truncated to the zone size, preventing any
further write operation.
$ truncate -s 268435456 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
Truncation to 0 size allows freeing the file zone storage space and
restarting append-writes to the file.
$ truncate -s 0 /mnt/seq/0
$ ls -l /mnt/seq/0
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
Since files are statically mapped to zones on the disk, the number of
blocks of a file as reported by stat() and fstat() indicates the size
of the file zone.
$ stat /mnt/seq/0
File: /mnt/seq/0
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
Device: 870h/2160d Inode: 50431 Links: 1
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-11-25 13:23:57.048971997 +0900
Modify: 2019-11-25 13:52:25.553805765 +0900
Change: 2019-11-25 13:52:25.553805765 +0900
Birth: -
The number of blocks of the file ("Blocks") in units of 512B blocks
gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
to the device zone size in this example. Of note is that the "IO block"
field always indicates the minimum IO size for writes and corresponds
to the device physical sector size.
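The conversion from "Blocks" to the zone size can be written out as a short sketch. The helper names are hypothetical, and the 512-byte unit is the one stated above for the stat output.

```c
#include <stdint.h>
#include <sys/stat.h>

/* "Blocks" in the stat output above is counted in 512-byte units. */
#define DEMO_STAT_BLOCK_SIZE 512ULL

/* Convert a block count as reported by stat() into bytes. */
static uint64_t demo_blocks_to_bytes(uint64_t st_blocks)
{
	return st_blocks * DEMO_STAT_BLOCK_SIZE;
}

/*
 * Query the maximum size of a zone file. Returns 0 on success and stores
 * the size in *bytes, or -1 if stat() fails.
 */
static int demo_zone_capacity(const char *path, uint64_t *bytes)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*bytes = demo_blocks_to_bytes((uint64_t)st.st_blocks);
	return 0;
}
```

For the file in this example, 524288 blocks gives 268435456 bytes, which is the value passed to truncate earlier to fill the zone.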
This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-12-25 10:07:44 +03:00
source "fs/zonefs/Kconfig"
2009-06-17 02:33:56 +04:00
2021-11-29 13:22:03 +03:00
endif # BLOCK
2015-02-17 02:59:25 +03:00
config FS_DAX
2021-08-26 16:55:02 +03:00
bool "File system based Direct Access (DAX) support"
2015-02-17 02:59:25 +03:00
depends on MMU
2015-02-17 02:59:44 +03:00
depends on !(ARM || MIPS || SPARC)
2022-02-16 07:31:37 +03:00
depends on ZONE_DEVICE || FS_DAX_LIMITED
dax: fix build warnings with FS_DAX and !FS_IOMAP
As reported by Arnd:
https://lkml.org/lkml/2017/1/10/756
Compiling with the following configuration:
# CONFIG_EXT2_FS is not set
# CONFIG_EXT4_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_FS_IOMAP depends on the above filesystems, so is not set
CONFIG_FS_DAX=y
generates build warnings about unused functions in fs/dax.c:
fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function]
static int dax_insert_mapping(struct address_space *mapping,
^~~~~~~~~~~~~~~~~~
fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function]
static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
^~~~~~~~~~~~~
fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function]
static int dax_load_hole(struct address_space *mapping, void **entry,
^~~~~~~~~~~~~
fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function]
static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
^~~~~~~~~~~~~~~~~~
Now that the struct buffer_head based DAX fault paths and I/O path have
been removed we really depend on iomap support being present for DAX.
Make this explicit by selecting FS_IOMAP if we compile in DAX support.
This allows us to remove conditional selections of FS_IOMAP when FS_DAX
was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c.
Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-25 02:17:51 +03:00
select FS_IOMAP
2017-05-08 20:55:27 +03:00
select DAX
2015-02-17 02:59:25 +03:00
help
Direct Access (DAX) can be used on memory-backed block devices.
If the block device supports DAX and the filesystem supports DAX,
then you can avoid using the pagecache to buffer I/Os. Turning
2021-08-26 16:55:02 +03:00
on this option will compile in support for DAX.
For a DAX device to support file system access it needs to have
struct pages. For the nfit based NVDIMMs this can be enabled
using the ndctl utility:
# ndctl create-namespace --force --reconfig=namespace0.0 \
--mode=fsdax --map=mem
See the 'create-namespace' man page for details on the overhead of
--map=mem:
https://docs.pmem.io/ndctl-user-guide/ndctl-man-pages/ndctl-create-namespace
For ndctl to work CONFIG_DEV_DAX needs to be enabled as well. For most
file systems DAX support needs to be manually enabled globally or
per-inode using a mount option as well. See the file documentation in
Documentation/filesystems/dax.rst for details.
2015-02-17 02:59:25 +03:00
If you do not have a block device that is capable of using this,
or if unsure, say N. Saying Y will increase the size of the kernel
by about 5kB.
2015-11-16 03:06:32 +03:00
config FS_DAX_PMD
bool
default FS_DAX
depends on FS_DAX
2016-01-16 03:57:01 +03:00
depends on ZONE_DEVICE
depends on TRANSPARENT_HUGEPAGE
2015-11-16 03:06:32 +03:00
2017-10-14 21:33:32 +03:00
# Selected by DAX drivers that do not expect filesystem DAX to support
# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
# for fork() of processes with MAP_SHARED mappings or support for
# direct-I/O to a DAX mapping.
config FS_DAX_LIMITED
bool
2011-01-03 01:44:00 +03:00
# Posix ACL utility routines
#
# Note: Posix ACLs can be implemented without these helpers. Never use
# this symbol for ifdefs in core code.
#
config FS_POSIX_ACL
def_bool n
2010-10-27 01:22:32 +04:00
config EXPORTFS
2011-05-24 22:12:08 +04:00
tristate
2010-10-27 01:22:32 +04:00
2016-07-08 16:53:20 +03:00
config EXPORTFS_BLOCK_OPS
bool "Enable filesystem export operations for block IO"
help
This option enables the export operations for a filesystem to support
external block IO.
2008-08-06 17:12:22 +04:00
config FILE_LOCKING
2011-01-21 01:44:16 +03:00
bool "Enable POSIX file locking API" if EXPERT
2008-08-06 17:12:22 +04:00
default y
help
This option enables standard file locking support, required
for filesystems like NFS and for the flock() system
call. Disabling this option saves about 11k.
2015-05-16 02:26:10 +03:00
source "fs/crypto/Kconfig"
2019-07-22 19:26:21 +03:00
source "fs/verity/Kconfig"
2008-12-17 21:59:41 +03:00
source "fs/notify/Kconfig"
[PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 01:06:03 +04:00
2009-01-26 17:28:09 +03:00
source "fs/quota/Kconfig"
2005-04-17 02:20:36 +04:00
2018-06-08 03:11:31 +03:00
source "fs/autofs/Kconfig"
2009-01-22 10:33:25 +03:00
source "fs/fuse/Kconfig"
overlay filesystem
Overlayfs allows one, usually read-write, directory tree to be
overlaid onto another, read-only directory tree. All modifications
go to the upper, writable layer.
This type of mechanism is most often used for live CDs but there's a
wide variety of other uses.
The implementation differs from other "union filesystem"
implementations in that after a file is opened all operations go
directly to the underlying, lower or upper, filesystems. This
simplifies the implementation and allows native performance in these
cases.
The dentry tree is duplicated from the underlying filesystems; this
enables fast cached lookups without adding special support into the
VFS. This uses slightly more memory than union mounts, but dentries
are relatively small.
Currently inodes are duplicated as well, but it is a possible
optimization to share inodes for non-directories.
Opening non directories results in the open being forwarded to the
underlying filesystem. This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).
Usage:
mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
The following contributions have been folded into this patch:
Neil Brown <neilb@suse.de>:
- minimal remount support
- use correct seek function for directories
- initialise is_real before use
- rename ovl_fill_cache to ovl_dir_read
Felix Fietkau <nbd@openwrt.org>:
- fix a deadlock in ovl_dir_read_merged
- fix a deadlock in ovl_remove_whiteouts
Erez Zadok <ezk@fsl.cs.sunysb.edu>
- fix cleanup after WARN_ON
Sedat Dilek <sedat.dilek@googlemail.com>
- fix up permission to conform to new API
Robin Dong <hao.bigrat@gmail.com>
- fix possible leak in ovl_new_inode
- create new inode in ovl_link
Andy Whitcroft <apw@canonical.com>
- switch to __inode_permission()
- copy up i_uid/i_gid from the underlying inode
AV:
- ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
- ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
lack of check for udir being the parent of upper, dropping and regaining
the lock on udir (which would require _another_ check for parent being
right).
- bogus d_drop() in copyup and rename [fix from your mail]
- copyup/remove and copyup/rename races [fix from your mail]
- ovl_dir_fsync() leaving ERR_PTR() in ->realfile
- ovl_entry_free() is pointless - it's just a kfree_rcu()
- fold ovl_do_lookup() into ovl_lookup()
- manually assigning ->d_op is wrong. Just use ->s_d_op.
[patches picked from Miklos]:
* copyup/remove and copyup/rename races
* bogus d_drop() in copyup and rename
Also thanks to the following people for testing and reporting bugs:
Jordi Pujol <jordipujolp@gmail.com>
Andy Whitcroft <apw@canonical.com>
Michal Suchanek <hramrach@centrum.cz>
Felix Fietkau <nbd@openwrt.org>
Erez Zadok <ezk@fsl.cs.sunysb.edu>
Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 02:14:38 +04:00
source "fs/overlayfs/Kconfig"
2005-09-10 00:10:22 +04:00
2009-04-03 19:42:36 +04:00
menu "Caches"
netfs: Provide readahead and readpage netfs helpers
Add a pair of helper functions:
(*) netfs_readahead()
(*) netfs_readpage()
to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache, the
server or just require clearing, and may be single pages and transparent
huge pages. This is all handled within the helper.
Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.
netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.
The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory):
(*) expand_readahead() [optional]
Called to allow the netfs to request an expansion of a readahead
request to meet its own alignment requirements. This is done by
changing rreq->start and rreq->len.
(*) clamp_length() [optional]
Called to allow the netfs to cut down a subrequest to meet its own
boundary requirements. If it does this, the helper will generate
additional subrequests until the full request is satisfied.
(*) is_still_valid() [optional]
Called to find out if the data just read from the cache has been
invalidated and must be reread from the server.
(*) issue_op() [required]
Called to ask the netfs to issue a read to the server. The subrequest
describes the read. The read request holds information about the file
being accessed.
The netfs can cache information in rreq->netfs_priv.
Upon completion, the netfs should set the error, transferred and can
also set FSCACHE_SREQ_CLEAR_TAIL and then call
fscache_subreq_terminated().
(*) done() [optional]
Called after the pages have been unlocked. The read request is still
pinning the file and mapping and may still be pinning pages with
PG_fscache. rreq->error indicates any error that has been
accumulated.
(*) cleanup() [optional]
Called when the helper is disposing of a finished read request. This
allows the netfs to clear rreq->netfs_priv.
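The shape of such an operations table, with a single mandatory entry, can be sketched as follows. This is a hypothetical userspace mirror for illustration only: the demo_* types and the function signatures are assumptions, and the real kernel structure may differ.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_read_request;	/* opaque in this sketch */
struct demo_subrequest;		/* opaque in this sketch */

/* Hypothetical mirror of the operations listed above. */
struct demo_read_ops {
	void (*expand_readahead)(struct demo_read_request *rreq); /* optional */
	bool (*clamp_length)(struct demo_subrequest *subreq);	   /* optional */
	bool (*is_still_valid)(struct demo_read_request *rreq);   /* optional */
	void (*issue_op)(struct demo_subrequest *subreq);	   /* required */
	void (*done)(struct demo_read_request *rreq);		   /* optional */
	void (*cleanup)(void *netfs_priv);			   /* optional */
};

/* A table is usable as long as the one mandatory operation is present. */
static bool demo_ops_valid(const struct demo_read_ops *ops)
{
	return ops != NULL && ops->issue_op != NULL;
}

/* Placeholder read issuer; a real netfs would start a server read here. */
static void demo_issue_noop(struct demo_subrequest *subreq)
{
	(void)subreq;
}

/* Smallest valid table: every optional hook is left NULL. */
static const struct demo_read_ops demo_minimal_ops = {
	.issue_op = demo_issue_noop,
};
```

Leaving optional hooks NULL, rather than supplying empty stubs, lets the caller skip the indirect call entirely when a netfs has no alignment or boundary requirements.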
Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.
Changes:
v5:
- Comment why netfs_readahead() is putting pages[2].
- Use page_file_mapping() rather than page->mapping[2].
- Use page_index() rather than page->index[2].
- Use set_page_fscache()[3] rather than SetPageFsCache() as this takes an
appropriate ref too[4].
v4:
- Folded in a kerneldoc comment fix.
- Folded in a fix for the error handling in the case that ENOMEM occurs.
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue (can't use in_softirq()[1]).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1]
Link: https://lore.kernel.org/r/20210321014202.GF3420@casper.infradead.org/ [2]
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk/ [3]
Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [4]
Link: https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539537375.286939.16642940088716990995.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653795430.2770958.4947584573720000554.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789076581.6155.6745849361504760209.stgit@warthog.procyon.org.uk/ # v6
2020-05-13 19:41:20 +03:00
source "fs/netfs/Kconfig"
2009-04-03 19:42:36 +04:00
source "fs/fscache/Kconfig"
CacheFiles: A cache that backs onto a mounted filesystem
Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.
CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in
/sbin. The source for the daemon can be downloaded from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.c
And an example configuration from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf
The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.
CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communicate with the daemon. Only one thing may have this open at once,
and whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.
CacheFiles is currently limited to a single cache.
CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.
============
REQUIREMENTS
============
The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:
- dnotify.
- extended attributes (xattrs).
- openat() and friends.
- bmap() support on files in the filesystem (FIBMAP ioctl).
- The use of bmap() to detect a partial page at the end of the file.
It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.
=============
CONFIGURATION
=============
The cache is configured by a script in /etc/cachefilesd.conf. These commands
set up the cache ready for use. The following script commands are available:
(*) brun <N>%
(*) bcull <N>%
(*) bstop <N>%
(*) frun <N>%
(*) fcull <N>%
(*) fstop <N>%
Configure the culling limits. Optional. See the section on culling.
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
The commands beginning with a 'b' are file space (block) limits, those
beginning with an 'f' are file count limits.
(*) dir <path>
Specify the directory containing the root of the cache. Mandatory.
(*) tag <name>
Specify a tag to FS-Cache to use in distinguishing multiple caches.
Optional. The default is "CacheFiles".
(*) debug <mask>
Specify a numeric bitmask to control debugging in the kernel module.
Optional. The default is zero (all off). The following values can be
OR'd into the mask to collect various information:
1 Turn on trace of function entry (_enter() macros)
2 Turn on trace of function exit (_leave() macros)
4 Turn on trace of internal debug points (_debug())
This mask can also be set through sysfs, eg:
echo 5 >/sys/module/cachefiles/parameters/debug
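The value 5 in the example is the OR of two of the bits listed above. A minimal sketch of composing the mask, with hypothetical DEMO_* names standing in for the three documented bit values:

```c
#include <stdbool.h>

/* Bit values as listed above; the names are illustrative only. */
#define DEMO_DBG_ENTER	1u	/* trace function entry (_enter() macros) */
#define DEMO_DBG_LEAVE	2u	/* trace function exit (_leave() macros) */
#define DEMO_DBG_DEBUG	4u	/* trace internal debug points (_debug()) */

/* Build the numeric mask to pass to the "debug" command. */
static unsigned int demo_debug_mask(bool enter, bool leave, bool debug)
{
	unsigned int mask = 0;

	if (enter)
		mask |= DEMO_DBG_ENTER;
	if (leave)
		mask |= DEMO_DBG_LEAVE;
	if (debug)
		mask |= DEMO_DBG_DEBUG;
	return mask;
}
```

Entry tracing plus internal debug points is 1 | 4 = 5; enabling all three gives 7.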
==================
STARTING THE CACHE
==================
The cache is started by running the daemon. The daemon opens the cache device,
configures the cache and tells it to begin caching. At that point the cache
binds to fscache and the cache becomes live.
The daemon is run as follows:
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
The flags are:
(*) -d
Increase the debugging level. This can be specified multiple times and
is cumulative with itself.
(*) -s
Send messages to stderr instead of syslog.
(*) -n
Don't daemonise and go into background.
(*) -f <configfile>
Use an alternative configuration file rather than the default one.
===============
THINGS TO AVOID
===============
Do not mount other things within the cache as this will cause problems. The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.
Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.
Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).
Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.
Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.
Do not chmod files in the cache. The module creates things with minimal
permissions to prevent random users being able to access them directly.
=============
CACHE CULLING
=============
The cache may need culling occasionally to make space. This involves
discarding objects from the cache that have been used less recently than
anything else. Culling is based on the access time of data objects. Empty
directories are culled if not in use.
Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem. There are six
"limits":
(*) brun
(*) frun
If the amount of free space and the number of available files in the cache
rises above both these limits, then culling is turned off.
(*) bcull
(*) fcull
If the amount of available space or the number of available files in the
cache falls below either of these limits, then culling is started.
(*) bstop
(*) fstop
If the amount of available space or the number of available files in the
cache falls below either of these limits, then no further allocation of
disk space or files is permitted until culling has raised things above
these limits again.
These must be configured thusly:
0 <= bstop < bcull < brun < 100
0 <= fstop < fcull < frun < 100
Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.
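The rules above can be sketched as a small model: one check for the ordering constraint and one function that derives the culling and allocation state from the available percentages. The class and function names are hypothetical, and the behaviour when a value is exactly equal to a limit is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Limits:
    run: int    # brun or frun, in percent
    cull: int   # bcull or fcull, in percent
    stop: int   # bstop or fstop, in percent

    def validate(self) -> None:
        # Enforces 0 <= stop < cull < run < 100, as required above.
        if not (0 <= self.stop < self.cull < self.run < 100):
            raise ValueError(f"invalid limits: {self}")


def cache_state(avail_blocks: float, avail_files: float,
                blocks: Limits, files: Limits,
                culling: bool) -> Tuple[bool, bool]:
    """Return (culling, allocation_allowed) for the given availability."""
    if avail_blocks < blocks.cull or avail_files < files.cull:
        culling = True            # either value below its cull limit
    elif avail_blocks > blocks.run and avail_files > files.run:
        culling = False           # both values above their run limits
    # Otherwise the previous culling state is kept.
    allowed = not (avail_blocks < blocks.stop or avail_files < files.stop)
    return culling, allowed


defaults = Limits(run=7, cull=5, stop=1)   # the documented defaults
defaults.validate()
print(cache_state(4.0, 50.0, defaults, defaults, culling=False))
```

Keeping the previous state between the cull and run limits gives the hysteresis the text implies: culling starts below the cull limit and only stops once both values are above the run limit.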
The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order. A new scan of the cache is
started as soon as space is made in the table. Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.
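A minimal sketch of the scan-then-cull idea follows, assuming the scan yields a mapping of object name to access time. The table size, the function names and the in-use callback are all hypothetical stand-ins; the real daemon talks to the kernel module to learn whether an object is in use.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def build_cull_table(atimes: Dict[str, float],
                     table_size: int) -> List[Tuple[float, str]]:
    """Keep the table_size least recently used objects, oldest first."""
    return heapq.nsmallest(table_size,
                           ((atime, name) for name, atime in atimes.items()))


def cull(table: List[Tuple[float, str]],
         current_atimes: Dict[str, float],
         in_use: Callable[[str], bool]) -> List[str]:
    """Return the names that would be culled, in least recently used order."""
    culled = []
    for recorded_atime, name in table:
        if current_atimes.get(name) != recorded_atime:
            continue   # atime changed since the scan: skip
        if in_use(name):
            continue   # still in use according to the (stand-in) callback
        culled.append(name)
    return culled


scan = {"a": 100.0, "b": 50.0, "c": 75.0, "d": 200.0}
table = build_cull_table(scan, table_size=3)
now = dict(scan, c=300.0)   # "c" was accessed after the scan
print(cull(table, now, in_use=lambda name: name == "a"))
```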
===============
CACHE STRUCTURE
===============
The CacheFiles module will create two directories in the directory it was
given:
(*) cache/
(*) graveyard/
The active cache objects all reside in the first directory. The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.
The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.
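The deletion step can be pictured with the sketch below. Note that it swaps in a simple one-shot sweep of the directory rather than dnotify, which Python's standard library does not expose; the function name is hypothetical.

```python
import os
import shutil


def empty_graveyard(graveyard: str) -> int:
    """Delete every entry currently in the graveyard; return how many."""
    # Snapshot the entries first so that deletion does not disturb the scan.
    with os.scandir(graveyard) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)   # retired object with children
        else:
            os.unlink(entry.path)       # plain file
    return len(entries)
```

A long-running process would call this each time it is told that the directory has changed, which is the role dnotify plays for the daemon.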
The module represents index objects as directories with the filename "I..." or
"J...". Note that the "cache/" directory is itself a special index.
Data objects are represented as files if they have no children, or directories
if they do. Their filenames all begin "D..." or "E...". If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.
Special objects are similar to data objects, except their filenames begin
"S..." or "T...".
If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended. Into
this directory, if possible, will be placed the representations of the child
objects:
INDEX     INDEX      INDEX                             DATA FILES
========= ========== ================================= ================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory. The names of the intermediate directories will have
'+' prepended:
J1223/@23/+xy...z/+kl...m/Epqr
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.
To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable. The two versions of
object filenames indicate the encoding:
OBJECT TYPE     PRINTABLE       ENCODED
=============== =============== ===============
Index           "I..."          "J..."
Data            "D..."          "E..."
Special         "S..."          "T..."
Intermediate directories are always "@" or "+" as appropriate.
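The prefix convention in the table above can be read back mechanically. The sketch below classifies a single path component by its first character; the function name and the wording of the returned labels are assumptions for illustration only.

```python
from typing import Optional, Tuple

# First character -> (kind, encoded?). For intermediate directories the
# printable/encoded distinction does not apply, so the flag is None.
PREFIXES = {
    "I": ("index", False),   "J": ("index", True),
    "D": ("data", False),    "E": ("data", True),
    "S": ("special", False), "T": ("special", True),
    "@": ("hash directory", None),
    "+": ("key continuation directory", None),
}


def classify(component: str) -> Tuple[str, Optional[bool]]:
    """Classify one path component of the cache by its leading character."""
    if not component or component[0] not in PREFIXES:
        raise ValueError(f"unrecognised cache filename: {component!r}")
    return PREFIXES[component[0]]


# Components taken from the example paths above.
for part in ("@4a", "I03nfs", "Ji000000000000000--fHg8hi8400"):
    print(part, classify(part))
```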
Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs. The latter is used to detect stale objects in the cache and update
or retire them.
Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).
==========================
SECURITY MODEL AND SELINUX
==========================
CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.
One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.
The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it is the target of an operation performed by
some other process (so signalling and suchlike still work correctly).
When the CacheFiles module is asked to bind to its cache, it:
(1) Finds the security label attached to the root cache directory and uses
that as the security label with which it will create files. By default,
this is:
cachefiles_var_t
(2) Finds the security label of the process which issued the bind request
(presumed to be the cachefilesd daemon), which by default will be:
cachefilesd_t
and asks LSM to supply a security ID as which it should act given the
daemon's label. By default, this will be:
cachefiles_kernel_t
SELinux transitions the daemon's security ID to the module's security ID
based on a rule of this form in the policy.
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
For instance:
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.
The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories. It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.
There are policy source files available in:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
and later versions. In that tarball, see the files:
cachefilesd.te
cachefilesd.fc
cachefilesd.if
They are built and installed directly by the RPM.
If a non-RPM based system is being used, then copy the above files to their own
directory and run:
make -f /usr/share/selinux/devel/Makefile
semodule -i cachefilesd.pp
You will need checkpolicy and selinux-policy-devel installed prior to the
build.
By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, then either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.
For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:
/usr/share/doc/cachefilesd-*/move-cache.txt
when the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.
==================
A NOTE ON SECURITY
==================
CacheFiles makes use of the split security in the task_struct. It allocates
its own task_security structure, and redirects current->act_as to point to it
when it acts on behalf of another process, in that process's context.
The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly. Therefore the VFS and LSM
may deny the CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.
Furthermore, should CacheFiles create a file or directory, the security
parameters with which that object is created (UID, GID, security label) would be
derived from the process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).
What is required is to temporarily override the security of the process that
issued the system call. We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.
So CacheFiles makes use of a logical split in the security between the
objective security (task->sec) and the subjective security (task->act_as). The
objective security holds the intrinsic security properties of a process and is
never overridden. This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).
The subjective security holds the active security properties of a process, and
may be overridden. This is not seen externally, and is used when a process
acts upon another object, for example SIGKILLing another process or opening a
file.
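The objective/subjective split can be pictured with a toy model. This is only a userspace analogy, not kernel code: the class names and the "user_t" label are invented, while the sec and act_as field names and the cachefiles_kernel_t label follow the text above.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    fsuid: int
    fsgid: int
    label: str


class Task:
    def __init__(self, creds: Creds) -> None:
        self.sec = creds      # objective: what others see; never overridden
        self.act_as = creds   # subjective: used when acting on other objects


@contextmanager
def override_subjective(task: Task, creds: Creds):
    """Temporarily act as creds, leaving the objective security alone."""
    saved = task.act_as
    task.act_as = creds
    try:
        yield task
    finally:
        task.act_as = saved   # always restore the caller's own context


task = Task(Creds(fsuid=1000, fsgid=1000, label="user_t"))
cache = Creds(fsuid=0, fsgid=0, label="cachefiles_kernel_t")
with override_subjective(task, cache):
    print(task.sec.label, task.act_as.label)
print(task.sec.label, task.act_as.label)
```

Because only act_as changes, anything that targets the task (signals, /proc) still sees the original properties throughout.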
LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.
This documentation is added by the patch to:
Documentation/filesystems/caching/cachefiles.txt
Signed-Off-By: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 19:42:41 +04:00
source "fs/cachefiles/Kconfig"
2009-04-03 19:42:36 +04:00
endmenu
[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 22:45:40 +04:00
if BLOCK
2005-04-17 02:20:36 +04:00
menu "CD-ROM/DVD Filesystems"
2009-01-22 10:35:21 +03:00
source "fs/isofs/Kconfig"
source "fs/udf/Kconfig"
2005-04-17 02:20:36 +04:00
endmenu
2008-02-07 11:15:16 +03:00
endif # BLOCK
2005-04-17 02:20:36 +04:00
2006-09-30 22:45:40 +04:00
if BLOCK
2020-03-02 09:21:42 +03:00
menu "DOS/FAT/EXFAT/NT Filesystems"
2005-04-17 02:20:36 +04:00
2009-01-22 10:37:59 +03:00
source "fs/fat/Kconfig"
2020-03-02 09:21:42 +03:00
source "fs/exfat/Kconfig"
2009-01-22 10:39:20 +03:00
source "fs/ntfs/Kconfig"
2021-08-13 17:21:30 +03:00
source "fs/ntfs3/Kconfig"
2005-04-17 02:20:36 +04:00
endmenu
2008-02-07 11:15:16 +03:00
endif # BLOCK
2005-04-17 02:20:36 +04:00
menu "Pseudo filesystems"
2008-07-25 12:48:30 +04:00
source "fs/proc/Kconfig"
2014-02-03 23:09:17 +04:00
source "fs/kernfs/Kconfig"
2009-01-22 10:40:58 +03:00
source "fs/sysfs/Kconfig"
2005-04-17 02:20:36 +04:00
config TMPFS
2011-11-01 04:07:21 +04:00
bool "Tmpfs virtual memory file system support (former shm fs)"
2009-09-22 04:03:37 +04:00
depends on SHMEM
2023-06-30 12:08:53 +03:00
select MEMFD_CREATE
2005-04-17 02:20:36 +04:00
help
Tmpfs is a file system which keeps all files in virtual memory.
Everything in tmpfs is temporary in the sense that no files will be
created on your hard drive. The files live in memory and swap
space. If you unmount a tmpfs instance, everything stored therein is
lost.
2020-04-14 19:48:37 +03:00
See <file:Documentation/filesystems/tmpfs.rst> for details.
2005-04-17 02:20:36 +04:00
2011-05-26 06:49:18 +04:00
config TMPFS_POSIX_ACL
bool "Tmpfs POSIX Access Control Lists"
depends on TMPFS
select TMPFS_XATTR
2013-12-20 17:16:54 +04:00
select FS_POSIX_ACL
2011-05-26 06:49:18 +04:00
help
2011-08-04 03:21:29 +04:00
POSIX Access Control Lists (ACLs) support additional access rights
for users and groups beyond the standard owner/group/world scheme,
and this option selects support for ACLs specifically for tmpfs
filesystems.
If you've selected TMPFS, it's possible that you'll also need
this option as there are a number of Linux distros that require
POSIX ACL support under /dev for certain features to work properly.
For example, some distros need this feature for ALSA-related /dev
files for sound to work properly. In short, if you're not sure,
say Y.
2011-05-26 06:49:18 +04:00
tmpfs: implement generic xattr support
Implement generic xattrs for tmpfs filesystems. The Fedora project, while
trying to replace suid apps with file capabilities, realized that tmpfs,
which is used on the build systems, does not support file capabilities and
thus cannot be used to build packages which use file capabilities. Xattrs
are also needed for overlayfs.
The xattr interface is a bit odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks
and the running LSM can then implement some method for handling xattrs.
SELinux for example provides a method to support security.selinux but no
other security.* xattrs.
As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would lose the VFS/LSM helpers to support the running LSM. To make up
for that tmpfs had stub functions that did nothing but call into the LSM
hooks which implement the helpers.
This new patch does not use the LSM fallback functions and instead just
implements a native get/set/list xattr feature for the full security.* and
trusted.* namespace like a normal filesystem. This means that tmpfs can
now support both security.selinux and security.capability, which was not
previously possible.
The basic implementation is that I attach a:
struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};
Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well, except
some care needs to be taken to prevent large amounts of unswappable memory
being allocated for unprivileged users.
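As a rough model of the scheme described above, the sketch below keeps a per-inode list of name/value entries and accepts only the security.* and trusted.* namespaces. The class and method names are hypothetical, and rejection is modelled with ValueError because the commit message does not say which error the kernel returns.

```python
from typing import List, Tuple

# Namespaces the commit message says are handled natively.
ALLOWED_PREFIXES = ("security.", "trusted.")


class InodeXattrs:
    """Toy stand-in for the xattr list anchored in each tmpfs inode."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, bytes]] = []

    def set(self, name: str, value: bytes) -> None:
        if not name.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"unsupported xattr namespace: {name}")
        for i, (existing, _) in enumerate(self._entries):
            if existing == name:
                self._entries[i] = (name, bytes(value))   # replace in place
                return
        self._entries.append((name, bytes(value)))

    def get(self, name: str) -> bytes:
        for existing, value in self._entries:
            if existing == name:
                return value
        raise KeyError(name)

    def names(self) -> List[str]:
        return [name for name, _ in self._entries]


xattrs = InodeXattrs()
xattrs.set("security.capability", b"\x01")
xattrs.set("trusted.example", b"y")   # hypothetical attribute name
print(xattrs.names())
```

Refusing user.* here reflects the concern in the text about unprivileged users pinning unswappable memory.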
[mszeredi@suse.cz: new config option, support trusted.*, support symlinks]
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 04:12:39 +04:00
config TMPFS_XATTR
bool "Tmpfs extended attributes"
depends on TMPFS
default n
help
Extended attributes are name:value pairs associated with inodes by
2017-12-20 16:58:52 +03:00
the kernel or by users (see the attr(5) manual page for details).
2011-05-25 04:12:39 +04:00
2023-08-09 07:33:56 +03:00
This enables support for the trusted.*, security.* and user.*
namespaces.
2011-05-25 04:12:39 +04:00
You need this for POSIX ACL support on tmpfs.
2011-05-26 06:49:18 +04:00
If unsure, say N.
2006-09-29 13:01:35 +04:00
2020-08-07 09:20:25 +03:00
config TMPFS_INODE64
bool "Use 64-bit ino_t by default in tmpfs"
2021-03-01 23:02:49 +03:00
depends on TMPFS && 64BIT
2020-08-07 09:20:25 +03:00
default n
help
tmpfs has historically used only inode numbers as wide as an unsigned
int. In some cases this can cause wraparound, potentially resulting
in multiple files with the same inode number on a single device. This
option makes tmpfs use the full width of ino_t by default, without
needing to specify the inode64 option when mounting.
But if a long-lived tmpfs is to be accessed by 32-bit applications so
ancient that opening a file larger than 2GiB fails with EINVAL, then
the INODE64 config option and inode64 mount option risk operations
failing with EOVERFLOW once 33-bit inode numbers are reached.
To override this configured default, use the inode32 or inode64
option when mounting.
If unsure, say N.
2023-07-25 17:45:07 +03:00
config TMPFS_QUOTA
bool "Tmpfs quota support"
depends on TMPFS
select QUOTA
help
Quota support allows per-user and per-group limits to be set for tmpfs
usage. Say Y to enable quota support. Once enabled you can control
user and group quota enforcement with quota, usrquota and grpquota
mount options.
If unsure, say N.
2021-05-05 04:38:13 +03:00
config ARCH_SUPPORTS_HUGETLBFS
def_bool n
2005-04-17 02:20:36 +04:00
config HUGETLBFS
bool "HugeTLB file system support"
2021-09-08 18:45:06 +03:00
depends on X86 || IA64 || SPARC64 || ARCH_SUPPORTS_HUGETLBFS || BROKEN
2022-09-01 15:00:30 +03:00
depends on (SYSFS || SYSCTL)
2023-06-30 12:08:53 +03:00
select MEMFD_CREATE
2006-04-19 09:20:57 +04:00
help
hugetlbfs is a filesystem backing for HugeTLB pages, based on
ramfs. For architectures that support it, say Y here and read
2018-04-18 11:07:49 +03:00
<file:Documentation/admin-guide/mm/hugetlbpage.rst> for details.
2006-04-19 09:20:57 +04:00
If unsure, say N.
2005-04-17 02:20:36 +04:00
config HUGETLB_PAGE
def_bool HUGETLBFS
2022-04-29 09:16:15 +03:00
config HUGETLB_PAGE_OPTIMIZE_VMEMMAP
2021-07-01 04:47:04 +03:00
def_bool HUGETLB_PAGE
2023-07-24 22:07:53 +03:00
depends on ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
2021-07-01 04:47:04 +03:00
depends on SPARSEMEM_VMEMMAP
2022-04-29 09:16:15 +03:00
config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
2022-06-28 12:22:30 +03:00
bool "HugeTLB Vmemmap Optimization (HVO) defaults to on"
2021-07-01 04:48:28 +03:00
default n
2022-04-29 09:16:15 +03:00
depends on HUGETLB_PAGE_OPTIMIZE_VMEMMAP
2021-07-01 04:48:28 +03:00
help
2022-06-28 12:22:30 +03:00
The HugeTLB Vmemmap Optimization (HVO) defaults to off. Say Y here to
enable HVO by default. It can be disabled via hugetlb_free_vmemmap=off
(boot command line) or hugetlb_optimize_vmemmap (sysctl).
2021-07-01 04:48:28 +03:00
2016-10-08 03:01:46 +03:00
config ARCH_HAS_GIGANTIC_PAGE
bool
2009-01-22 10:42:52 +03:00
source "fs/configfs/Kconfig"
2014-12-18 20:50:49 +03:00
source "fs/efivarfs/Kconfig"
2005-12-16 01:29:43 +03:00
2005-04-17 02:20:36 +04:00
endmenu
2009-01-07 01:40:57 +03:00
menuconfig MISC_FILESYSTEMS
bool "Miscellaneous filesystems"
default y
2020-06-13 19:50:22 +03:00
help
2009-01-07 01:40:57 +03:00
Say Y here to get to see options for various miscellaneous
filesystems, such as filesystems that came from other
operating systems.
This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and
disabled; if unsure, say Y here.
if MISC_FILESYSTEMS
2005-04-17 02:20:36 +04:00
2015-07-17 17:38:17 +03:00
source "fs/orangefs/Kconfig"
2009-01-22 10:48:46 +03:00
source "fs/adfs/Kconfig"
2009-01-22 10:49:44 +03:00
source "fs/affs/Kconfig"
2009-01-22 10:50:50 +03:00
source "fs/ecryptfs/Kconfig"
2009-01-22 10:53:24 +03:00
source "fs/hfs/Kconfig"
source "fs/hfsplus/Kconfig"
2009-01-22 10:54:16 +03:00
source "fs/befs/Kconfig"
2009-01-22 10:55:13 +03:00
source "fs/bfs/Kconfig"
2009-01-22 10:56:07 +03:00
source "fs/efs/Kconfig"
2008-08-29 07:19:50 +04:00
source "fs/jffs2/Kconfig"
2008-07-14 20:08:38 +04:00
# UBIFS File system configuration
source "fs/ubifs/Kconfig"
2009-01-22 10:56:54 +03:00
source "fs/cramfs/Kconfig"
2009-01-22 10:57:46 +03:00
source "fs/squashfs/Kconfig"
2009-01-22 10:58:51 +03:00
source "fs/freevxfs/Kconfig"
2009-01-22 10:59:49 +03:00
source "fs/minix/Kconfig"
2009-01-22 11:00:41 +03:00
source "fs/omfs/Kconfig"
2009-01-22 11:01:26 +03:00
source "fs/hpfs/Kconfig"
2009-01-22 11:02:21 +03:00
source "fs/qnx4/Kconfig"
2012-02-17 08:59:20 +04:00
source "fs/qnx6/Kconfig"
2009-01-22 11:03:34 +03:00
source "fs/romfs/Kconfig"
2010-12-29 01:25:21 +03:00
source "fs/pstore/Kconfig"
2009-01-22 11:04:23 +03:00
source "fs/sysv/Kconfig"
2009-01-22 11:05:02 +03:00
source "fs/ufs/Kconfig"
2019-08-23 00:36:59 +03:00
source "fs/erofs/Kconfig"
2019-12-12 17:09:14 +03:00
source "fs/vboxsf/Kconfig"
2009-04-07 06:01:41 +04:00
2009-01-07 01:40:57 +03:00
endif # MISC_FILESYSTEMS
2005-04-17 02:20:36 +04:00
2007-10-17 10:30:16 +04:00
menuconfig NETWORK_FILESYSTEMS
bool "Network File Systems"
default y
2005-04-17 02:20:36 +04:00
depends on NET
2020-06-13 19:50:22 +03:00
help
2007-10-17 10:30:16 +04:00
Say Y here to get to see options for network filesystems and
filesystem-related networking code, such as NFS daemon and
RPCSEC security modules.
2008-05-22 01:09:04 +04:00
2007-10-17 10:30:16 +04:00
This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and
disabled; if unsure, say Y here.
if NETWORK_FILESYSTEMS
2005-04-17 02:20:36 +04:00
2009-01-22 11:07:41 +03:00
source "fs/nfs/Kconfig"
2009-01-22 11:08:58 +03:00
source "fs/nfsd/Kconfig"
2005-04-17 02:20:36 +04:00
2014-09-13 00:40:20 +04:00
config GRACE_PERIOD
tristate
2005-04-17 02:20:36 +04:00
config LOCKD
tristate
2009-05-13 00:28:09 +04:00
depends on FILE_LOCKING
2014-09-13 00:40:20 +04:00
select GRACE_PERIOD
2005-04-17 02:20:36 +04:00
config LOCKD_V4
bool
2022-02-06 20:25:47 +03:00
depends on NFSD || NFS_V3
2009-05-13 00:28:09 +04:00
depends on FILE_LOCKING
2005-04-17 02:20:36 +04:00
default y
2005-06-22 21:16:26 +04:00
config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL
config NFS_COMMON
bool
2014-09-13 00:40:20 +04:00
depends on NFSD || NFS_FS || LOCKD
2005-06-22 21:16:26 +04:00
default y
2021-01-28 09:42:26 +03:00
config NFS_V4_2_SSC_HELPER
2021-04-22 10:37:49 +03:00
bool
default y if NFS_V4_2
2021-01-28 09:42:26 +03:00
2009-01-22 11:11:56 +03:00
source "net/sunrpc/Kconfig"
2009-10-06 22:31:15 +04:00
source "fs/ceph/Kconfig"
2021-08-19 13:34:59 +03:00
2023-05-22 04:46:30 +03:00
source "fs/smb/Kconfig"
2009-01-22 11:15:06 +03:00
source "fs/coda/Kconfig"
2009-01-22 11:16:02 +03:00
source "fs/afs/Kconfig"
2009-01-22 11:16:42 +03:00
source "fs/9p/Kconfig"
[PATCH] v9fs: Documentation, Makefiles, Configuration
OVERVIEW
V9FS is a distributed file system for Linux which provides an
implementation of the Plan 9 resource sharing protocol 9P. It can be
used to share all sorts of resources: static files, synthetic file servers
(such as /proc or /sys), devices, and application file servers (such as
FUSE).
BACKGROUND
Plan 9 (http://plan9.bell-labs.com/plan9) is a research operating
system and associated applications suite developed by the Computing
Science Research Center of AT&T Bell Laboratories (now a part of
Lucent Technologies), the same group that developed UNIX, C, and C++.
Plan 9 was initially released in 1993 to universities, and then made
generally available in 1995. Its core operating systems code laid the
foundation for the Inferno Operating System released as a product by
Lucent Bell-Labs in 1997. The Inferno venture was the only commercial
embodiment of Plan 9 and is currently maintained as a product by Vita
Nuova (http://www.vitanuova.com). After updated releases in 2000 and
2002, Plan 9 was open-sourced under the OSI approved Lucent Public
License in 2003.
The Plan 9 project was started by Ken Thompson and Rob Pike in 1985.
Their intent was to explore potential solutions to some of the
shortcomings of UNIX in the face of the widespread use of high-speed
networks to connect machines. In UNIX, networking was an afterthought
and UNIX clusters became little more than a network of stand-alone
systems. Plan 9 was designed from first principles as a seamless
distributed system with integrated secure network resource sharing.
Applications and services were architected in such a way as to allow
for implicit distribution across a cluster of systems. Configuring an
environment to use remote application components or services in place
of their local equivalent could be achieved with a few simple command
line instructions. For the most part, application implementations
operated independently of the location of their actual resources.
Commercial operating systems haven't changed much in the 20 years
since Plan 9 was conceived. Network and distributed systems support is
provided by a patchwork of middleware, with an endless number of
packages supplying pieces of the puzzle. Matters are complicated by
the use of different complicated protocols for individual services,
and separate implementations for kernel and application resources.
The V9FS project (http://v9fs.sourceforge.net) is an attempt to bring
Plan 9's unified approach to resource sharing to Linux and other
operating systems via support for the 9P2000 resource sharing
protocol.
V9FS HISTORY
V9FS was originally developed by Ron Minnich and Maya Gokhale at Los
Alamos National Labs (LANL) in 1997. In November of 2001, Greg Watson
set up a SourceForge project as a public repository for the code which
supported the Linux 2.4 kernel.
About a year ago, I picked up the initial attempt Ron Minnich had
made to provide 2.6 support and got the code integrated into a 2.6.5
kernel. I then went through a line-for-line rewrite, attempting to
clean up the code while more closely following the Linux kernel style
guidelines. I co-authored a paper with Ron Minnich on the V9FS Linux
support including performance comparisons to NFSv3 using Bonnie and
PostMark - this paper appeared at the USENIX/FREENIX 2005
conference in April 2005:
( http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html ).
CALL FOR PARTICIPATION/REQUEST FOR COMMENTS
Our 2.6 kernel support is stabilizing and we'd like to begin pursuing
its integration into the official kernel tree. We would appreciate any
review, comments, critiques, and additions from this community and are
actively seeking people to join our project and help us produce
something that would be acceptable and useful to the Linux community.
STATUS
The code is reasonably stable, although there are no doubt corner cases
our regression tests haven't discovered yet. It is in regular use by several
of the developers and has been tested on x86 and PowerPC
(32-bit and 64-bit) in both small and large (LANL cluster) deployments.
Our current regression tests include fsx, bonnie, and postmark.
It was our intention to keep things as simple as possible for this
release -- trying to focus on correctness within the core of the
protocol support versus a rich set of features. For example: a more
complete security model and cache layer are in the road map, but
excluded from this release. Additionally, we have removed support for
mmap operations at Al Viro's request.
PERFORMANCE
Detailed performance numbers and analysis are included in the FREENIX
paper, but we show comparable performance to NFSv3 for large file
operations based on the Bonnie benchmark, and superior performance for
many small file operations based on the PostMark benchmark. Somewhat
preliminary graphs (from the FREENIX paper) are available
(http://v9fs.sourceforge.net/perf/index.html).
RESOURCES
The source code is available in a few different forms:
tarballs: http://v9fs.sf.net
CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
Git: rsync://v9fs.graverobber.org/v9fs (webgit: http://v9fs.graverobber.org)
9P: tcp!v9fs.graverobber.org!6564
The user-level server is available from either the Plan 9 distribution
or from http://v9fs.sf.net
Other support applications are still being developed, but preliminary
versions can be downloaded from SourceForge.
Documentation on the protocol has historically been the Plan 9 Man
pages (http://plan9.bell-labs.com/sys/man/5/INDEX.html), but there is
an effort under way to write a more complete Internet-Draft style
specification (http://v9fs.sf.net/rfc).
There are a couple of mailing lists supporting v9fs, but the most used
is v9fs-developer@lists.sourceforge.net -- please direct/cc your
comments there so the other v9fs contributors can participate in the
conversation. There is also an IRC channel: irc://freenode.net/#v9fs
This part of the patch contains Documentation, Makefiles, and configuration
file changes.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 00:04:18 +04:00
2007-10-17 10:30:16 +04:00
endif # NETWORK_FILESYSTEMS
2005-04-17 02:20:36 +04:00
source "fs/nls/Kconfig"
2006-01-18 12:30:29 +03:00
source "fs/dlm/Kconfig"
2019-04-25 20:38:44 +03:00
source "fs/unicode/Kconfig"
2005-04-17 02:20:36 +04:00
2019-10-22 19:25:58 +03:00
config IO_WQ
bool
2005-04-17 02:20:36 +04:00
endmenu