linux/fs
Christian Brauner b36a5780cb ovl: modify layer parameter parsing
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: https://github.com/util-linux/util-linux/issues/2287
Link: https://github.com/util-linux/util-linux/issues/1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-20 14:10:40 +03:00
..
9p Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
adfs fs: port ->setattr() to pass mnt_idmap 2023-01-19 09:24:02 +01:00
affs for-6.3/dio-2023-02-16 2023-02-20 14:10:36 -08:00
afs afs: Fix setting of mtime when creating a file/dir/symlink 2023-06-07 09:03:12 -07:00
autofs fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
befs befs: Convert befs_symlink_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
bfs fs: port inode_init_owner() to mnt_idmap 2023-01-19 09:24:28 +01:00
btrfs for-6.4-rc4-tag 2023-06-02 17:16:19 -04:00
cachefiles fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls 2023-04-13 11:49:35 -07:00
ceph ceph: fix use-after-free bug for inodes when flushing capsnaps 2023-06-08 08:56:25 +02:00
coda sysctl-6.4-rc1 2023-04-27 16:52:33 -07:00
configfs fs: consolidate duplicate dt_type helpers 2023-04-03 09:23:54 +02:00
cramfs fs/cramfs/inode.c: initialize file_ra_state 2023-03-02 21:54:23 -08:00
crypto fscrypt: optimize fscrypt_initialize() 2023-04-06 11:16:39 -07:00
debugfs ARM: SoC drivers for 6.3 2023-02-27 10:04:49 -08:00
devpts devpts: simplify two-level sysctl registration for pty_kern_table 2023-03-13 12:36:34 +01:00
dlm Networking changes for 6.4. 2023-04-26 16:07:23 -07:00
ecryptfs fs: drop unused posix acl handlers 2023-03-06 09:57:12 +01:00
efivarfs A healthy mix of EFI contributions this time: 2023-02-23 14:41:48 -08:00
efs
erofs erofs: use HIPRI by default if per-cpu kthreads are enabled 2023-05-23 16:57:08 +08:00
exfat Description for this pull request: 2023-03-01 08:42:27 -08:00
exportfs fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
ext2 \n 2023-04-26 09:07:46 -07:00
ext4 ext4: only check dquot_initialize_needed() when debugging 2023-06-08 10:06:40 -04:00
f2fs f2fs update for 6.4-rc1 2023-04-26 09:42:10 -07:00
fat There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
freevxfs There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
fscache fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work() 2023-01-30 12:51:54 +00:00
fuse Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
gfs2 gfs2: Don't get stuck writing page onto itself under direct I/O 2023-06-01 14:55:43 +02:00
hfs There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
hfsplus fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode() 2023-04-12 11:29:32 +02:00
hostfs um: hostfs: define our own API boundary 2023-04-20 23:04:40 +02:00
hpfs fs: port ->rename() to pass mnt_idmap 2023-01-19 09:24:26 +01:00
hugetlbfs mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() 2023-04-21 14:52:05 -07:00
iomap New code for 6.4: 2023-04-29 10:35:48 -07:00
isofs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
jbd2 jdb2: Don't refuse invalidation of already invalidated buffers 2023-04-14 19:38:50 -04:00
jffs2 fs: rename generic posix acl handlers 2023-03-06 09:57:13 +01:00
jfs write_one_page series 2023-04-24 19:20:27 -07:00
kernfs Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
lockd nfsd-6.4 fixes: 2023-05-17 09:56:01 -07:00
minix Merge branch 'work.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2023-02-24 19:01:15 -08:00
netfs - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
nfs NFSv4.2: Fix a potential double free with READ_PLUS 2023-05-19 17:11:59 -04:00
nfs_common NFSv4.2: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:52 -07:00
nfsd nfsd-6.4 fixes: 2023-06-02 13:38:55 -04:00
nilfs2 nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode() 2023-05-17 15:24:34 -07:00
nls
notify inotify: Avoid reporting event with invalid wd 2023-04-25 12:36:55 +02:00
ntfs ntfs: simplfy one-level sysctl registration for ntfs_sysctls 2023-04-13 11:49:35 -07:00
ntfs3 driver ntfs3 for linux 6.4 2023-04-29 10:52:37 -07:00
ocfs2 Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
omfs fs: port inode_init_owner() to mnt_idmap 2023-01-19 09:24:28 +01:00
openpromfs
orangefs - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
overlayfs ovl: modify layer parameter parsing 2023-06-20 14:10:40 +03:00
proc sysctl: remove register_sysctl_paths() 2023-05-02 19:24:16 -07:00
pstore pstore update for v6.4-rc1 2023-04-27 17:03:40 -07:00
qnx4 qnx4: credit contributors in CREDITS 2023-03-14 12:56:30 -06:00
qnx6 qnx6: credit contributor and mark filesystem orphan 2023-03-14 12:56:30 -06:00
quota quota: mark PRINT_QUOTA_WARNING as BROKEN 2023-04-14 13:06:50 +02:00
ramfs mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
reiserfs \n 2023-04-26 09:07:46 -07:00
romfs mm/nommu: factor out check for NOMMU shared mappings into is_nommu_shared_mapping() 2023-01-18 17:12:56 -08:00
smb ksmbd: validate smb request protocol id 2023-06-02 12:30:57 -05:00
squashfs revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" 2023-02-03 17:52:25 -08:00
sysfs
sysv sysv: switch to put_and_unmap_page() 2023-03-12 20:03:41 -04:00
tracefs fs: port ->mkdir() to pass mnt_idmap 2023-01-19 09:24:26 +01:00
ubifs ubifs: Fix memleak when insert_old_idx() failed 2023-04-23 23:36:38 +02:00
udf udf: use wrapper i_blocksize() in udf_discard_prealloc() 2023-03-13 11:16:16 +01:00
ufs ufs: don't flush page immediately for DIRSYNC directories 2023-03-28 16:20:14 -07:00
unicode unicode: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
vboxsf fs: port ->rename() to pass mnt_idmap 2023-01-19 09:24:26 +01:00
verity fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds 2023-04-11 19:23:23 -07:00
xfs xfs: collect errors from inodegc for unlinked inode recovery 2023-06-05 14:48:15 +10:00
zonefs zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space 2023-03-30 20:56:02 +09:00
aio.c Merge branch 'mm-hotfixes-stable' into mm-stable 2023-02-10 15:34:48 -08:00
anon_inodes.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
attr.c nfs: use vfs setgid helper 2023-03-30 08:51:48 +02:00
bad_inode.c fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
binfmt_elf_fdpic.c ELF: fix all "Elf" typos 2023-04-08 13:45:37 -07:00
binfmt_elf_test.c
binfmt_elf.c Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
binfmt_flat.c
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-02 13:57:04 -08:00
binfmt_script.c
buffer.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
char_dev.c chardev: fix error handling in cdev_device_add() 2022-12-02 17:48:59 +01:00
compat_binfmt_elf.c
coredump.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
d_path.c d_path.c: typo fix... 2022-08-20 11:34:33 -04:00
dax.c fsdax: force clear dirty mark if CoW 2023-04-05 18:06:23 -07:00
dcache.c tmpfile API change 2022-10-10 19:45:17 -07:00
direct-io.c __blockdev_direct_IO(): get rid of submit_io callback 2023-03-05 20:27:41 -05:00
drop_caches.c
eventfd.c eventfd: use wait_event_interruptible_locked_irq() helper 2023-04-06 10:01:50 +02:00
eventpoll.c Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
exec.c tracing updates for 6.4: 2023-04-28 15:57:53 -07:00
fcntl.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
fhandle.c do_sys_name_to_handle(): constify path 2022-09-01 17:36:39 -04:00
file_table.c filelock: move file locking definitions to separate header file 2023-01-11 06:52:32 -05:00
file.c fs: prevent out-of-bounds array speculation when closing a file descriptor 2023-03-09 22:46:21 -05:00
filesystems.c
fs_context.c
fs_parser.c ext4: journal_path mount options should follow links 2022-12-01 10:46:54 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c for-6.4/block-2023-05-06 2023-05-06 08:28:58 -07:00
fsopen.c
init.c fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
inode.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
internal.h five ksmbd server fixes, and new lock_rename_child VFS helper routine 2023-04-29 11:10:39 -07:00
ioctl.c fs: port inode_owner_or_capable() to mnt_idmap 2023-01-19 09:24:29 +01:00
Kconfig smb: move client and server files to common directory fs/smb 2023-05-24 16:29:21 -05:00
Kconfig.binfmt Xtensa updates for v6.1 2022-10-10 14:21:11 -07:00
kernel_read_file.c
libfs.c fs: consolidate duplicate dt_type helpers 2023-04-03 09:23:54 +02:00
locks.c filelocks: use mount idmapping for setlease permission check 2023-03-09 22:36:12 +01:00
Makefile smb: move client and server files to common directory fs/smb 2023-05-24 16:29:21 -05:00
mbcache.c ext4: fix deadlock due to mbcache entry corruption 2022-12-08 21:49:25 -05:00
mnt_idmapping.c fs: move mnt_idmap 2023-01-19 09:24:30 +01:00
mount.h
mpage.c mpage: use folios in bio end_io handler 2023-04-18 16:30:02 -07:00
namei.c five ksmbd server fixes, and new lock_rename_child VFS helper routine 2023-04-29 11:10:39 -07:00
namespace.c fget() to fdget() conversions 2023-04-24 19:14:20 -07:00
no-block.c
nsfs.c kill the last remaining user of proc_ns_fget() 2023-04-20 22:55:35 -04:00
open.c open: return EINVAL for O_DIRECTORY | O_CREAT 2023-03-22 11:06:55 +01:00
pipe.c pipe: check for IOCB_NOWAIT alongside O_NONBLOCK 2023-05-12 17:17:27 +02:00
pnode.c pnode: pass mountpoint directly 2023-04-06 14:53:38 +02:00
pnode.h
posix_acl.c acl: don't depend on IOP_XATTR 2023-03-06 09:59:20 +01:00
proc_namespace.c
read_write.c iov_iter: add iter_iov_addr() and iter_iov_len() helpers 2023-03-30 08:12:29 -06:00
readdir.c Change calling conventions for filldir_t 2022-08-17 17:25:04 -04:00
remap_range.c fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap 2023-01-19 09:24:29 +01:00
select.c
seq_file.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
signalfd.c
splice.c pipe-nonblock-2023-05-06 2023-05-06 08:15:20 -07:00
stack.c
stat.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
statfs.c statfs: enforce statfs[64] structure initialization 2023-05-17 15:20:17 +02:00
super.c mm: shrinkers: convert shrinker_rwsem to mutex 2023-03-28 16:20:17 -07:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
utimes.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
xattr.c fs: don't call posix_acl_listxattr in generic_listxattr 2023-05-17 15:25:20 +02:00