linux/fs
Linus Torvalds 02bf43c7b7 fs.xattr.simple.rework.rbtree.rwlock.v6.2

Merge tag 'fs.xattr.simple.rework.rbtree.rwlock.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull simple-xattr updates from Christian Brauner:
 "This ports the simple xattr infrastructure to rely on a simple rbtree
  protected by a read-write lock instead of a linked list protected by a
  spinlock.

  A while ago we received reports about scaling issues for filesystems
  using the simple xattr infrastructure that also support setting a
  larger number of xattrs. Specifically, cgroups and tmpfs.

  Both cgroupfs and tmpfs can be mounted by unprivileged users in
  unprivileged containers and root in an unprivileged container can set
  an unrestricted number of security.* xattrs and privileged users can
  also set unlimited trusted.* xattrs. A few more words on that further
  below. Other xattrs such as user.* are restricted for kernfs-based
  instances to a fairly limited number.

  As there are apparently users that have a fairly large number of
  xattrs we should scale a bit better. Using a simple linked list
  protected by a spinlock used for set, get, and list operations doesn't
  scale well if users use a lot of xattrs even if it's not a crazy
  number.

  Let's switch to a simple rbtree protected by a rwlock. It scales way
  better and gets rid of the perf issues some people reported. We
  originally had fancier solutions even using an rcu+seqlock protected
  rbtree but we had concerns about being too clever and also that
  deletion from an rbtree with rcu+seqlock isn't entirely safe.

  The rbtree plus rwlock is perfectly fine. By far the most common
  operation is getting an xattr, while setting an xattr is, and should
  be, comparatively rare. And listxattr() often only happens when
  copying xattrs between files or together with the contents to a new
  file.

  Holding a lock across listxattr() is unproblematic because it doesn't
  list the values of xattrs. It can only be used to list the names of
  all xattrs set on a file. And the number of xattr names that can be
  listed with listxattr() is limited to XATTR_LIST_MAX aka 65536 bytes.
  If a larger buffer is passed then vfs_listxattr() caps it to
  XATTR_LIST_MAX and if more xattr names are found it will return
  -E2BIG. In short, the maximum amount of memory that can be retrieved
  via listxattr() is limited and thus listxattr() is bounded.

  Of course, the API is broken as documented in xattr(7) already. While
  I have no idea how the xattr API ended up in this state we should
  probably try to come up with something here at some point. An iterator
  pattern similar to readdir() as an alternative to listxattr() or
  something else.

  Right now it is extremely strange that users can set millions of xattrs
  but then can't use listxattr() to know which xattrs are actually set.
  And it's really trivial to do:

	for i in {1..1000000}; do setfattr -n security.$i -v $i ./file1; done

  And at around 5000 xattrs it becomes impossible to use listxattr() to figure
  out which xattrs are actually set. So I have suggested that we try to
  limit the number of xattrs for simple xattrs at least. But that's a
  future patch and I don't consider it very urgent.

  A bonus of this port to rbtree+rwlock is that we shrink the memory
  consumption for users of the simple xattr infrastructure.

  This also adds kernel documentation to all the functions"
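
The "around 5000 xattrs" figure in the message above can be checked with a little arithmetic. The following is a minimal sketch, not kernel code: it assumes the names are exactly security.1, security.2, ... as created by the setfattr loop, that listxattr() returns each name NUL-terminated, and that the file carries no other xattrs. The helper names list_size and max_listable are made up for this illustration.

```python
XATTR_LIST_MAX = 65536  # limit quoted in the message above


def list_size(count, prefix="security."):
    """Bytes needed to list the names prefix+"1" .. prefix+str(count)."""
    # Each name is stored NUL-terminated, hence the +1 per entry.
    return sum(len(prefix) + len(str(i)) + 1 for i in range(1, count + 1))


def max_listable(prefix="security.", limit=XATTR_LIST_MAX):
    """Largest count whose complete name list still fits in `limit` bytes."""
    total = 0
    count = 0
    while True:
        entry = len(prefix) + len(str(count + 1)) + 1
        if total + entry > limit:
            return count
        total += entry
        count += 1


if __name__ == "__main__":
    n = max_listable()
    # Under the assumptions above this prints: 4760 65533 65547
    print(n, list_size(n), list_size(n + 1))
```

Under these assumptions the name list fits for the first 4760 xattrs (65533 bytes) and exceeds XATTR_LIST_MAX once the 4761st is set, which is consistent with the "around 5000" estimate. Longer names or pre-existing xattrs would lower that count.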

* tag 'fs.xattr.simple.rework.rbtree.rwlock.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  xattr: use rbtree for simple_xattrs
2022-12-13 10:08:36 -08:00
..
9p fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
adfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
affs affs: move from strlcpy with unused retval to strscpy 2022-08-19 13:03:10 +02:00
afs iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
autofs autofs: remove unused ino field inode 2022-07-17 17:31:42 -07:00
befs befs: Convert befs_symlink_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
bfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
btrfs for-6.2-tag 2022-12-12 20:47:51 -08:00
cachefiles fscache,cachefiles: add prepare_ondemand_read() callback 2022-12-07 10:56:29 +08:00
ceph fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
cifs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
coda coda: Convert coda_symlink_filler() to use a folio 2022-08-02 12:34:03 -04:00
configfs configfs: fix possible memory leak in configfs_create_dir() 2022-12-02 11:11:22 +01:00
cramfs cramfs: read_mapping_page() is synchronous 2022-08-02 12:34:02 -04:00
crypto fscrypt: Add SM4 XTS/CTS symmetric algorithm support 2022-12-01 11:23:58 -08:00
debugfs debugfs: fix error when writing negative value to atomic_t debugfs file 2022-11-30 16:13:16 -08:00
devpts
dlm fs: dlm: fix building without lockdep 2022-11-22 10:14:26 -06:00
ecryptfs ecryptfs: use stub posix acl handlers 2022-10-20 10:13:31 +02:00
efivarfs efi: efivars: Fix variable writes without query_variable_store() 2022-10-21 11:09:40 +02:00
efs efs: Convert efs symlinks to read_folio 2022-05-09 16:21:45 -04:00
erofs Changes since the last update: 2022-12-12 20:14:04 -08:00
exfat treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
exportfs exportfs: use pr_debug for unreachable debug statements 2022-11-28 12:54:45 -05:00
ext2 \n 2022-12-12 20:32:50 -08:00
ext4 fsverity updates for 6.2 2022-12-12 20:06:35 -08:00
f2fs fsverity updates for 6.2 2022-12-12 20:06:35 -08:00
fat fat (exportfs): fix some kernel-doc warnings 2022-11-30 16:13:17 -08:00
freevxfs freevxfs: Convert vxfs_immed_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
fscache iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
fuse fuse update for 6.2 2022-12-12 20:22:09 -08:00
gfs2 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
hfs hfs: Fix OOB Write in hfs_asc2mac 2022-12-11 19:30:19 -08:00
hfsplus hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount 2022-12-11 19:30:20 -08:00
hostfs hostfs: move from strlcpy with unused retval to strscpy 2022-09-19 22:46:25 +02:00
hpfs hpfs: Convert symlinks to read_folio 2022-05-09 16:21:45 -04:00
hugetlbfs hugetlbfs: don't delete error page from pagecache 2022-11-08 15:57:22 -08:00
iomap iomap: add a tracepoint for mappings returned by map_blocks 2022-10-02 11:42:19 -07:00
isofs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
jbd2 jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeout 2022-12-08 21:49:25 -05:00
jffs2 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
jfs Assorted JFS fixes for 6.2 2022-12-12 20:38:28 -08:00
kernfs kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id() 2022-11-10 19:03:42 +01:00
ksmbd fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
lockd NFSD 6.2 Release Notes 2022-12-12 20:54:39 -08:00
minix vfs: open inside ->tmpfile() 2022-09-24 07:00:00 +02:00
netfs use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
nfs NFS client updates for Linux 6.2 2022-12-13 08:44:41 -08:00
nfs_common
nfsd NFSD 6.2 Release Notes 2022-12-12 20:54:39 -08:00
nilfs2 Non-MM patches for 6.2-rc1. 2022-12-12 17:28:58 -08:00
nls
notify Merge tag 'fsnotify-for_v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2022-10-07 08:28:50 -07:00
ntfs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
ntfs3 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
ocfs2 fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
omfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
openpromfs
orangefs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
overlayfs overlayfs update for 6.2 2022-12-12 20:18:26 -08:00
proc iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
pstore pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP 2022-12-05 16:15:09 -08:00
qnx4 fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
qnx6 fs/qnx6: delete unnecessary checks before brelse() 2022-09-11 21:55:07 -07:00
quota ext4: fix bug_on in __es_tree_search caused by bad quota inode 2022-12-08 21:49:23 -05:00
ramfs tmpfile API change 2022-10-10 19:45:17 -07:00
reiserfs lsm/stable-6.2 PR 20221212 2022-12-13 09:47:48 -08:00
romfs romfs: Convert romfs to read_folio 2022-05-09 16:21:46 -04:00
smbfs_common smb3: define missing create contexts 2022-10-05 01:55:27 -05:00
squashfs fs.idmapped.squashfs.v6.2 2022-12-12 20:24:51 -08:00
sysfs
sysv fs: sysv: Fix sysv_nblocks() returns wrong value 2022-12-10 14:13:37 -05:00
tracefs tracefs: Only clobber mode/uid/gid on remount if asked 2022-09-08 17:10:54 -04:00
ubifs treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
udf \n 2022-12-12 20:32:50 -08:00
ufs ufs: replace ll_rw_block() 2022-09-11 20:26:07 -07:00
unicode
vboxsf vboxsf: Convert vboxsf to read_folio 2022-05-09 16:21:46 -04:00
verity fsverity: simplify fsverity_get_digest() 2022-11-29 21:07:41 -08:00
xfs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
zonefs zonefs: Fix active zone accounting 2022-11-25 17:01:22 +09:00
aio.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
anon_inodes.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
attr.c attr: use consistent sgid stripping checks 2022-10-18 10:09:47 +02:00
bad_inode.c fs: rename current get acl method 2022-10-20 10:13:27 +02:00
binfmt_elf_fdpic.c binfmt: Fix error return code in load_elf_fdpic_binary() 2022-12-01 19:15:52 -08:00
binfmt_elf_test.c
binfmt_elf.c Unification of regset and non-regset sides of ELF coredump 2022-12-12 18:18:34 -08:00
binfmt_flat.c
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-02 13:57:04 -08:00
binfmt_script.c
buffer.c - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c fs.vfsuid.conversion.v6.2 2022-12-12 19:20:05 -08:00
d_path.c d_path.c: typo fix... 2022-08-20 11:34:33 -04:00
dax.c Merge branch 'for-6.0/dax' into libnvdimm-fixes 2022-09-24 18:14:12 -07:00
dcache.c tmpfile API change 2022-10-10 19:45:17 -07:00
direct-io.c block: remove PSI accounting from the bio layer 2022-09-20 08:24:38 -06:00
drop_caches.c
eventfd.c eventfd: guard wake_up in eventfd fs calls as well 2022-09-21 10:30:42 -06:00
eventpoll.c epoll: use try_cmpxchg in list_add_tail_lockless 2022-09-11 21:55:07 -07:00
exec.c fs.vfsuid.conversion.v6.2 2022-12-12 19:20:05 -08:00
fcntl.c keep iocb_flags() result cached in struct file 2022-06-10 16:10:23 -04:00
fhandle.c do_sys_name_to_handle(): constify path 2022-09-01 17:36:39 -04:00
file_table.c locks: fix TOCTOU race when granting write lease 2022-08-16 10:59:54 -04:00
file.c fs: use acquire ordering in __fget_light() 2022-10-31 15:30:11 -04:00
filesystems.c
fs_context.c
fs_parser.c ext4: journal_path mount options should follow links 2022-12-01 10:46:54 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs: do not update freeing inode i_io_list 2022-11-22 17:00:00 -05:00
fsopen.c uninline may_mount() and don't opencode it in fspick(2)/fsopen(2) 2022-05-19 23:25:10 -04:00
init.c
inode.c fs.vfsuid.conversion.v6.2 2022-12-12 19:20:05 -08:00
internal.h fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
ioctl.c
Kconfig hugetlb: make hugetlb depends on SYSFS or SYSCTL 2022-09-11 20:26:10 -07:00
Kconfig.binfmt Xtensa updates for v6.1 2022-10-10 14:21:11 -07:00
kernel_read_file.c fs/kernel_read_file: allow to read files up-to ssize_t 2022-06-16 19:58:21 -07:00
libfs.c libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value 2022-11-30 16:13:16 -08:00
locks.c Add process name and pid to locks warning 2022-11-30 05:08:10 -05:00
Makefile a.out: Remove the a.out implementation 2022-09-27 07:11:02 -07:00
mbcache.c ext4: fix deadlock due to mbcache entry corruption 2022-12-08 21:49:25 -05:00
mount.h switch try_to_unlazy_next() to __legitimize_mnt() 2022-07-05 16:18:21 -04:00
mpage.c Folio changes for 6.0 2022-08-03 10:35:43 -07:00
namei.c Landlock updates for v6.2-rc1 2022-12-13 09:14:50 -08:00
namespace.c fs.idmapped.mnt_idmap.v6.2 2022-12-12 19:30:18 -08:00
no-block.c
nsfs.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
open.c Landlock updates for v6.2-rc1 2022-12-13 09:14:50 -08:00
pipe.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
pnode.c
pnode.h
posix_acl.c fs.idmapped.mnt_idmap.v6.2 2022-12-12 19:30:18 -08:00
proc_namespace.c vfs: escape hash as well 2022-06-28 13:58:05 -04:00
read_write.c iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
readdir.c Change calling conventions for filldir_t 2022-08-17 17:25:04 -04:00
remap_range.c fs: use type safe idmapping helpers 2022-10-26 10:02:34 +02:00
select.c
seq_file.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
signalfd.c
splice.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
stack.c
stat.c fs: use type safe idmapping helpers 2022-10-26 10:02:34 +02:00
statfs.c
super.c misc pile 2022-12-12 18:38:47 -08:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c fs/userfaultfd: Fix maple tree iterator in userfaultfd_unregister() 2022-11-07 12:58:26 -08:00
utimes.c
xattr.c fs.xattr.simple.rework.rbtree.rwlock.v6.2 2022-12-13 10:08:36 -08:00