linux/fs
Linus Torvalds cf619f8919 fs.ovl.setgid.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bt7AAKCRCRxhvAZXjc
 ovAOAP9qcrUqs2MoyBDe6qUXThYY9w2rgX/ZI4ZZmbtsXEDGtQEA/LPddq8lD8o9
 m17zpvMGbXXRwz4/zVGuyWsHgg0HsQ0=
 =ioRq
 -----END PGP SIGNATURE-----

Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull setgid inheritance updates from Christian Brauner:
 "This contains the work to make setgid inheritance consistent between
  modifying a file and when changing ownership or mode as this has been
  a repeated source of very subtle bugs. The gist is that we perform the
  same permission checks in the write path as we do in the ownership and
  mode changing paths after this series where we're currently doing
  different things.

  We've already made setgid inheritance a lot more consistent and
  reliable in the last releases by moving setgid stripping from the
  individual filesystems up into the vfs. This aims to make the logic
  even more consistent and easier to understand and also to fix
  long-standing overlayfs setgid inheritance bugs. Miklos was nice
  enough to just let me carry the trivial overlayfs patches from Amir
  too.

  Below is a more detailed explanation how the current difference in
  setgid handling lead to very subtle bugs exemplified via overlayfs
  which is a victim of the current rules. I hope this explains why I
  think taking the regression risk here is worth it.

  A long while ago I found a few setgid inheritance bugs in overlayfs in
  the write path in certain conditions. Amir recently picked this back
  up in [1] and I jumped on board to fix this more generally.

  On the surface all that overlayfs would need to fix setgid inheritance
  would be to call file_remove_privs() or file_modified() but actually
  that isn't enough because the setgid inheritance api is wildly
  inconsistent in that area.

  Before this pr setgid stripping in file_remove_privs()'s old
  should_remove_suid() helper was inconsistent with other parts of the
  vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
  S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
  and the caller isn't privileged over the inode although we require
  this already in setattr_prepare() and setattr_copy() and so all
  filesystem implement this requirement implicitly because they have to
  use setattr_{prepare,copy}() anyway.

  But the inconsistency shows up in setgid stripping bugs for overlayfs
  in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
  generic/687). For example, we test whether suid and setgid stripping
  works correctly when performing various write-like operations as an
  unprivileged user (fallocate, reflink, write, etc.):

      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k

  The test basically creates a file with 6666 permissions. While the
  file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
  set.

  On a regular filesystem like xfs what will happen is:

      sys_fallocate()
      -> vfs_fallocate()
         -> xfs_file_fallocate()
            -> file_modified()
               -> __file_remove_privs()
                  -> dentry_needs_remove_privs()
                     -> should_remove_suid()
                  -> __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -> notify_change()
                        -> setattr_copy()

  In should_remove_suid() we can see that ATTR_KILL_SUID is raised
  unconditionally because the file in the test has S_ISUID set.

  But we also see that ATTR_KILL_SGID won't be set because while the
  file is S_ISGID it is not S_IXGRP (see above) which is a condition for
  ATTR_KILL_SGID being raised.

  So by the time we call notify_change() we have attr->ia_valid set to
  ATTR_KILL_SUID | ATTR_FORCE.

  Now notify_change() sees that ATTR_KILL_SUID is set and does:

      ia_valid      = attr->ia_valid |= ATTR_MODE
      attr->ia_mode = (inode->i_mode & ~S_ISUID);

  which means that when we call setattr_copy() later we will definitely
  update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

  Now we call into the filesystem's ->setattr() inode operation which
  will end up calling setattr_copy(). Since ATTR_MODE is set we will
  hit:

      if (ia_valid & ATTR_MODE) {
              umode_t mode = attr->ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &&
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &= ~S_ISGID;
              inode->i_mode = mode;
      }

  and since the caller in the test is neither capable nor in the group
  of the inode the S_ISGID bit is stripped.

  But assume the file isn't suid then ATTR_KILL_SUID won't be raised
  which has the consequence that neither the setgid nor the suid bits
  are stripped even though it should be stripped because the inode isn't
  in the caller's groups and the caller isn't privileged over the inode.

  If overlayfs is in the mix things become a bit more complicated and
  the bug shows up more clearly.

  When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
  file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
  raised but because the check in notify_change() is questioning the
  ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
  the S_ISGID bit isn't removed even though it should be stripped:

      sys_fallocate()
      -> vfs_fallocate()
         -> ovl_fallocate()
            -> file_remove_privs()
               -> dentry_needs_remove_privs()
                  -> should_remove_suid()
               -> __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -> notify_change()
                     -> ovl_setattr()
                        /* TAKE ON MOUNTER'S CREDS */
                        -> ovl_do_notify_change()
                           -> notify_change()
                        /* GIVE UP MOUNTER'S CREDS */
           /* TAKE ON MOUNTER'S CREDS */
           -> vfs_fallocate()
              -> xfs_file_fallocate()
                 -> file_modified()
                    -> __file_remove_privs()
                       -> dentry_needs_remove_privs()
                          -> should_remove_suid()
                       -> __remove_privs()
                          newattrs.ia_valid = attr_force | kill;
                          -> notify_change()

  The fix for all of this is to make file_remove_privs()'s
  should_remove_suid() helper perform the same checks as we already
  require in setattr_prepare() and setattr_copy() and have
  notify_change() not pointlessly requiring S_IXGRP again. It doesn't
  make any sense in the first place because the caller must calculate
  the flags via should_remove_suid() anyway which would raise
  ATTR_KILL_SGID

  Note that some xfstests will now fail as these patches will cause the
  setgid bit to be lost in certain conditions for unprivileged users
  modifying a setgid file when they would've been kept otherwise. I
  think this risk is worth taking and I explained and mentioned this
  multiple times on the list [2].

  Enforcing the rules consistently across write operations and
  chmod/chown will lead to losing the setgid bit in cases were it
  might've been retained before.

  While I've mentioned this a few times but it's worth repeating just to
  make sure that this is understood. For the sake of maintainability,
  consistency, and security this is a risk worth taking.

  If we really see regressions for workloads the fix is to have special
  setgid handling in the write path again with different semantics from
  chmod/chown and possibly additional duct tape for overlayfs. I'll
  update the relevant xfstests with if you should decide to merge this
  second setgid cleanup.

  Before that people should be aware that there might be failures for
  fstests where unprivileged users modify a setgid file"

Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]

* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: use consistent setgid checks in is_sxid()
  ovl: remove privs in ovl_fallocate()
  ovl: remove privs in ovl_copyfile()
  attr: use consistent sgid stripping checks
  attr: add setattr_should_drop_sgid()
  fs: move should_remove_suid()
  attr: add in_group_or_capable()
2022-12-12 19:03:10 -08:00
..
9p fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
adfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
affs affs: move from strlcpy with unused retval to strscpy 2022-08-19 13:03:10 +02:00
afs iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
autofs autofs: remove unused ino field inode 2022-07-17 17:31:42 -07:00
befs befs: Convert befs_symlink_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
bfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
btrfs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
cachefiles cachefiles: use vfs_tmpfile_open() helper 2022-09-24 07:00:00 +02:00
ceph fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
cifs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
coda coda: Convert coda_symlink_filler() to use a folio 2022-08-02 12:34:03 -04:00
configfs
cramfs cramfs: read_mapping_page() is synchronous 2022-08-02 12:34:02 -04:00
crypto fscrypt: fix keyring memory leak on mount failure 2022-10-19 20:54:43 -07:00
debugfs debugfs: fix error when writing negative value to atomic_t debugfs file 2022-11-30 16:13:16 -08:00
devpts
dlm Networking changes for 6.1. 2022-10-04 13:38:03 -07:00
ecryptfs ecryptfs: use stub posix acl handlers 2022-10-20 10:13:31 +02:00
efivarfs efi: efivars: Fix variable writes without query_variable_store() 2022-10-21 11:09:40 +02:00
efs efs: Convert efs symlinks to read_folio 2022-05-09 16:21:45 -04:00
erofs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
exfat treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
exportfs Change calling conventions for filldir_t 2022-08-17 17:25:04 -04:00
ext2 fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
ext4 fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
f2fs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
fat fat (exportfs): fix some kernel-doc warnings 2022-11-30 16:13:17 -08:00
freevxfs freevxfs: Convert vxfs_immed_read_folio() to use a folio 2022-08-02 12:34:03 -04:00
fscache iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
fuse fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
gfs2 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
hfs hfs: Fix OOB Write in hfs_asc2mac 2022-12-11 19:30:19 -08:00
hfsplus hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount 2022-12-11 19:30:20 -08:00
hostfs hostfs: move from strlcpy with unused retval to strscpy 2022-09-19 22:46:25 +02:00
hpfs hpfs: Convert symlinks to read_folio 2022-05-09 16:21:45 -04:00
hugetlbfs hugetlbfs: don't delete error page from pagecache 2022-11-08 15:57:22 -08:00
iomap iomap: add a tracepoint for mappings returned by map_blocks 2022-10-02 11:42:19 -07:00
isofs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
jbd2 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
jffs2 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
jfs fs: rename current get acl method 2022-10-20 10:13:27 +02:00
kernfs kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id() 2022-11-10 19:03:42 +01:00
ksmbd fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
lockd lockd: use locks_inode_context helper 2022-11-30 05:08:10 -05:00
minix vfs: open inside ->tmpfile() 2022-09-24 07:00:00 +02:00
netfs use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
nfs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
nfs_common
nfsd fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
nilfs2 Non-MM patches for 6.2-rc1. 2022-12-12 17:28:58 -08:00
nls
notify Merge tag 'fsnotify-for_v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2022-10-07 08:28:50 -07:00
ntfs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
ntfs3 fs: rename current get acl method 2022-10-20 10:13:27 +02:00
ocfs2 fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
omfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
openpromfs
orangefs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
overlayfs fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
proc iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
pstore pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP 2022-12-05 16:15:09 -08:00
qnx4 fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
qnx6 fs/qnx6: delete unnecessary checks before brelse() 2022-09-11 21:55:07 -07:00
quota quota: Add more checking after reading from quota file 2022-09-29 15:37:30 +02:00
ramfs tmpfile API change 2022-10-10 19:45:17 -07:00
reiserfs fs: rename current get acl method 2022-10-20 10:13:27 +02:00
romfs romfs: Convert romfs to read_folio 2022-05-09 16:21:46 -04:00
smbfs_common smb3: define missing create contexts 2022-10-05 01:55:27 -05:00
squashfs squashfs: fix null-ptr-deref in squashfs_fill_super 2022-11-18 13:55:09 -08:00
sysfs
sysv fs: sysv: Fix sysv_nblocks() returns wrong value 2022-12-10 14:13:37 -05:00
tracefs tracefs: Only clobber mode/uid/gid on remount if asked 2022-09-08 17:10:54 -04:00
ubifs treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
udf udf: Fix a slab-out-of-bounds write bug in udf_find_entry() 2022-11-09 12:24:42 +01:00
ufs ufs: replace ll_rw_block() 2022-09-11 20:26:07 -07:00
unicode
vboxsf vboxsf: Convert vboxsf to read_folio 2022-05-09 16:21:46 -04:00
verity for-6.1-tag 2022-10-06 17:36:48 -07:00
xfs fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
zonefs zonefs: Fix active zone accounting 2022-11-25 17:01:22 +09:00
aio.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
anon_inodes.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
attr.c attr: use consistent sgid stripping checks 2022-10-18 10:09:47 +02:00
bad_inode.c fs: rename current get acl method 2022-10-20 10:13:27 +02:00
binfmt_elf_fdpic.c binfmt: Fix error return code in load_elf_fdpic_binary() 2022-12-01 19:15:52 -08:00
binfmt_elf_test.c
binfmt_elf.c Unification of regset and non-regset sides of ELF coredump 2022-12-12 18:18:34 -08:00
binfmt_flat.c binfmt_flat: Remove shared library support 2022-04-22 10:57:18 -07:00
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-02 13:57:04 -08:00
binfmt_script.c
buffer.c - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
d_path.c d_path.c: typo fix... 2022-08-20 11:34:33 -04:00
dax.c Merge branch 'for-6.0/dax' into libnvdimm-fixes 2022-09-24 18:14:12 -07:00
dcache.c tmpfile API change 2022-10-10 19:45:17 -07:00
direct-io.c block: remove PSI accounting from the bio layer 2022-09-20 08:24:38 -06:00
drop_caches.c
eventfd.c eventfd: guard wake_up in eventfd fs calls as well 2022-09-21 10:30:42 -06:00
eventpoll.c epoll: use try_cmpxchg in list_add_tail_lockless 2022-09-11 21:55:07 -07:00
exec.c execve updates for v6.2-rc1 2022-12-12 08:42:29 -08:00
fcntl.c keep iocb_flags() result cached in struct file 2022-06-10 16:10:23 -04:00
fhandle.c do_sys_name_to_handle(): constify path 2022-09-01 17:36:39 -04:00
file_table.c locks: fix TOCTOU race when granting write lease 2022-08-16 10:59:54 -04:00
file.c fs: use acquire ordering in __fget_light() 2022-10-31 15:30:11 -04:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs: do not update freeing inode i_io_list 2022-11-22 17:00:00 -05:00
fsopen.c uninline may_mount() and don't opencode it in fspick(2)/fsopen(2) 2022-05-19 23:25:10 -04:00
init.c
inode.c fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
internal.h fs.ovl.setgid.v6.2 2022-12-12 19:03:10 -08:00
ioctl.c
Kconfig hugetlb: make hugetlb depends on SYSFS or SYSCTL 2022-09-11 20:26:10 -07:00
Kconfig.binfmt Xtensa updates for v6.1 2022-10-10 14:21:11 -07:00
kernel_read_file.c fs/kernel_read_file: allow to read files up-to ssize_t 2022-06-16 19:58:21 -07:00
libfs.c libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value 2022-11-30 16:13:16 -08:00
locks.c Add process name and pid to locks warning 2022-11-30 05:08:10 -05:00
Makefile a.out: Remove the a.out implementation 2022-09-27 07:11:02 -07:00
mbcache.c mbcache: Avoid nesting of cache->c_list_lock under bit locks 2022-09-30 23:46:52 -04:00
mount.h switch try_to_unlazy_next() to __legitimize_mnt() 2022-07-05 16:18:21 -04:00
mpage.c Folio changes for 6.0 2022-08-03 10:35:43 -07:00
namei.c fs.acl.rework.v6.2 2022-12-12 18:46:39 -08:00
namespace.c copy_mnt_ns(): handle a corner case (overmounted mntns bindings) saner 2022-11-24 22:55:57 -05:00
no-block.c
nsfs.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
open.c attr: use consistent sgid stripping checks 2022-10-18 10:09:47 +02:00
pipe.c dynamic_dname(): drop unused dentry argument 2022-08-20 11:34:04 -04:00
pnode.c
pnode.h
posix_acl.c posix_acl: Fix the type of sentinel in get_acl 2022-12-02 10:01:28 +01:00
proc_namespace.c vfs: escape hash as well 2022-06-28 13:58:05 -04:00
read_write.c iov_iter work; most of that is about getting rid of 2022-12-12 18:29:54 -08:00
readdir.c Change calling conventions for filldir_t 2022-08-17 17:25:04 -04:00
remap_range.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
select.c
seq_file.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
signalfd.c
splice.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
stack.c
stat.c vfs: support STATX_DIOALIGN on block devices 2022-09-11 19:47:12 -05:00
statfs.c
super.c misc pile 2022-12-12 18:38:47 -08:00
sync.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
sysctls.c
timerfd.c
userfaultfd.c fs/userfaultfd: Fix maple tree iterator in userfaultfd_unregister() 2022-11-07 12:58:26 -08:00
utimes.c
xattr.c acl: remove a slew of now unused helpers 2022-10-20 10:13:32 +02:00