IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
=yPaw
-----END PGP SIGNATURE-----
Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdfhttps://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
1d7b902e28
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.
The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:
Here's an example to a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.
In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.
In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.
If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.
Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.
Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:
type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };
(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When a superblock is assigned the SECURITY_FS_USE_XATTR behavior by the
policy yet it lacks xattr support, try to fall back to genfs rather than
rejecting the mount. If a genfscon rule is found for the filesystem,
then change the behavior to SECURITY_FS_USE_GENFS, otherwise reject the
mount as before. A similar fallback is already done in security_fs_use()
if no behavior specification is found for the given filesystem.
This is needed e.g. for virtiofs, which may or may not support xattrs
depending on the backing host filesystem.
Example:
# seinfo --genfs | grep ' ramfs'
genfscon ramfs / system_u:object_r:ramfs_t:s0
# echo '(fsuse xattr ramfs (system_u object_r fs_t ((s0) (s0))))' >ramfs_xattr.cil
# semodule -i ramfs_xattr.cil
# mount -t ramfs none /mnt
Before:
mount: /mnt: mount(2) system call failed: Operation not supported.
After:
(mount succeeds)
# ls -Zd /mnt
system_u:object_r:ramfs_t:s0 /mnt
See also:
https://lore.kernel.org/selinux/20210105142148.GA3200@redhat.com/T/https://github.com/fedora-selinux/selinux-policy/pull/478
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.
Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Fixes: c8e222616c ("selinux: allow reading labels before policy is loaded")
Cc: stable@vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The MPTCP protocol uses a specific protocol value, even if
it's an extension to TCP. Additionally, MPTCP sockets
could 'fall-back' to TCP at run-time, depending on peer MPTCP
support and available resources.
As a consequence of the specific protocol number, selinux
applies the raw_socket class to MPTCP sockets.
Existing TCP application converted to MPTCP - or forced to
use MPTCP socket with user-space hacks - will need an
updated policy to run successfully.
This change lets selinux attach the TCP socket class to
MPTCP sockets, too, so that no policy changes are needed in
the above scenario.
Note that the MPTCP is setting, propagating and updating the
security context on all the subflows and related request
socket.
Link: https://lore.kernel.org/linux-security-module/CAHC9VhTaK3xx0hEGByD2zxfF7fadyPP1kb-WeWH_YCyq9X-sRg@mail.gmail.com/T/#t
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[PM: tweaked subject's prefix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl/YBtEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNnwA/9Ek8DG/1t8CEoJxpoRvwovQxNo+bi
0rCT9vqvx9PeCwoZi/0Vp6oKmpE1HADvbeB/+e00VrbLYnzE3oRY6VkpjoZRofKS
vc0/MzHSFxFUR1OTHwCefcXlPLK+bfitQbX5jEMeVyQCXNXXIrN7CnJf1LmCeLTR
kQBPlEN9lt7HyNVAi34FhOD/TQbWnFHgl2z5puffgri6cWnc+TALKMYytUZ+rYex
NYndDJW5b3g5kTat2eErn0FruxfzloGs0xMIiWb+z2i9kl41D+dkKPdAN7idqCSC
Jv0nJP/bDftzA0wOe9szmGaLQzu7YnCN5kiWcSspatZVnon42Cy/tp9tiuPGLRFU
XtelDfpyX6o3CLN0tX7LQEO+GYxPzvM6iaR2OrsChWPozUIIR3TLQg7jJN4bvNKl
TR6gCGZCoAeS5JLNGjzVKxT/oKQY+tCLLlYXQdQY6swNFi3EKmPr+K1D9lgm98fO
f3d1QmWiZZNmtxxoVogT0qoQYjkfgpnm3dVx813Vt+lwHlVpHGMEPpO27iD3/RYb
w2yWOJaGKwMD8iL0l+Cm6CPW0/nE5FFISQjWgC8b4Vgxlyan6+L9eViqGICkrUQ2
Edo0i1YFFZ4utHYkDf1VYBbJ+36KyCtdktgLAcbgnePiPB3E1XBsXTIIStSUIbVQ
iEbTkBlsCG4GIeU=
=6Cqb
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"While we have a small number of SELinux patches for v5.11, there are a
few changes worth highlighting:
- Change the LSM network hooks to pass flowi_common structs instead
of the parent flowi struct as the LSMs do not currently need the
full flowi struct and they do not have enough information to use it
safely (missing information on the address family).
This patch was discussed both with Herbert Xu (representing team
netdev) and James Morris (representing team
LSMs-other-than-SELinux).
- Fix how we handle errors in inode_doinit_with_dentry() so that we
attempt to properly label the inode on following lookups instead of
continuing to treat it as unlabeled.
- Tweak the kernel logic around allowx, auditallowx, and dontauditx
SELinux policy statements such that the auditx/dontauditx are
effective even without the allowx statement.
Everything passes our test suite"
* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
selinux: Fix fall-through warnings for Clang
selinux: drop super_block backpointer from superblock_security_struct
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
selinux: allow dontauditx and auditallowx rules to take effect without allowx
selinux: fix error initialization in inode_doinit_with_dentry()
A followup change to tcp_request_sock_op would have to drop the 'const'
qualifier from the 'route_req' function as the
'security_inet_conn_request' call is moved there - and that function
expects a 'struct sock *'.
However, it turns out its also possible to add a const qualifier to
security_inet_conn_request instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely. As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
It appears to have been needed for selinux_complete_init() in the past,
but today it's useless.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A previous fix, commit 83370b31a9 ("selinux: fix error initialization
in inode_doinit_with_dentry()"), changed how failures were handled
before a SELinux policy was loaded. Unfortunately that patch was
potentially problematic for two reasons: it set the isec->initialized
state without holding a lock, and it didn't set the inode's SELinux
label to the "default" for the particular filesystem. The later can
be a problem if/when a later attempt to revalidate the inode fails
and SELinux reverts to the existing inode label.
This patch should restore the default inode labeling that existed
before the original fix, without affecting the LABEL_INVALID marking
such that revalidation will still be attempted in the future.
Fixes: 83370b31a9 ("selinux: fix error initialization in inode_doinit_with_dentry()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Mark the inode security label as invalid if we cannot find
a dentry so that we will retry later rather than marking it
initialized with the unlabeled SID.
Fixes: 9287aed2ad ("selinux: Convert isec->lock into a spinlock")
Signed-off-by: Tianyue Ren <rentianyue@kylinos.cn>
[PM: minor comment tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Here is the big set of char, misc, and other assorted driver subsystem
patches for 5.10-rc1.
There's a lot of different things in here, all over the drivers/
directory. Some summaries:
- soundwire driver updates
- habanalabs driver updates
- extcon driver updates
- nitro_enclaves new driver
- fsl-mc driver and core updates
- mhi core and bus updates
- nvmem driver updates
- eeprom driver updates
- binder driver updates and fixes
- vbox minor bugfixes
- fsi driver updates
- w1 driver updates
- coresight driver updates
- interconnect driver updates
- misc driver updates
- other minor driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4g8YQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yngKgCeNpArCP/9vQJRK9upnDm8ZLunSCUAn1wUT/2A
/bTQ42c/WRQ+LU828GSM
=6sO2
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char, misc, and other assorted driver subsystem
patches for 5.10-rc1.
There's a lot of different things in here, all over the drivers/
directory. Some summaries:
- soundwire driver updates
- habanalabs driver updates
- extcon driver updates
- nitro_enclaves new driver
- fsl-mc driver and core updates
- mhi core and bus updates
- nvmem driver updates
- eeprom driver updates
- binder driver updates and fixes
- vbox minor bugfixes
- fsi driver updates
- w1 driver updates
- coresight driver updates
- interconnect driver updates
- misc driver updates
- other minor driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
binder: fix UAF when releasing todo list
docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
misc: Kconfig: fix a HISI_HIKEY_USB dependency
LSM: Fix type of id parameter in kernel_post_load_data prototype
misc: Kconfig: add a new dependency for HISI_HIKEY_USB
firmware_loader: fix a kernel-doc markup
w1: w1_therm: make w1_poll_completion static
binder: simplify the return expression of binder_mmap
test_firmware: Test partial read support
firmware: Add request_partial_firmware_into_buf()
firmware: Store opt_flags in fw_priv
fs/kernel_file_read: Add "offset" arg for partial reads
IMA: Add support for file reads without contents
LSM: Add "contents" flag to kernel_read_file hook
module: Call security_kernel_post_load_data()
firmware_loader: Use security_post_load_data()
LSM: Introduce kernel_post_load_data() hook
fs/kernel_read_file: Add file_size output argument
fs/kernel_read_file: Switch buffer size arg to size_t
fs/kernel_read_file: Remove redundant size argument
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl+E9UoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMG2BAApHLKLsfH5gf7gZNjHmQxddg8maCl
BGt7K1xc9iYBZN56Cbc7v9uKc5pM+UOoOlVmWh+8jaROpX10jJmvhsebQzpcWEEs
O/BDg/Y/AafoLr5e7gbAnlA7TJXNSR9MG9RB7c9xC14LG/bqBmkaUNsv8isWlLgl
J2atHLsdlvCbmqJvnc6Fh3VJCbY/I0kt9L04GBQ4pEK3TKOxtORQaQcjVgLhlcw9
YdMPKYIwy2Ze2HUuyW2o9OuryHhoMrwxpN/35/PAxrRwpO0LVnjjiw6njQqYVGH3
el8mPXlhHah/7QUKcngSsvcvUcaSencp9sUBrp1vK9C1vkSFyubZweVi4A2TEWnh
Ctceje7XP/YWDcJ+5BgASvosQdqOBB7huuOOKVpvaBXqgUHFgaxphV4/FDNnlF62
AteX5RcWb/JiFJ4YnbknPNa/MWxVYuVn78AlNsM2ZponWYWs9JZ17lX4tHAKF1Qm
x6ZMvMCDJTj8622l8nw3dTZKNDE3nFblDThX8aSrAhCQQE6HvugbKU4Fzo1oiSPl
84PlCPgb+3tP3OsvZDIOPCJxC6IHgS+meA0IjhjwuCb+U+YWaAIeOlOPSkxUmfLu
iJVWHmDtsAM3bTBxwQudhgXF3a1oKCEqeqNxM6P6p55jti7xal9FnZNHTbSh2sO1
Km4oIqTEb1XWNdU=
=NNLw
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A decent number of SELinux patches for v5.10, twenty two in total. The
highlights are listed below, but all of the patches pass our test
suite and merge cleanly.
- A number of changes to how the SELinux policy is loaded and managed
inside the kernel with the goal of improving the atomicity of a
SELinux policy load operation.
These changes account for the bulk of the diffstat as well as the
patch count. A special thanks to everyone who contributed patches
and fixes for this work.
- Convert the SELinux policy read-write lock to RCU.
- A tracepoint was added for audited SELinux access control events;
this should help provide a more unified backtrace across kernel and
userspace.
- Allow the removal of security.selinux xattrs when a SELinux policy
is not loaded.
- Enable policy capabilities in SELinux policies created with the
scripts/selinux/mdp tool.
- Provide some "no sooner than" dates for the SELinux checkreqprot
sysfs deprecation"
* tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (22 commits)
selinux: provide a "no sooner than" date for the checkreqprot removal
selinux: Add helper functions to get and set checkreqprot
selinux: access policycaps with READ_ONCE/WRITE_ONCE
selinux: simplify away security_policydb_len()
selinux: move policy mutex to selinux_state, use in lockdep checks
selinux: fix error handling bugs in security_load_policy()
selinux: convert policy read-write lock to RCU
selinux: delete repeated words in comments
selinux: add basic filtering for audit trace events
selinux: add tracepoint on audited events
selinux: Create new booleans and class dirs out of tree
selinux: Standardize string literal usage for selinuxfs directory names
selinux: Refactor selinuxfs directory populating functions
selinux: Create function for selinuxfs directory cleanup
selinux: permit removing security.selinux xattr before policy load
selinux: fix memdup.cocci warnings
selinux: avoid dereferencing the policy prior to initialization
selinux: fix allocation failure check on newpolicy->sidtab
selinux: refactor changing booleans
selinux: move policy commit after updating selinuxfs
...
As with the kernel_load_data LSM hook, add a "contents" flag to the
kernel_read_file LSM hook that indicates whether the LSM can expect
a matching call to the kernel_post_read_file LSM hook with the full
contents of the file. With the coming addition of partial file read
support for kernel_read_file*() API, the LSM will no longer be able
to always see the entire contents of a file during the read calls.
For cases where the LSM must read examine the complete file contents,
it will need to do so on its own every time the kernel_read_file
hook is called with contents=false (or reject such cases). Adjust all
existing LSMs to retain existing behavior.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-12-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are a few places in the kernel where LSMs would like to have
visibility into the contents of a kernel buffer that has been loaded or
read. While security_kernel_post_read_file() (which includes the
buffer) exists as a pairing for security_kernel_read_file(), no such
hook exists to pair with security_kernel_load_data().
Earlier proposals for just using security_kernel_post_read_file() with a
NULL file argument were rejected (i.e. "file" should always be valid for
the security_..._file hooks, but it appears at least one case was
left in the kernel during earlier refactoring. (This will be fixed in
a subsequent patch.)
Since not all cases of security_kernel_load_data() can have a single
contiguous buffer made available to the LSM hook (e.g. kexec image
segments are separately loaded), there needs to be a way for the LSM to
reason about its expectations of the hook coverage. In order to handle
this, add a "contents" argument to the "kernel_load_data" hook that
indicates if the newly added "kernel_post_load_data" hook will be called
with the full contents once loaded. That way, LSMs requiring full contents
can choose to unilaterally reject "kernel_load_data" with contents=false
(which is effectively the existing hook coverage), but when contents=true
they can allow it and later evaluate the "kernel_post_load_data" hook
once the buffer is loaded.
With this change, LSMs can gain coverage over non-file-backed data loads
(e.g. init_module(2) and firmware userspace helper), which will happen
in subsequent patches.
Additionally prepare IMA to start processing these cases.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
include file. That header gets pulled in just about everywhere
and doesn't really need functions not related to the general fs interface.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
checkreqprot data member in selinux_state struct is accessed directly by
SELinux functions to get and set. This could cause unexpected read or
write access to this data member due to compiler optimizations and/or
compiler's reordering of access to this field.
Add helper functions to get and set checkreqprot data member in
selinux_state struct. These helper functions use READ_ONCE and
WRITE_ONCE macros to ensure atomic read or write of memory for
this data member.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move the mutex used to synchronize policy changes (reloads and setting
of booleans) from selinux_fs_info to selinux_state and use it in
lockdep checks for rcu_dereference_protected() calls in the security
server functions. This makes the dependency on the mutex explicit
in the code rather than relying on comments.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Convert the policy read-write lock to RCU. This is significantly
simplified by the earlier work to encapsulate the policy data
structures and refactor the policy load and boolean setting logic.
Move the latest_granting sequence number into the selinux_policy
structure so that it can be updated atomically with the policy.
Since removing the policy rwlock and moving latest_granting reduces
the selinux_ss structure to nothing more than a wrapper around the
selinux_policy pointer, get rid of the extra layer of indirection.
At present this change merely passes a hardcoded 1 to
rcu_dereference_check() in the cases where we know we do not need to
take rcu_read_lock(), with the preceding comment explaining why.
Alternatively we could pass fsi->mutex down from selinuxfs and
apply a lockdep check on it instead.
Based in part on earlier attempts to convert the policy rwlock
to RCU by Kaigai Kohei [1] and by Peter Enderborg [2].
[1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@tycho.nsa.gov/
[2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@sony.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Drop a repeated word in comments.
{open, is, then}
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: selinux@vger.kernel.org
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
[PM: fix subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently SELinux denies attempts to remove the security.selinux xattr
always, even when permissive or no policy is loaded. This was originally
motivated by the view that all files should be labeled, even if that label
is unlabeled_t, and we shouldn't permit files that were once labeled to
have their labels removed entirely. This however prevents removing
SELinux xattrs in the case where one "disables" SELinux by not loading
a policy (e.g. a system where runtime disable is removed and selinux=0
was not specified). Allow removing the xattr before SELinux is
initialized. We could conceivably permit it even after initialization
if permissive, or introduce a separate permission check here.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch does for `getxattr` what commit 3e3e24b420 ("selinux: allow
labeling before policy is loaded") did for `setxattr`; it allows
querying the current SELinux label on disk before the policy is loaded.
One of the motivations described in that commit message also drives this
patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be
able to move the root filesystem for example, from xfs to ext4 on RAID,
on first boot, at initrd time.[1]
Because such an operation works at the filesystem level, we need to be
able to read the SELinux labels first from the original root, and apply
them to the files of the new root. The previous commit enabled the
second part of this process; this commit enables the first part.
[1] https://github.com/coreos/fedora-coreos-tracker/issues/94
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
=YTGf
-----END PGP SIGNATURE-----
Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.
Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:
https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.
[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]
LSM hooks are included:
- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]
- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]
I've provided SELinux and Smack with implementations of some of these
hooks.
WHY
===
Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.
However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.
DESIGN DECISIONS
================
- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:
pipe2(fds, O_NOTIFICATION_PIPE);
The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.
[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?
The pipe is then configured::
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
Messages are then read out of the pipe using read().
- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.
- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.
- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.
- Records in the buffer are binary, typed and have a length so that
they can be of varying size.
This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.
- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.
- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.
- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.
- The notification pipe is created and then watchpoints are attached
to it, using one of:
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.
- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.
Things I want to avoid:
- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).
- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.
- Letting users see events they shouldn't be able to see.
TESTING AND MANPAGES
====================
- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.
If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"
* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions
Pull execve updates from Eric Biederman:
"Last cycle for the Nth time I ran into bugs and quality of
implementation issues related to exec that could not be easily be
fixed because of the way exec is implemented. So I have been digging
into exec and cleanup up what I can.
I don't think I have exec sorted out enough to fix the issues I
started with but I have made some headway this cycle with 4 sets of
changes.
- promised cleanups after introducing exec_update_mutex
- trivial cleanups for exec
- control flow simplifications
- remove the recomputation of bprm->cred
The net result is code that is a bit easier to understand and work
with and a decrease in the number of lines of code (if you don't count
the added tests)"
* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
exec: Compute file based creds only once
exec: Add a per bprm->file version of per_clear
binfmt_elf_fdpic: fix execfd build regression
selftests/exec: Add binfmt_script regression test
exec: Remove recursion from search_binary_handler
exec: Generic execfd support
exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
exec: Move the call of prepare_binprm into search_binary_handler
exec: Allow load_misc_binary to call prepare_binprm unconditionally
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
exec: Teach prepare_exec_creds how exec treats uids & gids
exec: Set the point of no return sooner
exec: Move handling of the point of no return to the top level
exec: Run sync_mm_rss before taking exec_update_mutex
exec: Fix spelling of search_binary_handler in a comment
exec: Move the comment from above de_thread to above unshare_sighand
exec: Rename flush_old_exec begin_new_exec
exec: Move most of setup_new_exec into flush_old_exec
exec: In setup_new_exec cache current in the local variable me
...
Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.
Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true. The function cap_bprm_set_creds
ignores bprm->calld_sed_creds entirely.
Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in a LSM hook
that is called exactly once for the entire of exec. Modify the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.
Remove bprm->called_set_creds all of it's former users have been moved
to security_bprm_creds_for_exec.
Add or upate comments a appropriate to bring them up to date and
to reflect this change.
Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.
For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Since the meaning of combining the KEY_NEED_* constants is undefined, make
it so that you can't do that by turning them into an enum.
The enum is also given some extra values to represent special
circumstances, such as:
(1) The '0' value is reserved and causes a warning to trap the parameter
being unset.
(2) The key is to be unlinked and we require no permissions on it, only
the keyring, (this replaces the KEY_LOOKUP_FOR_UNLINK flag).
(3) An override due to CAP_SYS_ADMIN.
(4) An override due to an instantiation token being present.
(5) The permissions check is being deferred to later key_permission()
calls.
The extra values give the opportunity for LSMs to audit these situations.
[Note: This really needs overhauling so that lookup_user_key() tells
key_task_permission() and the LSM what operation is being done and leaves
it to those functions to decide how to map that onto the available
permits. However, I don't really want to make these change in the middle
of the notifications patchset.]
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
cc: Paul Moore <paul@paul-moore.com>
cc: Stephen Smalley <stephen.smalley.work@gmail.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: keyrings@vger.kernel.org
cc: selinux@vger.kernel.org
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6rPswUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMJBQ//dAU7VS01kQUUsFjd8xUIOk9aSbNy
gjFkzcpbTsS4Mhqk0FSP3mfqDWP3lvxt9gx6WfnCf+a2KE3eTtf9bISW2OB2evIl
9ydae6frJLiP6yIeAEZBb1PBQ6AxwBT8j8drKi57sOBC8rkmF66wiMaG2nybYW/j
rvkOQCFtWj/A3b+Y7y8fVs8sjTHWvcsvkN7kwYmmdjyn7h/C1Tqc6TOOrt1jtLUG
dgeak9bCIvK7JB/W6squ1iKqvkJhld7h5fZn6WB/6Xd1DKD1LVjGT8HsKpI/ei49
0tAybqaLv8WxVc5ZGcAGoTt/X0hq3lXRiMG1Qgmed85wxjrLEpU12L6yprEtgtao
0yY1JNizuC3Ehbi02o4gHf+RffLPWDrT8Kmu00/IuridCesNZCrEFpbAZmOwPU67
nFufU0YlSnsVJ63C8TMhkI2eg/VejGjN4I2PEgcxEbZKBBW+nAcJfoKl1y2tzEo9
ZNdZcetY9yJdpewjsF6VgsXs4qUrm1NUiG8pCXdK23+w/qYvZ4UqoYfoRNYIuRS0
nRN40OkRYN6GzDZ+NCPYqgIhoEpps0p96VYQI6mp+PyOpwlCq8epKVjXD/TkteVG
mevM2Ffy8xVaL47ufXxAHn+pA6F6Mdmo/rIwe+U5Olase96DTcFr90JPmVz58mcP
6lWZhje3wS3mSpk=
=HTrZ
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux fixes from Paul Moore:
"Two more SELinux patches to fix problems in the v5.7-rcX releases.
Wei Yongjun's patch fixes a return code in an error path, and my patch
fixes a problem where we were not correctly applying access controls
to all of the netlink messages in the netlink_send LSM hook"
* tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: properly handle multiple messages in selinux_netlink_send()
selinux: fix error return code in cond_read_list()
Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control. Prior to this patch, SELinux only inspected
the first message in the sk_buff.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6Ch6wUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPdcg/9FDMS/n0Xl1HQBUyu26EwLu3aUpNE
BdghXW1LKSTp7MrOENE60PGzZSAiC07ci1DqFd7zfLPZf2q5IwPwOBa/Avy8z95V
oHKqcMT6WO1SPOm/PxZn16FCKyET4gZDTXvHBAyiyFsbk36R522ZY615P9T3eLu/
ZA1NFsSjj68SqMCUlAWfeqjcbQiX63bryEpugOIg0qWy7R/+rtWxj9TjriZ+v9tq
uC45UcjBqphpmoPG8BifA3jjyclwO3DeQb5u7E8//HPPraGeB19ntsymUg7ljoGk
NrqCkZtv6E+FRCDTR5f0O7M1T4BWJodxw2NwngnTwKByLC25EZaGx80o+VyMt0eT
Pog+++JZaa5zZr2OYOtdlPVMLc2ALL6p/8lHOqFU3GKfIf04hWOm6/Lb2IWoXs3f
CG2b6vzoXYyjbF0Q7kxadb8uBY2S1Ds+CVu2HMBBsXsPdwbbtFWOT/6aRAQu61qn
PW+f47NR8By3SO6nMzWts2SZEERZNIEdSKeXHuR7As1jFMXrHLItreb4GCSPay5h
2bzRpxt2br5CDLh7Jv2pZnHtUqBWOSbslTix77+Z/hPKaNowvD9v3tc5hX87rDmB
dYXROD6/KoyXFYDcMdphlvORFhqGqd5bEYuHHum/VjSIRh237+/nxFY/vZ4i4bzU
2fvpAmUlVX1c4rw=
=LlWA
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"We've got twenty SELinux patches for the v5.7 merge window, the
highlights are below:
- Deprecate setting /sys/fs/selinux/checkreqprot to 1.
This flag was originally created to deal with legacy userspace and
the READ_IMPLIES_EXEC personality flag. We changed the default from
1 to 0 back in Linux v4.4 and now we are taking the next step of
deprecating it, at some point in the future we will take the final
step of rejecting 1.
- Allow kernfs symlinks to inherit the SELinux label of the parent
directory. In order to preserve backwards compatibility this is
protected by the genfs_seclabel_symlinks SELinux policy capability.
- Optimize how we store filename transitions in the kernel, resulting
in some significant improvements to policy load times.
- Do a better job calculating our internal hash table sizes which
resulted in additional policy load improvements and likely general
SELinux performance improvements as well.
- Remove the unused initial SIDs (labels) and improve how we handle
initial SIDs.
- Enable per-file labeling for the bpf filesystem.
- Ensure that we properly label NFS v4.2 filesystems to avoid a
temporary unlabeled condition.
- Add some missing XFS quota command types to the SELinux quota
access controls.
- Fix a problem where we were not updating the seq_file position
index correctly in selinuxfs.
- We consolidate some duplicated code into helper functions.
- A number of list to array conversions.
- Update Stephen Smalley's email address in MAINTAINERS"
* tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: clean up indentation issue with assignment statement
NFS: Ensure security label is set for root inode
MAINTAINERS: Update my email address
selinux: avtab_init() and cond_policydb_init() return void
selinux: clean up error path in policydb_init()
selinux: remove unused initial SIDs and improve handling
selinux: reduce the use of hard-coded hash sizes
selinux: Add xfs quota command types
selinux: optimize storage of filename transitions
selinux: factor out loop body from filename_trans_read()
security: selinux: allow per-file labeling for bpffs
selinux: generalize evaluate_cond_node()
selinux: convert cond_expr to array
selinux: convert cond_av_list to array
selinux: convert cond_list to array
selinux: sel_avc_get_stat_idx should increase position index
selinux: allow kernfs symlinks to inherit parent directory context
selinux: simplify evaluate_cond_node()
Documentation,selinux: deprecate setting checkreqprot to 1
selinux: move status variables out of selinux_ss
Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
permission check.
Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
filesystem quotaget permission check.
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add support for genfscon per-file labeling of bpffs files. This allows
for separate permissions for different pinned bpf objects, which may
be completely unrelated to each other.
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Steven Moreland <smoreland@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl5ByJYUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMNLg/+Nj3gdWa5z0lsFuvUs87uM4HYHLtp
9/wENEaqy7W92LAAEW5y+6sK8Pm9/CO3xf2d40SAr/xjwBo2Fbnfmx7QwDMUCQS6
vyrLDiikSMyFU5j1AQDFuWjWF36FZlfqix+oXMETJ1tqxjIFxFN+rvbsDICPiQTl
VxBG4CWWzNtSllYDDj3KYcVaGzNb3nleb8n436gFoHS++12BaypVA0kBlV3BqzPi
mC0+gDuY06hwFmc/tAl8WndRwZpJ71rgsKJ5opgel61Zxf1J39QdTxeap9hhldJl
5FbtoiGpnMXtHzpRGh6BHag/2gGX/0+J+t8ZATuk4GRtt1ueyYw9kNnAMblooJSV
TwJlv1LLIKpAmSSJnKoN1AUAsxhS3dAmVyPuYWtmqEq8wq7YEYi5UnK1fH4ziI+b
5LmYD0D/FoNE1Dr1TV78HmM9i0+AteH5FhkS0V6GD3v+wO6vCb3hTYaQpMWhhgnY
/SEWbvLUO1fvQoKA8GT3UtmiUFdrqvrQFoL/l3weo9tvfyG3Walxo8s0w0r/DVUH
AevIhUVi+snSLhI95fF26wskrZtEE9gwl/BoMjfKse0HK2t77xBE3kZW88NnSXOU
Wk+ozfgibCnzk/Do/Y4iRr9GbgmCfscjOqlq1Yzyt/ZkUsG2egWHMzpritnzVEio
NSQ6n6kpbkRgkHc=
=0cTj
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux fixes from Paul Moore:
"Two small fixes: one fixes a locking problem in the recently merged
label translation code, the other fixes an embarrassing 'binderfs' /
'binder' filesystem name check"
* tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix sidtab string cache locking
selinux: fix typo in filesystem name
Currently symlinks on kernel filesystems, like sysfs, are labeled on
creation with the parent filesystem root sid.
Allow symlinks to inherit the parent directory context, so fine-grained
kernfs labeling can be applied to symlinks too and checking contexts
doesn't complain about them.
For backward-compatibility this behavior is contained in a new policy
capability: genfs_seclabel_symlinks
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Deprecate setting the SELinux checkreqprot tunable to 1 via kernel
parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left
intact for compatibility since Android and some Linux distributions
do so for security and treat an inability to set it as a fatal error.
Eventually setting it to 0 will become a no-op and the kernel will
stop using checkreqprot's value internally altogether.
checkreqprot was originally introduced as a compatibility mechanism
for legacy userspace and the READ_IMPLIES_EXEC personality flag.
However, if set to 1, it weakens security by allowing mappings to be
made executable without authorization by policy. The default value
for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed
from 1 to 0 in commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android
and Linux distributions began explicitly setting
/sys/fs/selinux/checkreqprot to 0 some time ago.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
It fits more naturally in selinux_state, since it reflects also global
state (the enforcing and policyload fields).
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull vfs file system parameter updates from Al Viro:
"Saner fs_parser.c guts and data structures. The system-wide registry
of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
the horror switch() in fs_parse() that would have to grow another case
every time something got added to that system-wide registry.
New syntax types can be added by filesystems easily now, and their
namespace is that of functions - not of system-wide enum members. IOW,
they can be shared or kept private and if some turn out to be widely
useful, we can make them common library helpers, etc., without having
to do anything whatsoever to fs_parse() itself.
And we already get that kind of requests - the thing that finally
pushed me into doing that was "oh, and let's add one for timeouts -
things like 15s or 2h". If some filesystem really wants that, let them
do it. Without somebody having to play gatekeeper for the variants
blessed by direct support in fs_parse(), TYVM.
Quite a bit of boilerplate is gone. And IMO the data structures make a
lot more sense now. -200LoC, while we are at it"
* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
tmpfs: switch to use of invalfc()
cgroup1: switch to use of errorfc() et.al.
procfs: switch to use of invalfc()
hugetlbfs: switch to use of invalfc()
cramfs: switch to use of errofc() et.al.
gfs2: switch to use of errorfc() et.al.
fuse: switch to use errorfc() et.al.
ceph: use errorfc() and friends instead of spelling the prefix out
prefix-handling analogues of errorf() and friends
turn fs_param_is_... into functions
fs_parse: handle optional arguments sanely
fs_parse: fold fs_parameter_desc/fs_parameter_spec
fs_parser: remove fs_parameter_description name field
add prefix to fs_context->log
ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
new primitive: __fs_parse()
switch rbd and libceph to p_log-based primitives
struct p_log, variants of warnf() et.al. taking that one instead
teach logfc() to handle prefices, give it saner calling conventions
get rid of cg_invalf()
...
Unused now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Correct the filesystem name to "binder" to enable genfscon per-file
labelling for binderfs.
Fixes: 7a4b519474 ("selinux: allow per-file labelling for binderfs")
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: slight style changes to the subj/description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
commit 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
introduced a new move_mount(2) system call and a corresponding new LSM
security_move_mount hook but did not implement this hook for any existing
LSM. This creates a regression for SELinux with respect to consistent
checking of mounts; the existing selinux_mount hook checks mounton
permission to the mount point path. Provide a SELinux hook
implementation for move_mount that applies this same check for
consistency. In the future we may wish to add a new move_mount
filesystem permission and check as well, but this addresses
the immediate regression.
Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch removes the inode, file, and superblock security blob
allocation functions and moves the associated code into the
respective LSM hooks. This patch also removes the inode_doinit()
function as it was a trivial wrapper around
inode_doinit_with_dentry() and called from one location in the code.
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
selinux_nlmsg_perm is used for only by selinux_netlink_send. Remove
the redundant function to simplify the code.
Fix a typo by suggestion from Stephen.
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit b1d9e6b064 ("LSM: Switch to lists of hooks") switched the LSM
infrastructure to use per-hook lists, which meant that removing the
hooks for a given module was no longer atomic. Even though the commit
clearly documents that modules implementing runtime revmoval of hooks
(only SELinux attempts this madness) need to take special precautions to
avoid race conditions, SELinux has never addressed this.
By inserting an artificial delay between the loop iterations of
security_delete_hooks() (I used 100 ms), booting to a state where
SELinux is enabled, but policy is not yet loaded, and running these
commands:
while true; do ping -c 1 <some IP>; done &
echo -n 1 >/sys/fs/selinux/disable
kill %1
wait
...I was able to trigger NULL pointer dereferences in various places. I
also have a report of someone getting panics on a stock RHEL-8 kernel
after setting SELINUX=disabled in /etc/selinux/config and rebooting
(without adding "selinux=0" to kernel command-line).
Reordering the SELinux hooks such that those that allocate structures
are removed last seems to prevent these panics. It is very much possible
that this doesn't make the runtime disable completely race-free, but at
least it makes the operation much less fragile.
Cc: stable@vger.kernel.org
Fixes: b1d9e6b064 ("LSM: Switch to lists of hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The disabled/enforcing/initialized flags are all accessed concurrently
by threads so use the appropriate accessors that ensure atomicity and
document that it is expected.
Use smp_load/acquire...() helpers (with memory barriers) for the
initialized flag, since it gates access to the rest of the state
structures.
Note that the disabled flag is currently not used for anything other
than avoiding double disable, but it will be used for bailing out of
hooks once security_delete_hooks() is removed.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>