IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 8d269a8e2a8f0bca89022f4ec98de460acb90365 ]
If seq_file .next function does not change position index,
read after some lseek can generate unexpected output.
$ dd if=/sys/fs/selinux/avc/cache_stats # usual output
lookups hits misses allocations reclaims frees
817223 810034 7189 7189 6992 7037
1934894 1926896 7998 7998 7632 7683
1322812 1317176 5636 5636 5456 5507
1560571 1551548 9023 9023 9056 9115
0+1 records in
0+1 records out
189 bytes copied, 5,1564e-05 s, 3,7 MB/s
$# read after lseek to midle of last line
$ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
056 9115 <<<< end of last line
1560571 1551548 9023 9023 9056 9115 <<< whole last line once again
0+1 records in
0+1 records out
45 bytes copied, 8,7221e-05 s, 516 kB/s
$# read after lseek beyond end of of file
$ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
1560571 1551548 9023 9023 9056 9115 <<<< generates whole last line
0+1 records in
0+1 records out
36 bytes copied, 9,0934e-05 s, 396 kB/s
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 65de50969a77509452ae590e9449b70a22b923bb upstream.
Clang's static analysis tool reports these double free memory errors.
security/selinux/ss/services.c:2987:4: warning: Attempt to free released memory [unix.Malloc]
kfree(bnames[i]);
^~~~~~~~~~~~~~~~
security/selinux/ss/services.c:2990:2: warning: Attempt to free released memory [unix.Malloc]
kfree(bvalues);
^~~~~~~~~~~~~~
So improve the security_get_bools error handling by freeing these variables
and setting their return pointers to NULL and the return len to 0
Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb73974172ffaaf57a7c42f35424d9aece1a5af6 upstream.
Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control. Prior to this patch, SELinux only inspected
the first message in the sk_buff.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 030b995ad9ece9fa2d218af4429c1c78c2342096 ]
In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having a
false value. In last patch this changes was missed , so correcting it.
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
[PM: merge fuzz, minor description cleanup]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 45385237f65aeee73641f1ef737d7273905a233f upstream.
Since roles_init() adds some entries to the role hash table, we need to
destroy also its keys/values on error, otherwise we get a memory leak in
the error path.
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+fee3a14d4cdf92646287@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a83d6ddaebe541570291205cb538e35ad4ff94f9 upstream.
In the SECURITY_FS_USE_MNTPOINT case we never want to allow relabeling
files/directories, so we should never set the SBLABEL_MNT flag. The
'special handling' in selinux_is_sblabel_mnt() is only intended for when
the behavior is set to SECURITY_FS_USE_GENFS.
While there, make the logic in selinux_is_sblabel_mnt() more explicit
and add a BUILD_BUG_ON() to make sure that introducing a new
SECURITY_FS_USE_* forces a review of the logic.
Fixes: d5f3a5f6e7e7 ("selinux: add security in-core xattr support for pstore and debugfs")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 53e0c2aa9a59a48e3798ef193d573ade85aa80f5 ]
Ignore all selinux_inode_notifysecctx() calls on mounts with SBLABEL_MNT
flag unset. This is achived by returning -EOPNOTSUPP for this case in
selinux_inode_setsecurtity() (because that function should not be called
in such case anyway) and translating this error to 0 in
selinux_inode_notifysecctx().
This fixes behavior of kernfs-based filesystems when mounted with the
'context=' option. Before this patch, if a node's context had been
explicitly set to a non-default value and later the filesystem has been
remounted with the 'context=' option, then this node would show up as
having the manually-set context and not the mount-specified one.
Steps to reproduce:
# mount -t cgroup2 cgroup2 /sys/fs/cgroup/unified
# chcon unconfined_u:object_r:user_home_t:s0 /sys/fs/cgroup/unified/cgroup.stat
# ls -lZ /sys/fs/cgroup/unified
total 0
-r--r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.controllers
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.max.depth
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.max.descendants
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.procs
-r--r--r--. 1 root root unconfined_u:object_r:user_home_t:s0 0 Dec 13 10:41 cgroup.stat
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.subtree_control
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Dec 13 10:41 cgroup.threads
# umount /sys/fs/cgroup/unified
# mount -o context=system_u:object_r:tmpfs_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
Result before:
# ls -lZ /sys/fs/cgroup/unified
total 0
-r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.controllers
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.depth
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.descendants
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.procs
-r--r--r--. 1 root root unconfined_u:object_r:user_home_t:s0 0 Dec 13 10:41 cgroup.stat
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.subtree_control
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.threads
Result after:
# ls -lZ /sys/fs/cgroup/unified
total 0
-r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.controllers
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.depth
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.descendants
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.procs
-r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.stat
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.subtree_control
-rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.threads
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cbdcb882f97a45f7475c67ac6257bbc16277dfe ]
If a superblock has the MS_SUBMOUNT flag set, we should always allow
mounting it. These mounts are done automatically by the kernel either as
part of mounting some parent mount (e.g. debugfs always mounts tracefs
under "tracing" for compatibility) or they are mounted automatically as
needed on subdirectory accesses (e.g. NFS crossmnt mounts). Since such
automounts are either an implicit consequence of the parent mount (which
is already checked) or they can happen during regular accesses (where it
doesn't make sense to check against the current task's context), the
mount permission check should be skipped for them.
Without this patch, attempts to access contents of an automounted
directory can cause unexpected SELinux denials.
In the current kernel tree, the MS_SUBMOUNT flag is set only via
vfs_submount(), which is called only from the following places:
- AFS, when automounting special "symlinks" referencing other cells
- CIFS, when automounting "referrals"
- NFS, when automounting subtrees
- debugfs, when automounting tracefs
In all cases the submounts are meant to be transparent to the user and
it makes sense that if mounting the master is allowed, then so should be
the automounts. Note that CAP_SYS_ADMIN capability checking is already
skipped for (SB_KERNMOUNT|SB_SUBMOUNT) in:
- sget_userns() in fs/super.c:
if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) &&
!(type->fs_flags & FS_USERNS_MOUNT) &&
!capable(CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
- sget() in fs/super.c:
/* Ensure the requestor has permissions over the target filesystem */
if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !ns_capable(user_ns, CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
Verified internally on patched RHEL 7.6 with a reproducer using
NFS+httpd and selinux-tesuite.
Fixes: 93faccbbfa95 ("fs: Better permission checking for submounts")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 5b0e7310a2a33c06edc7eb81ffc521af9b2c5610 upstream.
levdatum->level can be NULL if we encounter an error while loading
the policy during sens_read prior to initializing it. Make sure
sens_destroy handles that case correctly.
Reported-by: syzbot+6664500f0f18f07a5c0e@syzkaller.appspotmail.com
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4458bba09788e70e8fb39ad003f087cd9dfbd6ac upstream.
syzbot is hitting warning at str_read() [1] because len parameter can
become larger than KMALLOC_MAX_SIZE. We don't need to emit warning for
this case.
[1] https://syzkaller.appspot.com/bug?id=7f2f5aad79ea8663c296a2eedb81978401a908f0
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+ac488b9811036cea7ea0@syzkaller.appspotmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 476accbe2f6ef69caeebe99f52a286e12ac35aee upstream.
There is a strange __GFP_NOMEMALLOC usage pattern in SELinux,
specifically GFP_ATOMIC | __GFP_NOMEMALLOC which doesn't make much
sense. GFP_ATOMIC on its own allows to access memory reserves while
__GFP_NOMEMALLOC dictates we cannot use memory reserves. Replace this
with the much more sane GFP_NOWAIT in the AVC code as we can tolerate
memory allocation failures in that code.
Signed-off-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efe3de79e0b52ca281ef6691480c8c68c82a4657 upstream.
Call trace:
[<ffffff9203a8d7a8>] dump_backtrace+0x0/0x428
[<ffffff9203a8dbf8>] show_stack+0x28/0x38
[<ffffff920409bfb8>] dump_stack+0xd4/0x124
[<ffffff9203d187e8>] print_address_description+0x68/0x258
[<ffffff9203d18c00>] kasan_report.part.2+0x228/0x2f0
[<ffffff9203d1927c>] kasan_report+0x5c/0x70
[<ffffff9203d1776c>] check_memory_region+0x12c/0x1c0
[<ffffff9203d17cdc>] memcpy+0x34/0x68
[<ffffff9203d75348>] xattr_getsecurity+0xe0/0x160
[<ffffff9203d75490>] vfs_getxattr+0xc8/0x120
[<ffffff9203d75d68>] getxattr+0x100/0x2c8
[<ffffff9203d76fb4>] SyS_fgetxattr+0x64/0xa0
[<ffffff9203a83f70>] el0_svc_naked+0x24/0x28
If user get root access and calls security.selinux setxattr() with an
embedded NUL on a file and then if some process performs a getxattr()
on that file with a length greater than the actual length of the string,
it would result in a panic.
To fix this, add the actual length of the string to the security context
instead of the length passed by the userspace process.
Signed-off-by: Sachin Grover <sgrover@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ccb544781d34afdb73a9a73ae53035d824d193bf ]
open permission is currently only defined for files in the kernel
(COMMON_FILE_PERMS rather than COMMON_FILE_SOCK_PERMS). Construction of
an artificial test case that tries to open a socket via /proc/pid/fd will
generate a recvfrom avc denial because recvfrom and open happen to map to
the same permission bit in socket vs file classes.
open of a socket via /proc/pid/fd is not supported by the kernel regardless
and will ultimately return ENXIO. But we hit the permission check first and
can thus produce these odd/misleading denials. Omit the open check when
operating on a socket.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 270e8573145a26de924e2dc644596332d400445b upstream.
The check is already performed in ocontext_read() when the policy is
loaded. Removing the array also fixes the following warning when
building with clang:
security/selinux/hooks.c:338:20: error: variable 'labeling_behaviors'
is not needed and will not be emitted
[-Werror,-Wunneeded-internal-declaration]
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 342e91578eb6909529bc7095964cd44b9c057c4e upstream.
'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.
This fixes the following warning when building with clang:
security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
while (p_in->perms && p_in->perms[k]) {
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e2f586bd83177d22072b275edd4b8b872daba924 ]
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in selinux_socket_bind():
==================================================================
BUG: KMSAN: use of unitialized memory
inter: 0
CPU: 3 PID: 1074 Comm: packet2 Tainted: G B 4.8.0-rc6+ #1916
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
0000000000000000 ffff8800882ffb08 ffffffff825759c8 ffff8800882ffa48
ffffffff818bf551 ffffffff85bab870 0000000000000092 ffffffff85bab550
0000000000000000 0000000000000092 00000000bb0009bb 0000000000000002
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff825759c8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
[<ffffffff818bdee6>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1008
[<ffffffff818bf0fb>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
[<ffffffff822dae71>] selinux_socket_bind+0xf41/0x1080 security/selinux/hooks.c:4288
[<ffffffff8229357c>] security_socket_bind+0x1ec/0x240 security/security.c:1240
[<ffffffff84265d98>] SYSC_bind+0x358/0x5f0 net/socket.c:1366
[<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
[<ffffffff8518217c>] entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.o:?
chained origin: 00000000ba6009bb
[<ffffffff810bb7a7>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
[< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
[< inline >] kmsan_save_stack mm/kmsan/kmsan.c:337
[<ffffffff818bd2b8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:530
[<ffffffff818bf033>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
[<ffffffff84265b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
[<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
[<ffffffff8518217c>] return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.o:?
origin description: ----address@SYSC_bind (origin=00000000b8c00900)
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists upstream)
, when I run the following program as root:
=======================================================
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
int main(int argc, char *argv[]) {
struct sockaddr addr;
int size = 0;
if (argc > 1) {
size = atoi(argv[1]);
}
memset(&addr, 0, sizeof(addr));
int fd = socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP);
bind(fd, &addr, size);
return 0;
}
=======================================================
(for different values of |size| other error reports are printed).
This happens because bind() unconditionally copies |size| bytes of
|addr| to the kernel, leaving the rest uninitialized. Then
security_socket_bind() reads the IP address bytes, including the
uninitialized ones, to determine the port, or e.g. pass them further to
sel_netnode_find(), which uses them to calculate a hash.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
[PM: fixed some whitespace damage]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b14752ec4e0d87126e636384cf37c8dd9df157c upstream.
We can't do anything reasonable in security_bounded_transition() if we
don't have a policy loaded, and in fact we could run into problems
with some of the code inside expecting a policy. Fix these problems
like we do many others in security/selinux/ss/services.c by checking
to see if the policy is loaded (ss_initialized) and returning quickly
if it isn't.
Reported-by: syzbot <syzkaller-bugs@googlegroups.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef28df55ac27e1e5cd122e19fa311d886d47a756 upstream.
The syzbot/syzkaller automated tests found a problem in
security_context_to_sid_core() during early boot (before we load the
SELinux policy) where we could potentially feed context strings without
NUL terminators into the strcmp() function.
We already guard against this during normal operation (after the SELinux
policy has been loaded) by making a copy of the context strings and
explicitly adding a NUL terminator to the end. The patch extends this
protection to the early boot case (no loaded policy) by moving the context
copy earlier in security_context_to_sid_core().
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reviewed-By: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c461cb727d146c9ef2d3e86214f498b78b7d125 upstream.
SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute. However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb12e75d82258c2f2e7746d5952d3e321a ("proc_pid_attr_write():
switch to memdup_user()"). Fix the off-by-one error.
Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.
There are no users of this facility to my knowledge; possibly we
should just get rid of it.
UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug. This patch fixes CVE-2017-2618.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode. As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.
The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas
This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check
Merge my system logging cleanups, triggered by the broken '\n' patches.
The line continuation handling has been broken basically forever, and
the code to handle the system log records was both confusing and
dubious. And it would do entirely the wrong thing unless you always had
a terminating newline, partly because it couldn't actually see whether a
message was marked KERN_CONT or not (but partly because the LOG_CONT
handling in the recording code was rather confusing too).
This re-introduces a real semantically meaningful KERN_CONT, and fixes
the few places I noticed where it was missing. There are probably more
missing cases, since KERN_CONT hasn't actually had any semantic meaning
for at least four years (other than the checkpatch meaning of "no log
level necessary, this is a continuation line").
This also allows the combination of KERN_CONT and a log level. In that
case the log level will be ignored if the merging with a previous line
is successful, but if a new record is needed, that new record will now
get the right log level.
That also means that you can at least in theory combine KERN_CONT with
the "pr_info()" style helpers, although any use of pr_fmt() prefixing
would make that just result in a mess, of course (the prefix would end
up in the middle of a continuing line).
* printk-cleanups:
printk: make reading the kernel log flush pending lines
printk: re-organize log_output() to be more legible
printk: split out core logging code into helper function
printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.
Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.
To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:
474925277671 ("printk: add KERN_CONT annotation").
That continuation marker was initially an empty string, and didn't
actuall make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.
To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.
5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")
and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.
Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.
You can see the beginning of that conversion in commits
e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")
with a number of follow-up commits to fix some painful fallout from that
conversion. Over all, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.
And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.
However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit
61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.
This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.
But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.
For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.
Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.
Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.
So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.
There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.
That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.
But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".
(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, various places in the kernel check for the existence of
getxattr, setxattr, and removexattr inode operations and directly call
those operations. Switch to helper functions and test for the IOP_XATTR
flag instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.
CURRENT_TIME is also not y2038 safe.
This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.
Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Right now LSM_AUDIT_DATA_PATH type contains "struct path" in union "u"
of common_audit_data. This information is used to print path of file
at the same time it is also used to get to dentry and inode. And this
inode information is used to get to superblock and device and print
device information.
This does not work well for layered filesystems like overlay where dentry
contained in path is overlay dentry and not the real dentry of underlying
file system. That means inode retrieved from dentry is also overlay
inode and not the real inode.
SELinux helpers like file_path_has_perm() are doing checks on inode
retrieved from file_inode(). This returns the real inode and not the
overlay inode. That means we are doing check on real inode but for audit
purposes we are printing details of overlay inode and that can be
confusing while debugging.
Hence, introduce a new type LSM_AUDIT_DATA_FILE which carries file
information and inode retrieved is real inode using file_inode(). That
way right avc denied information is given to user.
For example, following is one example avc before the patch.
type=AVC msg=audit(1473360868.399:214): avc: denied { read open } for
pid=1765 comm="cat"
path="/root/.../overlay/container1/merged/readfile"
dev="overlay" ino=21443
scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
tclass=file permissive=0
It looks as follows after the patch.
type=AVC msg=audit(1473360017.388:282): avc: denied { read open } for
pid=2530 comm="cat"
path="/root/.../overlay/container1/merged/readfile"
dev="dm-0" ino=2377915
scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
tclass=file permissive=0
Notice that now dev information points to "dm-0" device instead of
"overlay" device. This makes it clear that check failed on underlying
inode and not on the overlay inode.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
[PM: slight tweaks to the description to make checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Fix to return error code -EINVAL from the error handling case instead
of 0 (rc is overwrite to 0 when policyvers >=
POLICYDB_VERSION_ROLETRANS), as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
[PM: normalize "selinux" in patch subject, description line wrap]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Throughout the SELinux LSM, values taken from sepolicy are
used in places where length == 0 or length == <saturated>
matter, find and fix these.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
libsepol pointed out an issue where its possible to have
an unitialized jmp and invalid dereference, fix this.
While we're here, zero allocate all the *_val_to_struct
structures.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When count is 0 and the highbit is not zero, the ebitmap is not
valid and the internal node is not allocated. This causes issues
when routines, like mls_context_isvalid() attempt to use the
ebitmap_for_each_bit() and ebitmap_node_get_bit() as they assume
a highbit > 0 will have a node allocated.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Remove the SECURITY_SELINUX_POLICYDB_VERSION_MAX Kconfig option
Per: https://github.com/SELinuxProject/selinux/wiki/Kernel-Todo
This was only needed on Fedora 3 and 4 and just causes issues now,
so drop it.
The MAX and MIN should just be whatever the kernel can support.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Calculate what would be the label of newly created file and set that
secid in the passed creds.
Context of the task which is actually creating file is retrieved from
set of creds passed in. (old->security).
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Right now selinux_determine_inode_label() works on security pointer of
current task. Soon I need this to work on a security pointer retrieved
from a set of creds. So start passing in a pointer and caller can
decide where to fetch security pointer from.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When a file is copied up in overlay, we have already created file on
upper/ with right label and there is no need to copy up selinux
label/xattr from lower file to upper file. In fact in case of context
mount, we don't want to copy up label as newly created file got its label
from context= option.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A file is being copied up for overlay file system. Prepare a new set of
creds and set create_sid appropriately so that new file is created with
appropriate label.
Overlay inode has right label for both context and non-context mount
cases. In case of non-context mount, overlay inode will have the label
of lower file and in case of context mount, overlay inode will have
the label from context= mount option.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled
either built-in or as a module, use that macro instead of open coding
the same.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull qstr constification updates from Al Viro:
"Fairly self-contained bunch - surprising lot of places passes struct
qstr * as an argument when const struct qstr * would suffice; it
complicates analysis for no good reason.
I'd prefer to feed that separately from the assorted fixes (those are
in #for-linus and with somewhat trickier topology)"
* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qstr: constify instances in adfs
qstr: constify instances in lustre
qstr: constify instances in f2fs
qstr: constify instances in ext2
qstr: constify instances in vfat
qstr: constify instances in procfs
qstr: constify instances in fuse
qstr constify instances in fs/dcache.c
qstr: constify instances in nfs
qstr: constify instances in ocfs2
qstr: constify instances in autofs4
qstr: constify instances in hfs
qstr: constify instances in hfsplus
qstr: constify instances in logfs
qstr: constify dentry_init_security
Pull security subsystem updates from James Morris:
"Highlights:
- TPM core and driver updates/fixes
- IPv6 security labeling (CALIPSO)
- Lots of Apparmor fixes
- Seccomp: remove 2-phase API, close hole where ptrace can change
syscall #"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
tpm: Factor out common startup code
tpm: use devm_add_action_or_reset
tpm2_i2c_nuvoton: add irq validity check
tpm: read burstcount from TPM_STS in one 32-bit transaction
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
tpm_tis_core: convert max timeouts from msec to jiffies
apparmor: fix arg_size computation for when setprocattr is null terminated
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
apparmor: do not expose kernel stack
apparmor: fix module parameters can be changed after policy is locked
apparmor: fix oops in profile_unpack() when policy_db is not present
apparmor: don't check for vmalloc_addr if kvzalloc() failed
apparmor: add missing id bounds check on dfa verification
apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
apparmor: use list_next_entry instead of list_entry_next
apparmor: fix refcount race when finding a child profile
apparmor: fix ref count leak when profile sha1 hash is read
apparmor: check that xindex is in trans_table bounds
...
This works in exactly the same way as the CIPSO label cache.
The idea is to allow the lsm to cache the result of a secattr
lookup so that it doesn't need to perform the lookup for
every skbuff.
It introduces two sysctl controls:
calipso_cache_enable - enables/disables the cache.
calipso_cache_bucket_size - sets the size of a cache bucket.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This makes it possible to route the error to the appropriate
labelling engine. CALIPSO is far less verbose than CIPSO
when encountering a bogus packet, so there is no need for a
CALIPSO error handler.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In some cases, the lsm needs to add the label to the skbuff directly.
A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4
behaviour. This allows selinux to label the skbuffs that it requires.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Request sockets need to have a label that takes into account the
incoming connection as well as their parent's label. This is used
for the outgoing SYN-ACK and for their child full-socket.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If a socket has a netlabel in place then don't let setsockopt() alter
the socket's IPv6 hop-by-hop option. This is in the same spirit as
the existing check for IPv4.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on
the equivalent CISPO code. The main difference is due to manipulating
the options in the hop-by-hop header.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.
Pieced together from code snippets provided by Stephen Smalley.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem. Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.
This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.
This does not remove the need for MNT_LOCK_NOSUID. The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.
As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents. If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.
On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.
As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>