Mirror of https://github.com/systemd/systemd-stable.git (synced 2024-12-23 17:34:00 +03:00)
seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060)
The VDSO provided by the kernel for x32 uses x86-64 syscalls instead of x32 ones. I think we can safely allow this; the set of x86-64 syscalls should be very similar to the x32 ones. The real point is not to allow *x86* syscalls, because some of those are inconveniently multiplexed, and we are apparently unable to block the specific actions we want to block.
parent 5c19ff79de
commit 2428aaf8a2
@@ -1429,17 +1429,19 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
 filter. The known architecture identifiers are the same as for <varname>ConditionArchitecture=</varname>
 described in <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
 as well as <constant>x32</constant>, <constant>mips64-n32</constant>, <constant>mips64-le-n32</constant>, and
-the special identifier <constant>native</constant>. If this setting is used, processes of this unit will only
-be permitted to call native system calls, and system calls of the specified architectures. This is an
-effective way to disable compatibility with non-native architectures for processes, for example to prohibit
-execution of 32-bit x86 binaries on 64-bit x86-64 systems. The special <constant>native</constant> identifier
+the special identifier <constant>native</constant>. The special identifier <constant>native</constant>
 implicitly maps to the native architecture of the system (or more precisely: to the architecture the system
 manager is compiled for). If running in user mode, or in system mode, but without the
 <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting <varname>User=nobody</varname>),
 <varname>NoNewPrivileges=yes</varname> is implied. By default, this option is set to the empty list, i.e. no
 system call architecture filtering is applied.</para>

-<para>Note that system call filtering is not equally effective on all architectures. For example, on x86
+<para>If this setting is used, processes of this unit will only be permitted to call native system calls, and
+system calls of the specified architectures. For the purposes of this option, the x32 architecture is treated
+as including x86-64 system calls. However, this setting still fulfills its purpose, as explained below, on
+x32.</para>
+
+<para>System call filtering is not equally effective on all architectures. For example, on x86
 filtering of network socket-related calls is not possible, due to ABI limitations — a limitation that x86-64
 does not have, however. On systems supporting multiple ABIs at the same time — such as x86/x86-64 — it is hence
 recommended to limit the set of permitted system call architectures so that secondary ABIs may not be used to
@@ -1534,17 +1534,35 @@ int seccomp_restrict_archs(Set *archs) {
         int r;

         /* This installs a filter with no rules, but that restricts the system call architectures to the specified
-         * list. */
+         * list.
+         *
+         * There are some qualifications. However the most important use is to stop processes from bypassing
+         * system call restrictions, in case they used a broader (multiplexing) syscall which is only available
+         * in a non-native architecture. There are no holes in this use case, at least so far. */
+
+        /* Note libseccomp includes our "native" (current) architecture in the filter by default.
+         * We do not remove it. For example, our callers expect to be able to call execve() afterwards
+         * to run a program with the restrictions applied. */
         seccomp = seccomp_init(SCMP_ACT_ALLOW);
         if (!seccomp)
                 return -ENOMEM;

         SET_FOREACH(id, archs, i) {
                 r = seccomp_arch_add(seccomp, PTR_TO_UINT32(id) - 1);
-                if (r == -EEXIST)
-                        continue;
-                if (r < 0)
+                if (r < 0 && r != -EEXIST)
                         return r;
         }

+        /* The vdso for x32 assumes that x86-64 syscalls are available. Let's allow them, since x32
+         * x32 syscalls should basically match x86-64 for everything except the pointer type.
+         * The important thing is that you can block the old 32-bit x86 syscalls.
+         * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850047 */
+
+        if (seccomp_arch_native() == SCMP_ARCH_X32 ||
+            set_contains(archs, UINT32_TO_PTR(SCMP_ARCH_X32 + 1))) {
+
+                r = seccomp_arch_add(seccomp, SCMP_ARCH_X86_64);
+                if (r < 0 && r != -EEXIST)
+                        return r;
+        }
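The x32 rule that this commit adds to `seccomp_restrict_archs()` can be modelled without libseccomp. What follows is a minimal sketch, not the systemd implementation: the `enum arch` values and the `effective_archs()` helper are hypothetical stand-ins (for libseccomp's `SCMP_ARCH_*` tokens and for the filter's architecture list, respectively) introduced only for illustration. It shows the intended outcome: the native architecture is always permitted, requested architectures are added on top, x86-64 is added whenever x32 is native or requested, and 32-bit x86 stays blocked unless it is listed explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical architecture tokens for this sketch only; libseccomp's
 * real SCMP_ARCH_* constants have different values. */
enum arch { ARCH_X86, ARCH_X86_64, ARCH_X32, ARCH_COUNT };

/* Fills allowed[] with the architectures that would end up permitted:
 * the native one, every requested one, and x86-64 whenever x32 is
 * either native or requested (the x32 VDSO issues x86-64 syscalls). */
static void effective_archs(enum arch native,
                            const enum arch *requested, size_t n_requested,
                            bool allowed[ARCH_COUNT]) {
        bool x32_requested = false;

        for (size_t i = 0; i < ARCH_COUNT; i++)
                allowed[i] = false;

        /* The native architecture is always part of the filter. */
        assert(native < ARCH_COUNT);
        allowed[native] = true;

        /* Adding an architecture twice is harmless, which mirrors
         * treating -EEXIST as success in the real code. */
        for (size_t i = 0; i < n_requested; i++) {
                assert(requested[i] < ARCH_COUNT);
                allowed[requested[i]] = true;
                if (requested[i] == ARCH_X32)
                        x32_requested = true;
        }

        /* The rule added by this commit. */
        if (native == ARCH_X32 || x32_requested)
                allowed[ARCH_X86_64] = true;
}
```

Calling `effective_archs(ARCH_X32, NULL, 0, allowed)` in this model marks x32 and x86-64 as permitted and leaves x86 blocked, which is the property the commit message cares about.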