mirror of https://github.com/systemd/systemd-stable.git synced 2025-01-11 05:17:44 +03:00

seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060)

The VDSO provided by the kernel for x32 uses x86-64 syscalls instead of
x32 ones.

I think we can safely allow this; the set of x86-64 syscalls should be
very similar to the x32 ones.  The real point is not to allow *x86*
syscalls, because some of those are inconveniently multiplexed and we're
apparently not able to block the specific actions we want to.
Alan Jenkins 2018-02-02 16:06:32 +00:00
parent 5c19ff79de
commit 2428aaf8a2
2 changed files with 29 additions and 9 deletions
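The commit message's claim that the x32 and x86-64 system call sets are nearly the same can be made concrete. On an x86-64 kernel, x32 code enters largely the same syscall table as native 64-bit code and is reported under the same audit architecture; only bit 30 of the syscall number (`__X32_SYSCALL_BIT`) marks a call as x32. A call that the x32 vDSO issues with a plain x86-64 number is therefore seen as an x86-64 call, which a filter admitting only x32 rejects. Below is a minimal sketch of that classification. It assumes the constant values from the Linux UAPI headers and copies them under hypothetical macro names so that it builds without kernel headers. `classify_syscall()` and `enum abi` are illustrative names, not systemd or libseccomp APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the Linux UAPI headers, copied under hypothetical names so
 * the sketch builds without kernel headers:
 *   AUDIT_I386   = AUDIT_ARCH_I386   (<linux/audit.h>)
 *   AUDIT_X86_64 = AUDIT_ARCH_X86_64 (<linux/audit.h>)
 *   X32_BIT      = __X32_SYSCALL_BIT (<asm/unistd.h>) */
#define AUDIT_I386   0x40000003u
#define AUDIT_X86_64 0xC000003Eu
#define X32_BIT      0x40000000u

/* Hypothetical ABI tags for this sketch. */
enum abi { ABI_UNKNOWN, ABI_X86, ABI_X86_64, ABI_X32 };

/* Classify a system call from the two fields a seccomp filter sees in
 * struct seccomp_data: the audit architecture and the syscall number.
 * x86-64 and x32 share AUDIT_ARCH_X86_64; only bit 30 of the number
 * tells them apart. */
static enum abi classify_syscall(uint32_t audit_arch, uint32_t nr)
{
        switch (audit_arch) {
        case AUDIT_I386:
                return ABI_X86;
        case AUDIT_X86_64:
                return (nr & X32_BIT) ? ABI_X32 : ABI_X86_64;
        default:
                return ABI_UNKNOWN;
        }
}
```

For instance, `classify_syscall(AUDIT_X86_64, 228)` (228 is `clock_gettime` in the x86-64 table) gives `ABI_X86_64`, and the same number with `X32_BIT` set gives `ABI_X32`.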


@@ -1429,17 +1429,19 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
 filter. The known architecture identifiers are the same as for <varname>ConditionArchitecture=</varname>
 described in <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
 as well as <constant>x32</constant>, <constant>mips64-n32</constant>, <constant>mips64-le-n32</constant>, and
-the special identifier <constant>native</constant>. If this setting is used, processes of this unit will only
-be permitted to call native system calls, and system calls of the specified architectures. This is an
-effective way to disable compatibility with non-native architectures for processes, for example to prohibit
-execution of 32-bit x86 binaries on 64-bit x86-64 systems. The special <constant>native</constant> identifier
+the special identifier <constant>native</constant>. The special identifier <constant>native</constant>
 implicitly maps to the native architecture of the system (or more precisely: to the architecture the system
 manager is compiled for). If running in user mode, or in system mode, but without the
 <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting <varname>User=nobody</varname>),
 <varname>NoNewPrivileges=yes</varname> is implied. By default, this option is set to the empty list, i.e. no
 system call architecture filtering is applied.</para>
 
-<para>Note that system call filtering is not equally effective on all architectures. For example, on x86
+<para>If this setting is used, processes of this unit will only be permitted to call native system calls, and
+system calls of the specified architectures. For the purposes of this option, the x32 architecture is treated
+as including x86-64 system calls. However, this setting still fulfills its purpose, as explained below, on
+x32.</para>
+
+<para>System call filtering is not equally effective on all architectures. For example, on x86
 filtering of network socket-related calls is not possible, due to ABI limitations — a limitation that x86-64
 does not have, however. On systems supporting multiple ABIs at the same time — such as x86/x86-64 — it is hence
 recommended to limit the set of permitted system call architectures so that secondary ABIs may not be used to
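The documented behaviour can be modelled as a small set computation: an empty list disables filtering; otherwise the listed architectures plus the native one are permitted, and x32 additionally brings in x86-64. The sketch below assumes a hypothetical bitmask covering only the three x86 ABIs and hypothetical helpers `parse_archs()` and `permits()`. It does not model the implied `NoNewPrivileges=yes`, and systemd itself stores libseccomp architecture tokens in a `Set` rather than a bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ABI bits; only the three x86 ABIs are modelled. */
#define ARCH_X86    (1u << 0)
#define ARCH_X86_64 (1u << 1)
#define ARCH_X32    (1u << 2)
#define ARCH_ALL    (ARCH_X86 | ARCH_X86_64 | ARCH_X32) /* "no filtering" */

/* True if the len-byte token at name equals word. */
static bool name_is(const char *name, size_t len, const char *word)
{
        return strlen(word) == len && memcmp(name, word, len) == 0;
}

/* Map one identifier to a bit; "native" resolves to the ABI the manager
 * was built for. Returns 0 for an unknown identifier. */
static uint32_t arch_from_name(const char *name, size_t len, uint32_t native)
{
        if (name_is(name, len, "native"))
                return native;
        if (name_is(name, len, "x86"))
                return ARCH_X86;
        if (name_is(name, len, "x86-64"))
                return ARCH_X86_64;
        if (name_is(name, len, "x32"))
                return ARCH_X32;
        return 0;
}

/* Compute the permitted ABIs for a space-separated identifier list.
 * Returns false if the list contains an unknown identifier. */
static bool parse_archs(const char *value, uint32_t native, uint32_t *out)
{
        uint32_t set = 0;
        const char *p = value;

        for (;;) {
                p += strspn(p, " \t");
                size_t len = strcspn(p, " \t");
                if (len == 0)
                        break;
                uint32_t bit = arch_from_name(p, len, native);
                if (bit == 0)
                        return false;
                set |= bit;
                p += len;
        }

        if (set == 0) {
                /* Empty list: no architecture filtering at all. */
                *out = ARCH_ALL;
                return true;
        }

        /* The native ABI is always permitted. */
        set |= native;

        /* x32 is treated as including x86-64. Because native is already in
         * set, this also covers a manager that is itself built for x32. */
        if (set & ARCH_X32)
                set |= ARCH_X86_64;

        *out = set;
        return true;
}

/* True if a call from the given ABI would pass the filter. */
static bool permits(uint32_t set, uint32_t abi)
{
        return (set & abi) != 0;
}
```

When `native` resolves to x32, the result is x32 plus x86-64. The x32 vDSO keeps working, while `permits(set, ARCH_X86)` stays false, which is the property the commit message says matters.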


@@ -1534,17 +1534,35 @@ int seccomp_restrict_archs(Set *archs) {
         int r;
 
         /* This installs a filter with no rules, but that restricts the system call architectures to the specified
-         * list. */
+         * list.
+         *
+         * There are some qualifications. However the most important use is to stop processes from bypassing
+         * system call restrictions, in case they used a broader (multiplexing) syscall which is only available
+         * in a non-native architecture. There are no holes in this use case, at least so far. */
 
+        /* Note libseccomp includes our "native" (current) architecture in the filter by default.
+         * We do not remove it. For example, our callers expect to be able to call execve() afterwards
+         * to run a program with the restrictions applied. */
         seccomp = seccomp_init(SCMP_ACT_ALLOW);
         if (!seccomp)
                 return -ENOMEM;
 
         SET_FOREACH(id, archs, i) {
                 r = seccomp_arch_add(seccomp, PTR_TO_UINT32(id) - 1);
-                if (r == -EEXIST)
-                        continue;
-                if (r < 0)
+                if (r < 0 && r != -EEXIST)
+                        return r;
+        }
+
+        /* The vdso for x32 assumes that x86-64 syscalls are available. Let's allow them, since x32
+         * x32 syscalls should basically match x86-64 for everything except the pointer type.
+         * The important thing is that you can block the old 32-bit x86 syscalls.
+         * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850047 */
+
+        if (seccomp_arch_native() == SCMP_ARCH_X32 ||
+            set_contains(archs, UINT32_TO_PTR(SCMP_ARCH_X32 + 1))) {
+
+                r = seccomp_arch_add(seccomp, SCMP_ARCH_X86_64);
+                if (r < 0 && r != -EEXIST)
                         return r;
         }
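In the C change, architecture tokens are read back from `archs` as `PTR_TO_UINT32(id) - 1`, and the x32 lookup passes `UINT32_TO_PTR(SCMP_ARCH_X32 + 1)`. The offset presumably exists because libseccomp's `SCMP_ARCH_NATIVE` token is 0. Stored directly, that token would become a NULL pointer, which a pointer-keyed set cannot distinguish from "absent". The sketch below illustrates this with a toy fixed-size set in which NULL marks a free slot. The set type, its functions, and the `U32_TO_PTR`/`PTR_TO_U32` macros are hypothetical stand-ins, not systemd's `Set` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for systemd's UINT32_TO_PTR()/PTR_TO_UINT32(). */
#define U32_TO_PTR(u) ((void *) (uintptr_t) (u))
#define PTR_TO_U32(p) ((uint32_t) (uintptr_t) (p))

/* Toy pointer set: NULL marks a free slot, so NULL cannot be a member. */
#define SET_SLOTS 8
struct ptr_set {
        void *slot[SET_SLOTS];
};

static bool ptr_set_contains(const struct ptr_set *s, const void *p)
{
        for (size_t i = 0; i < SET_SLOTS; i++)
                if (s->slot[i] == p)
                        return true;
        return false;
}

/* Returns false only when the set is full; re-adding a member is a no-op. */
static bool ptr_set_put(struct ptr_set *s, void *p)
{
        assert(p != NULL); /* NULL is reserved for "free slot" */
        if (ptr_set_contains(s, p))
                return true;
        for (size_t i = 0; i < SET_SLOTS; i++)
                if (s->slot[i] == NULL) {
                        s->slot[i] = p;
                        return true;
                }
        return false;
}

/* Store architecture tokens biased by one, so that token 0
 * (libseccomp's SCMP_ARCH_NATIVE) becomes a non-NULL pointer. */
static bool arch_set_put(struct ptr_set *s, uint32_t arch)
{
        assert(arch != UINT32_MAX); /* arch + 1 must not wrap to 0 */
        return ptr_set_put(s, U32_TO_PTR(arch + 1));
}

static bool arch_set_contains(const struct ptr_set *s, uint32_t arch)
{
        return ptr_set_contains(s, U32_TO_PTR(arch + 1));
}

/* Undo the bias when reading an entry back, like PTR_TO_UINT32(id) - 1. */
static uint32_t arch_from_entry(const void *p)
{
        return PTR_TO_U32(p) - 1;
}
```

In this toy set, `ptr_set_contains(&s, NULL)` is true whenever a slot is free, which is why a raw 0 token could not be stored. Re-adding an existing token is a no-op, loosely mirroring how the change treats `-EEXIST` from `seccomp_arch_add()` as success.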