mirror of https://github.com/systemd/systemd-stable.git synced 2024-12-24 21:34:08 +03:00

man: rework documentation for ReadOnlyPaths= and related settings

This reworks the documentation for ReadOnlyPaths=, ReadWritePaths= and
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (That wasn't actually true: we didn't follow symlinks
at all in the most recent releases, and we now do follow them, but relative to
RootDirectory=.)

This also replaces all the scattered notes stating that the fs namespacing
options can be undone with enough privileges, and that they disable mount
propagation, with a single note in the documentation of ReadOnlyPaths= and
friends, and then directs the reader to that note in all other places.

Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
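
As a rough illustration of the combination recommended above, a unit fragment
might look like the following. This is only a sketch: the unit name, binary
and paths are hypothetical and not taken from this commit; only the setting
names come from the documentation being changed.

```ini
# example.service (hypothetical unit, for illustration only)
[Service]
# Hypothetical daemon binary.
ExecStart=/usr/bin/example-daemon

# Make the file system hierarchy read-only, then whitelist what must be written.
ProtectSystem=strict

# Read-only data directory, with a writable subdirectory nested inside it.
ReadOnlyPaths=/var/lib/example
ReadWritePaths=/var/lib/example/cache

# The "-" prefix means the path is ignored if it does not exist.
InaccessiblePaths=-/srv/example-secrets

# Keep the unit's processes from undoing the namespacing via mount-related calls.
SystemCallFilter=~@mount
```

CapabilityBoundingSet=~CAP_SYS_ADMIN could be used in place of (or in addition
to) the system call filter, as the reworked ReadOnlyPaths= text suggests.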
Lennart Poettering 2016-08-26 12:24:37 +02:00 committed by Djalal Harouni
parent b2656f1b1c
commit effbd6d2ea


@ -877,48 +877,34 @@
<term><varname>ReadOnlyPaths=</varname></term> <term><varname>ReadOnlyPaths=</varname></term>
<term><varname>InaccessiblePaths=</varname></term> <term><varname>InaccessiblePaths=</varname></term>
<listitem><para>Sets up a new file system namespace for <listitem><para>Sets up a new file system namespace for executed processes. These options may be used to limit
executed processes. These options may be used to limit access access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths
a process might have to the main file system hierarchy. Each relative to the host's root directory (i.e. the system running the service manager). Note that if paths
setting takes a space-separated list of paths relative to contain symlinks, they are resolved relative to the root directory set with
the host's root directory (i.e. the system running the service manager). <varname>RootDirectory=</varname>.</para>
Note that if entries contain symlinks, they are resolved from the host's root directory as well.
Entries (files or directories) listed in <para>Paths listed in <varname>ReadWritePaths=</varname> are accessible from within the namespace with the same
<varname>ReadWritePaths=</varname> are accessible from access modes as from outside of it. Paths listed in <varname>ReadOnlyPaths=</varname> are accessible for
within the namespace with the same access rights as from reading only, writing will be refused even if the usual file access controls would permit this. Nest
outside. Entries listed in <varname>ReadWritePaths=</varname> inside of <varname>ReadOnlyPaths=</varname> in order to provide writable
<varname>ReadOnlyPaths=</varname> are accessible for subdirectories within read-only directories. Use <varname>ReadWritePaths=</varname> in order to whitelist
reading only, writing will be refused even if the usual file specific paths for write access if <varname>ProtectSystem=strict</varname> is used. Paths listed in
access controls would permit this. Entries listed in <varname>InaccessiblePaths=</varname> will be made inaccessible for processes inside the namespace (along with
<varname>InaccessiblePaths=</varname> will be made everything below them in the file system hierarchy).</para>
inaccessible for processes inside the namespace, and may not
countain any other mountpoints, including those specified by <para>Note that restricting access with these options does not extend to submounts of a directory that are
<varname>ReadWritePaths=</varname> or created later on. Non-directory paths may be specified as well. These options may be specified more than once,
<varname>ReadOnlyPaths=</varname>. in which case all paths listed will have limited access from within the namespace. If the empty string is
Note that restricting access with these options does not extend assigned to this option, the specific list is reset, and all prior assignments have no effect.</para>
to submounts of a directory that are created later on.
Non-directory paths can be specified as well. These <para>Paths in <varname>ReadOnlyPaths=</varname> and <varname>InaccessiblePaths=</varname> may be prefixed with
options may be specified more than once, in which case all <literal>-</literal>, in which case they will be ignored when they do not exist. Note that using this setting
paths listed will have limited access from within the will disconnect propagation of mounts from the service to the host (propagation in the opposite direction
namespace. If the empty string is assigned to this option, the continues to work). This means that this setting may not be used for services which shall be able to install
specific list is reset, and all prior assignments have no mount points in the main mount namespace. Note that the effect of these settings may be undone by privileged
effect.</para> processes. In order to set up an effective sandboxed environment for a unit it is thus recommended to combine
<para>Paths in these settings with either <varname>CapabilityBoundingSet=~CAP_SYS_ADMIN</varname> or
<varname>ReadOnlyPaths=</varname> <varname>SystemCallFilter=~@mount</varname>.</para></listitem>
and
<varname>InaccessiblePaths=</varname>
may be prefixed with
<literal>-</literal>, in which case
they will be ignored when they do not
exist. Note that using this
setting will disconnect propagation of
mounts from the service to the host
(propagation in the opposite direction
continues to work). This means that
this setting may not be used for
services which shall be able to
install mount points in the main mount
namespace.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
@ -933,37 +919,30 @@
private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the
<varname>JoinsNamespaceOf=</varname> directive, see <varname>JoinsNamespaceOf=</varname> directive, see
<citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
details. Note that using this setting will disconnect propagation of mounts from the service to the host details. This setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same
(propagation in the opposite direction continues to work). This means that this setting may not be used for restrictions regarding mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and
services which shall be able to install mount points in the main mount namespace. This setting is implied if related calls, see above.</para></listitem>
<varname>DynamicUser=</varname> is set.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><varname>PrivateDevices=</varname></term> <term><varname>PrivateDevices=</varname></term>
<listitem><para>Takes a boolean argument. If true, sets up a <listitem><para>Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and
new /dev namespace for the executed processes and only adds only adds API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/zero</filename> or
API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/random</filename> (as well as the pseudo TTY subsystem) to it, but no physical devices such as
<filename>/dev/zero</filename> or <filename>/dev/sda</filename>. This is useful to securely turn off physical device access by the executed
<filename>/dev/random</filename> (as well as the pseudo TTY process. Defaults to false. Enabling this option will also remove <constant>CAP_MKNOD</constant> from the
subsystem) to it, but no physical devices such as capability bounding set for the unit (see above), and set <varname>DevicePolicy=closed</varname> (see
<filename>/dev/sda</filename>. This is useful to securely turn
off physical device access by the executed process. Defaults
to false. Enabling this option will also remove
<constant>CAP_MKNOD</constant> from the capability bounding
set for the unit (see above), and set
<varname>DevicePolicy=closed</varname> (see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry> <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for details). Note that using this setting will disconnect for details). Note that using this setting will disconnect propagation of mounts from the service to the host
propagation of mounts from the service to the host (propagation in the opposite direction continues to work). This means that this setting may not be used for
(propagation in the opposite direction continues to work). services which shall be able to install mount points in the main mount namespace. The /dev namespace will be
This means that this setting may not be used for services mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by
which shall be able to install mount points in the main mount using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> of
namespace. The /dev namespace will be mounted read-only and 'noexec'. <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>. This setting is implied if
The latter may break old programs which try to set up executable <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding mount propagation and
memory by using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
of <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
@ -1023,33 +1002,23 @@
operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
recommended to enable this setting for all long-running services, unless they are involved with system updates recommended to enable this setting for all long-running services, unless they are involved with system updates
or need to modify the operating system in other ways. If this option is used, or need to modify the operating system in other ways. If this option is used,
<varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. Note <varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. This
that processes retaining the <constant>CAP_SYS_ADMIN</constant> capability (and with no system call filter that setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding
prohibits mount-related system calls applied) can undo the effect of this setting. This setting is hence mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
particularly useful for daemons which have this either the <literal>@mount</literal> set filtered using above. Defaults to off.</para></listitem>
<varname>SystemCallFilter=</varname>, or have the <constant>CAP_SYS_ADMIN</constant> capability removed, for
example with <varname>CapabilityBoundingSet=</varname>. Defaults to off.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><varname>ProtectHome=</varname></term> <term><varname>ProtectHome=</varname></term>
<listitem><para>Takes a boolean argument or <listitem><para>Takes a boolean argument or <literal>read-only</literal>. If true, the directories
<literal>read-only</literal>. If true, the directories <filename>/home</filename>, <filename>/root</filename> and <filename>/run/user</filename> are made inaccessible
<filename>/home</filename>, <filename>/root</filename> and and empty for processes invoked by this unit. If set to <literal>read-only</literal>, the three directories are
<filename>/run/user</filename> made read-only instead. It is recommended to enable this setting for all long-running services (in particular
are made inaccessible and empty for processes invoked by this network-facing ones), to ensure they cannot get access to private user data, unless the services actually
unit. If set to <literal>read-only</literal>, the three require access to the user's private data. This setting is implied if <varname>DynamicUser=</varname> is
directories are made read-only instead. It is recommended to set. For this setting the same restrictions regarding mount propagation and privileges apply as for
enable this setting for all long-running services (in <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
particular network-facing ones), to ensure they cannot get
access to private user data, unless the services actually
require access to the user's private data. Note however that
processes retaining the CAP_SYS_ADMIN capability can undo the
effect of this setting. This setting is hence particularly
useful for daemons which have this capability removed, for
example with <varname>CapabilityBoundingSet=</varname>.
Defaults to off.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
@ -1059,48 +1028,41 @@
<filename>/proc/sys</filename> and <filename>/sys</filename> will be made read-only to all processes of the <filename>/proc/sys</filename> and <filename>/sys</filename> will be made read-only to all processes of the
unit. Usually, tunable kernel variables should only be written at boot-time, with the unit. Usually, tunable kernel variables should only be written at boot-time, with the
<citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Almost <citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Almost
no services need to write to these at runtime; it is hence recommended to turn this on for most no services need to write to these at runtime; it is hence recommended to turn this on for most services. For
services. Defaults to off.</para></listitem> this setting the same restrictions regarding mount propagation and privileges apply as for
<varname>ReadOnlyPaths=</varname> and related calls, see above. Defaults to off.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><varname>ProtectControlGroups=</varname></term> <term><varname>ProtectControlGroups=</varname></term>
<listitem><para>Takes a boolean argument. If true, the Linux Control Groups ("cgroups") hierarchies accessible <listitem><para>Takes a boolean argument. If true, the Linux Control Groups (<citerefentry
through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the unit. Except for project='man-pages'><refentrytitle>cgroups</refentrytitle><manvolnum>7</manvolnum></citerefentry>) hierarchies
container managers no services should require write access to the control groups hierarchies; it is hence accessible through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the
recommended to turn this on for most services. Defaults to off.</para></listitem> unit. Except for container managers no services should require write access to the control groups hierarchies;
it is hence recommended to turn this on for most services. For this setting the same restrictions regarding
mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
above. Defaults to off.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><varname>MountFlags=</varname></term> <term><varname>MountFlags=</varname></term>
<listitem><para>Takes a mount propagation flag: <listitem><para>Takes a mount propagation flag: <option>shared</option>, <option>slave</option> or
<option>shared</option>, <option>slave</option> or <option>private</option>, which control whether mounts in the file system namespace set up for this unit's
<option>private</option>, which control whether mounts in the processes will receive or propagate mounts or unmounts. See <citerefentry
file system namespace set up for this unit's processes will project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
receive or propagate mounts or unmounts. See details. Defaults to <option>shared</option>. Use <option>shared</option> to ensure that mounts and unmounts
<citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> are propagated from the host to the container and vice versa. Use <option>slave</option> to run processes so
for details. Defaults to <option>shared</option>. Use that none of their mounts and unmounts will propagate to the host. Use <option>private</option> to also ensure
<option>shared</option> to ensure that mounts and unmounts are that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that
propagated from the host to the container and vice versa. Use <option>slave</option> means that file systems mounted on the host might stay mounted continuously in the
<option>slave</option> to run processes so that none of their unit's namespace, and thus keep the device busy. Note that the file system namespace related options
mounts and unmounts will propagate to the host. Use (<varname>PrivateTmp=</varname>, <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>,
<option>private</option> to also ensure that no mounts and <varname>ProtectHome=</varname>, <varname>ProtectKernelTunables=</varname>,
unmounts from the host will propagate into the unit processes' <varname>ProtectControlGroups=</varname>, <varname>ReadOnlyPaths=</varname>,
namespace. Note that <option>slave</option> means that file <varname>InaccessiblePaths=</varname>, <varname>ReadWritePaths=</varname>) require that mount and unmount
systems mounted on the host might stay mounted continuously in propagation from the unit's file system namespace is disabled, and hence downgrade <option>shared</option> to
the unit's namespace, and thus keep the device busy. Note that
the file system namespace related options
(<varname>PrivateTmp=</varname>,
<varname>PrivateDevices=</varname>,
<varname>ProtectSystem=</varname>,
<varname>ProtectHome=</varname>,
<varname>ReadOnlyPaths=</varname>,
<varname>InaccessiblePaths=</varname> and
<varname>ReadWritePaths=</varname>) require that mount
and unmount propagation from the unit's file system namespace
is disabled, and hence downgrade <option>shared</option> to
<option>slave</option>. </para></listitem> <option>slave</option>. </para></listitem>
</varlistentry> </varlistentry>
@ -1335,7 +1297,15 @@
</table> </table>
Note, that as new system calls are added to the kernel, additional system calls might be added to the groups Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
above, so the contents of the sets may change between systemd versions.</para></listitem> above, so the contents of the sets may change between systemd versions.</para>
<para>It is recommended to combine the file system namespacing related options with
<varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the
mappings. Specifically these are the options <varname>PrivateTmp=</varname>,
<varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, <varname>ProtectHome=</varname>,
<varname>ProtectKernelTunables=</varname>, <varname>ProtectControlGroups=</varname>,
<varname>ReadOnlyPaths=</varname>, <varname>InaccessiblePaths=</varname> and
<varname>ReadWritePaths=</varname>.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>