mirror of
https://github.com/systemd/systemd-stable.git
synced 2024-12-24 21:34:08 +03:00
man: rework documentation for ReadOnlyPaths= and related settings
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=, InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to the host file system. (That wasn't actually true: we didn't follow symlinks at all in the most recent releases, and we now do follow them, but relative to RootDirectory=.) This also consolidates the repeated notes that all fs namespacing options can be undone with enough privileges, and that they disable mount propagation, into a single note in the documentation of ReadOnlyPaths= and friends, and directs the reader to it from all other places. Moreover, a hint is added to the documentation of SystemCallFilter=, suggesting usage of ~@mount in case any of the fs namespacing related options are used.
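For illustration only, a unit fragment combining these settings might look like the sketch below. It is not part of this commit; the binary and all paths are hypothetical placeholders, and only the option names are taken from the documentation being changed.

```ini
# Illustrative sketch, not part of this commit.
# /usr/bin/example-daemon and all paths below are hypothetical.
[Service]
ExecStart=/usr/bin/example-daemon

# Make /etc/example read-only for the service, but nest a writable
# subdirectory inside it.
ReadOnlyPaths=/etc/example
ReadWritePaths=/etc/example/state

# The "-" prefix means the path is ignored if it does not exist.
InaccessiblePaths=-/srv/private

# Keep the service's processes from undoing the namespacing above.
SystemCallFilter=~@mount
```

CapabilityBoundingSet=~CAP_SYS_ADMIN is the alternative to the system call filter that the reworked text names. As that text also notes, a service set up like this cannot install mount points in the main mount namespace.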
parent
b2656f1b1c
commit
effbd6d2ea
@ -877,48 +877,34 @@
|
|||||||
<term><varname>ReadOnlyPaths=</varname></term>
|
<term><varname>ReadOnlyPaths=</varname></term>
|
||||||
<term><varname>InaccessiblePaths=</varname></term>
|
<term><varname>InaccessiblePaths=</varname></term>
|
||||||
|
|
||||||
<listitem><para>Sets up a new file system namespace for
|
<listitem><para>Sets up a new file system namespace for executed processes. These options may be used to limit
|
||||||
executed processes. These options may be used to limit access
|
access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths
|
||||||
a process might have to the main file system hierarchy. Each
|
relative to the host's root directory (i.e. the system running the service manager). Note that if paths
|
||||||
setting takes a space-separated list of paths relative to
|
contain symlinks, they are resolved relative to the root directory set with
|
||||||
the host's root directory (i.e. the system running the service manager).
|
<varname>RootDirectory=</varname>.</para>
|
||||||
Note that if entries contain symlinks, they are resolved from the host's root directory as well.
|
|
||||||
Entries (files or directories) listed in
|
<para>Paths listed in <varname>ReadWritePaths=</varname> are accessible from within the namespace with the same
|
||||||
<varname>ReadWritePaths=</varname> are accessible from
|
access modes as from outside of it. Paths listed in <varname>ReadOnlyPaths=</varname> are accessible for
|
||||||
within the namespace with the same access rights as from
|
reading only, writing will be refused even if the usual file access controls would permit this. Nest
|
||||||
outside. Entries listed in
|
<varname>ReadWritePaths=</varname> inside of <varname>ReadOnlyPaths=</varname> in order to provide writable
|
||||||
<varname>ReadOnlyPaths=</varname> are accessible for
|
subdirectories within read-only directories. Use <varname>ReadWritePaths=</varname> in order to whitelist
|
||||||
reading only, writing will be refused even if the usual file
|
specific paths for write access if <varname>ProtectSystem=strict</varname> is used. Paths listed in
|
||||||
access controls would permit this. Entries listed in
|
<varname>InaccessiblePaths=</varname> will be made inaccessible for processes inside the namespace (along with
|
||||||
<varname>InaccessiblePaths=</varname> will be made
|
everything below them in the file system hierarchy).</para>
|
||||||
inaccessible for processes inside the namespace, and may not
|
|
||||||
countain any other mountpoints, including those specified by
|
<para>Note that restricting access with these options does not extend to submounts of a directory that are
|
||||||
<varname>ReadWritePaths=</varname> or
|
created later on. Non-directory paths may be specified as well. These options may be specified more than once,
|
||||||
<varname>ReadOnlyPaths=</varname>.
|
in which case all paths listed will have limited access from within the namespace. If the empty string is
|
||||||
Note that restricting access with these options does not extend
|
assigned to this option, the specific list is reset, and all prior assignments have no effect.</para>
|
||||||
to submounts of a directory that are created later on.
|
|
||||||
Non-directory paths can be specified as well. These
|
<para>Paths in <varname>ReadOnlyPaths=</varname> and <varname>InaccessiblePaths=</varname> may be prefixed with
|
||||||
options may be specified more than once, in which case all
|
<literal>-</literal>, in which case they will be ignored when they do not exist. Note that using this setting
|
||||||
paths listed will have limited access from within the
|
will disconnect propagation of mounts from the service to the host (propagation in the opposite direction
|
||||||
namespace. If the empty string is assigned to this option, the
|
continues to work). This means that this setting may not be used for services which shall be able to install
|
||||||
specific list is reset, and all prior assignments have no
|
mount points in the main mount namespace. Note that the effect of these settings may be undone by privileged
|
||||||
effect.</para>
|
processes. In order to set up an effective sandboxed environment for a unit it is thus recommended to combine
|
||||||
<para>Paths in
|
these settings with either <varname>CapabilityBoundingSet=~CAP_SYS_ADMIN</varname> or
|
||||||
<varname>ReadOnlyPaths=</varname>
|
<varname>SystemCallFilter=~@mount</varname>.</para></listitem>
|
||||||
and
|
|
||||||
<varname>InaccessiblePaths=</varname>
|
|
||||||
may be prefixed with
|
|
||||||
<literal>-</literal>, in which case
|
|
||||||
they will be ignored when they do not
|
|
||||||
exist. Note that using this
|
|
||||||
setting will disconnect propagation of
|
|
||||||
mounts from the service to the host
|
|
||||||
(propagation in the opposite direction
|
|
||||||
continues to work). This means that
|
|
||||||
this setting may not be used for
|
|
||||||
services which shall be able to
|
|
||||||
install mount points in the main mount
|
|
||||||
namespace.</para></listitem>
|
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
@ -933,37 +919,30 @@
|
|||||||
private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the
|
private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the
|
||||||
<varname>JoinsNamespaceOf=</varname> directive, see
|
<varname>JoinsNamespaceOf=</varname> directive, see
|
||||||
<citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
|
<citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
|
||||||
details. Note that using this setting will disconnect propagation of mounts from the service to the host
|
details. This setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same
|
||||||
(propagation in the opposite direction continues to work). This means that this setting may not be used for
|
restrictions regarding mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and
|
||||||
services which shall be able to install mount points in the main mount namespace. This setting is implied if
|
related calls, see above.</para></listitem>
|
||||||
<varname>DynamicUser=</varname> is set.</para></listitem>
|
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term><varname>PrivateDevices=</varname></term>
|
<term><varname>PrivateDevices=</varname></term>
|
||||||
|
|
||||||
<listitem><para>Takes a boolean argument. If true, sets up a
|
<listitem><para>Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and
|
||||||
new /dev namespace for the executed processes and only adds
|
only adds API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/zero</filename> or
|
||||||
API pseudo devices such as <filename>/dev/null</filename>,
|
<filename>/dev/random</filename> (as well as the pseudo TTY subsystem) to it, but no physical devices such as
|
||||||
<filename>/dev/zero</filename> or
|
<filename>/dev/sda</filename>. This is useful to securely turn off physical device access by the executed
|
||||||
<filename>/dev/random</filename> (as well as the pseudo TTY
|
process. Defaults to false. Enabling this option will also remove <constant>CAP_MKNOD</constant> from the
|
||||||
subsystem) to it, but no physical devices such as
|
capability bounding set for the unit (see above), and set <varname>DevicePolicy=closed</varname> (see
|
||||||
<filename>/dev/sda</filename>. This is useful to securely turn
|
|
||||||
off physical device access by the executed process. Defaults
|
|
||||||
to false. Enabling this option will also remove
|
|
||||||
<constant>CAP_MKNOD</constant> from the capability bounding
|
|
||||||
set for the unit (see above), and set
|
|
||||||
<varname>DevicePolicy=closed</varname> (see
|
|
||||||
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
|
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
|
||||||
for details). Note that using this setting will disconnect
|
for details). Note that using this setting will disconnect propagation of mounts from the service to the host
|
||||||
propagation of mounts from the service to the host
|
(propagation in the opposite direction continues to work). This means that this setting may not be used for
|
||||||
(propagation in the opposite direction continues to work).
|
services which shall be able to install mount points in the main mount namespace. The /dev namespace will be
|
||||||
This means that this setting may not be used for services
|
mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by
|
||||||
which shall be able to install mount points in the main mount
|
using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> of
|
||||||
namespace. The /dev namespace will be mounted read-only and 'noexec'.
|
<filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>. This setting is implied if
|
||||||
The latter may break old programs which try to set up executable
|
<varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding mount propagation and
|
||||||
memory by using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry>
|
privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
|
||||||
of <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>.</para></listitem>
|
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
@ -1023,33 +1002,23 @@
|
|||||||
operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
|
operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
|
||||||
recommended to enable this setting for all long-running services, unless they are involved with system updates
|
recommended to enable this setting for all long-running services, unless they are involved with system updates
|
||||||
or need to modify the operating system in other ways. If this option is used,
|
or need to modify the operating system in other ways. If this option is used,
|
||||||
<varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. Note
|
<varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. This
|
||||||
that processes retaining the <constant>CAP_SYS_ADMIN</constant> capability (and with no system call filter that
|
setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding
|
||||||
prohibits mount-related system calls applied) can undo the effect of this setting. This setting is hence
|
mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
|
||||||
particularly useful for daemons which have this either the <literal>@mount</literal> set filtered using
|
above. Defaults to off.</para></listitem>
|
||||||
<varname>SystemCallFilter=</varname>, or have the <constant>CAP_SYS_ADMIN</constant> capability removed, for
|
|
||||||
example with <varname>CapabilityBoundingSet=</varname>. Defaults to off.</para></listitem>
|
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term><varname>ProtectHome=</varname></term>
|
<term><varname>ProtectHome=</varname></term>
|
||||||
|
|
||||||
<listitem><para>Takes a boolean argument or
|
<listitem><para>Takes a boolean argument or <literal>read-only</literal>. If true, the directories
|
||||||
<literal>read-only</literal>. If true, the directories
|
<filename>/home</filename>, <filename>/root</filename> and <filename>/run/user</filename> are made inaccessible
|
||||||
<filename>/home</filename>, <filename>/root</filename> and
|
and empty for processes invoked by this unit. If set to <literal>read-only</literal>, the three directories are
|
||||||
<filename>/run/user</filename>
|
made read-only instead. It is recommended to enable this setting for all long-running services (in particular
|
||||||
are made inaccessible and empty for processes invoked by this
|
network-facing ones), to ensure they cannot get access to private user data, unless the services actually
|
||||||
unit. If set to <literal>read-only</literal>, the three
|
require access to the user's private data. This setting is implied if <varname>DynamicUser=</varname> is
|
||||||
directories are made read-only instead. It is recommended to
|
set. For this setting the same restrictions regarding mount propagation and privileges apply as for
|
||||||
enable this setting for all long-running services (in
|
<varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
|
||||||
particular network-facing ones), to ensure they cannot get
|
|
||||||
access to private user data, unless the services actually
|
|
||||||
require access to the user's private data. Note however that
|
|
||||||
processes retaining the CAP_SYS_ADMIN capability can undo the
|
|
||||||
effect of this setting. This setting is hence particularly
|
|
||||||
useful for daemons which have this capability removed, for
|
|
||||||
example with <varname>CapabilityBoundingSet=</varname>.
|
|
||||||
Defaults to off.</para></listitem>
|
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
@ -1059,48 +1028,41 @@
|
|||||||
<filename>/proc/sys</filename> and <filename>/sys</filename> will be made read-only to all processes of the
|
<filename>/proc/sys</filename> and <filename>/sys</filename> will be made read-only to all processes of the
|
||||||
unit. Usually, tunable kernel variables should only be written at boot-time, with the
|
unit. Usually, tunable kernel variables should only be written at boot-time, with the
|
||||||
<citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Almost
|
<citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Almost
|
||||||
no services need to write to these at runtime; it is hence recommended to turn this on for most
|
no services need to write to these at runtime; it is hence recommended to turn this on for most services. For
|
||||||
services. Defaults to off.</para></listitem>
|
this setting the same restrictions regarding mount propagation and privileges apply as for
|
||||||
|
<varname>ReadOnlyPaths=</varname> and related calls, see above. Defaults to off.</para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term><varname>ProtectControlGroups=</varname></term>
|
<term><varname>ProtectControlGroups=</varname></term>
|
||||||
|
|
||||||
<listitem><para>Takes a boolean argument. If true, the Linux Control Groups ("cgroups") hierarchies accessible
|
<listitem><para>Takes a boolean argument. If true, the Linux Control Groups (<citerefentry
|
||||||
through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the unit. Except for
|
project='man-pages'><refentrytitle>cgroups</refentrytitle><manvolnum>7</manvolnum></citerefentry>) hierarchies
|
||||||
container managers no services should require write access to the control groups hierarchies; it is hence
|
accessible through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the
|
||||||
recommended to turn this on for most services. Defaults to off.</para></listitem>
|
unit. Except for container managers no services should require write access to the control groups hierarchies;
|
||||||
|
it is hence recommended to turn this on for most services. For this setting the same restrictions regarding
|
||||||
|
mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
|
||||||
|
above. Defaults to off.</para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term><varname>MountFlags=</varname></term>
|
<term><varname>MountFlags=</varname></term>
|
||||||
|
|
||||||
<listitem><para>Takes a mount propagation flag:
|
<listitem><para>Takes a mount propagation flag: <option>shared</option>, <option>slave</option> or
|
||||||
<option>shared</option>, <option>slave</option> or
|
<option>private</option>, which control whether mounts in the file system namespace set up for this unit's
|
||||||
<option>private</option>, which control whether mounts in the
|
processes will receive or propagate mounts or unmounts. See <citerefentry
|
||||||
file system namespace set up for this unit's processes will
|
project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
|
||||||
receive or propagate mounts or unmounts. See
|
details. Defaults to <option>shared</option>. Use <option>shared</option> to ensure that mounts and unmounts
|
||||||
<citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>
|
are propagated from the host to the container and vice versa. Use <option>slave</option> to run processes so
|
||||||
for details. Defaults to <option>shared</option>. Use
|
that none of their mounts and unmounts will propagate to the host. Use <option>private</option> to also ensure
|
||||||
<option>shared</option> to ensure that mounts and unmounts are
|
that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that
|
||||||
propagated from the host to the container and vice versa. Use
|
<option>slave</option> means that file systems mounted on the host might stay mounted continuously in the
|
||||||
<option>slave</option> to run processes so that none of their
|
unit's namespace, and thus keep the device busy. Note that the file system namespace related options
|
||||||
mounts and unmounts will propagate to the host. Use
|
(<varname>PrivateTmp=</varname>, <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>,
|
||||||
<option>private</option> to also ensure that no mounts and
|
<varname>ProtectHome=</varname>, <varname>ProtectKernelTunables=</varname>,
|
||||||
unmounts from the host will propagate into the unit processes'
|
<varname>ProtectControlGroups=</varname>, <varname>ReadOnlyPaths=</varname>,
|
||||||
namespace. Note that <option>slave</option> means that file
|
<varname>InaccessiblePaths=</varname>, <varname>ReadWritePaths=</varname>) require that mount and unmount
|
||||||
systems mounted on the host might stay mounted continuously in
|
propagation from the unit's file system namespace is disabled, and hence downgrade <option>shared</option> to
|
||||||
the unit's namespace, and thus keep the device busy. Note that
|
|
||||||
the file system namespace related options
|
|
||||||
(<varname>PrivateTmp=</varname>,
|
|
||||||
<varname>PrivateDevices=</varname>,
|
|
||||||
<varname>ProtectSystem=</varname>,
|
|
||||||
<varname>ProtectHome=</varname>,
|
|
||||||
<varname>ReadOnlyPaths=</varname>,
|
|
||||||
<varname>InaccessiblePaths=</varname> and
|
|
||||||
<varname>ReadWritePaths=</varname>) require that mount
|
|
||||||
and unmount propagation from the unit's file system namespace
|
|
||||||
is disabled, and hence downgrade <option>shared</option> to
|
|
||||||
<option>slave</option>. </para></listitem>
|
<option>slave</option>. </para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
@ -1335,7 +1297,15 @@
|
|||||||
</table>
|
</table>
|
||||||
|
|
||||||
Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
|
Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
|
||||||
above, so the contents of the sets may change between systemd versions.</para></listitem>
|
above, so the contents of the sets may change between systemd versions.</para>
|
||||||
|
|
||||||
|
<para>It is recommended to combine the file system namespacing related options with
|
||||||
|
<varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the
|
||||||
|
mappings. Specifically these are the options <varname>PrivateTmp=</varname>,
|
||||||
|
<varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, <varname>ProtectHome=</varname>,
|
||||||
|
<varname>ProtectKernelTunables=</varname>, <varname>ProtectControlGroups=</varname>,
|
||||||
|
<varname>ReadOnlyPaths=</varname>, <varname>InaccessiblePaths=</varname> and
|
||||||
|
<varname>ReadWritePaths=</varname>.</para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
|