diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml
index 67182f17dc..84f81fe38e 100644
--- a/man/systemd.exec.xml
+++ b/man/systemd.exec.xml
@@ -877,48 +877,34 @@
ReadOnlyPaths=
InaccessiblePaths=
- Sets up a new file system namespace for
- executed processes. These options may be used to limit access
- a process might have to the main file system hierarchy. Each
- setting takes a space-separated list of paths relative to
- the host's root directory (i.e. the system running the service manager).
- Note that if entries contain symlinks, they are resolved from the host's root directory as well.
- Entries (files or directories) listed in
- ReadWritePaths= are accessible from
- within the namespace with the same access rights as from
- outside. Entries listed in
- ReadOnlyPaths= are accessible for
- reading only, writing will be refused even if the usual file
- access controls would permit this. Entries listed in
- InaccessiblePaths= will be made
- inaccessible for processes inside the namespace, and may not
- countain any other mountpoints, including those specified by
- ReadWritePaths= or
- ReadOnlyPaths=.
- Note that restricting access with these options does not extend
- to submounts of a directory that are created later on.
- Non-directory paths can be specified as well. These
- options may be specified more than once, in which case all
- paths listed will have limited access from within the
- namespace. If the empty string is assigned to this option, the
- specific list is reset, and all prior assignments have no
- effect.
- Paths in
- ReadOnlyPaths=
- and
- InaccessiblePaths=
- may be prefixed with
- -, in which case
- they will be ignored when they do not
- exist. Note that using this
- setting will disconnect propagation of
- mounts from the service to the host
- (propagation in the opposite direction
- continues to work). This means that
- this setting may not be used for
- services which shall be able to
- install mount points in the main mount
- namespace.
+ Sets up a new file system namespace for executed processes. These options may be used to limit the
+ access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths
+ relative to the host's root directory (i.e. the system running the service manager). Note that if paths
+ contain symlinks, they are resolved relative to the root directory set with
+ RootDirectory=.
+
+ Paths listed in ReadWritePaths= are accessible from within the namespace with the same
+ access modes as from outside of it. Paths listed in ReadOnlyPaths= are accessible for
+ reading only; writing will be refused even if the usual file access controls would permit this. Nest
+ ReadWritePaths= inside of ReadOnlyPaths= in order to provide writable
+ subdirectories within read-only directories. Use ReadWritePaths= in order to whitelist
+ specific paths for write access if ProtectSystem=strict is used. Paths listed in
+ InaccessiblePaths= will be made inaccessible for processes inside the namespace (along with
+ everything below them in the file system hierarchy).
+
+ Note that restricting access with these options does not extend to submounts of a directory that are
+ created later on. Non-directory paths may be specified as well. These options may be specified more than once,
+ in which case all paths listed will have limited access from within the namespace. If the empty string is
+ assigned to this option, the specific list is reset, and all prior assignments have no effect.
+
+ Paths in ReadOnlyPaths= and InaccessiblePaths= may be prefixed with
+ -, in which case they will be ignored when they do not exist. Note that using this setting
+ will disconnect propagation of mounts from the service to the host (propagation in the opposite direction
+ continues to work). This means that this setting may not be used for services which shall be able to install
+ mount points in the main mount namespace. Note that the effect of these settings may be undone by privileged
+ processes. In order to set up an effective sandboxed environment for a unit it is thus recommended to combine
+ these settings with either CapabilityBoundingSet=~CAP_SYS_ADMIN or
+ SystemCallFilter=~@mount.
@@ -933,37 +919,30 @@
private /tmp and /var/tmp namespace by using the
JoinsNamespaceOf= directive, see
systemd.unit(5) for
- details. Note that using this setting will disconnect propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work). This means that this setting may not be used for
- services which shall be able to install mount points in the main mount namespace. This setting is implied if
- DynamicUser= is set.
+ details. This setting is implied if DynamicUser= is set. For this setting the same
+ restrictions regarding mount propagation and privileges apply as for ReadOnlyPaths= and
+ related settings, see above.
+
PrivateDevices=
- Takes a boolean argument. If true, sets up a
- new /dev namespace for the executed processes and only adds
- API pseudo devices such as /dev/null,
- /dev/zero or
- /dev/random (as well as the pseudo TTY
- subsystem) to it, but no physical devices such as
- /dev/sda. This is useful to securely turn
- off physical device access by the executed process. Defaults
- to false. Enabling this option will also remove
- CAP_MKNOD from the capability bounding
- set for the unit (see above), and set
- DevicePolicy=closed (see
+ Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and
+ only adds API pseudo devices such as /dev/null, /dev/zero or
+ /dev/random (as well as the pseudo TTY subsystem) to it, but no physical devices such as
+ /dev/sda. This is useful to securely turn off physical device access by the executed
+ process. Defaults to false. Enabling this option will also remove CAP_MKNOD from the
+ capability bounding set for the unit (see above), and set DevicePolicy=closed (see
systemd.resource-control(5)
- for details). Note that using this setting will disconnect
- propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work).
- This means that this setting may not be used for services
- which shall be able to install mount points in the main mount
- namespace. The /dev namespace will be mounted read-only and 'noexec'.
- The latter may break old programs which try to set up executable
- memory by using mmap(2)
- of /dev/zero instead of using MAP_ANON.
+ for details). Note that using this setting will disconnect propagation of mounts from the service to the host
+ (propagation in the opposite direction continues to work). This means that this setting may not be used for
+ services which shall be able to install mount points in the main mount namespace. The /dev namespace will be
+ mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by
+ using mmap(2) of
+ /dev/zero instead of using MAP_ANON. This setting is implied if
+ DynamicUser= is set. For this setting the same restrictions regarding mount propagation and
+ privileges apply as for ReadOnlyPaths= and related settings, see above.
@@ -1023,33 +1002,23 @@
operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
recommended to enable this setting for all long-running services, unless they are involved with system updates
or need to modify the operating system in other ways. If this option is used,
- ReadWritePaths= may be used to exclude specific directories from being made read-only. Note
- that processes retaining the CAP_SYS_ADMIN capability (and with no system call filter that
- prohibits mount-related system calls applied) can undo the effect of this setting. This setting is hence
- particularly useful for daemons which have this either the @mount set filtered using
- SystemCallFilter=, or have the CAP_SYS_ADMIN capability removed, for
- example with CapabilityBoundingSet=. Defaults to off.
+ ReadWritePaths= may be used to exclude specific directories from being made read-only. This
+ setting is implied if DynamicUser= is set. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for ReadOnlyPaths= and related settings, see
+ above. Defaults to off.
ProtectHome=
- Takes a boolean argument or
- read-only. If true, the directories
- /home, /root and
- /run/user
- are made inaccessible and empty for processes invoked by this
- unit. If set to read-only, the three
- directories are made read-only instead. It is recommended to
- enable this setting for all long-running services (in
- particular network-facing ones), to ensure they cannot get
- access to private user data, unless the services actually
- require access to the user's private data. Note however that
- processes retaining the CAP_SYS_ADMIN capability can undo the
- effect of this setting. This setting is hence particularly
- useful for daemons which have this capability removed, for
- example with CapabilityBoundingSet=.
- Defaults to off.
+ Takes a boolean argument or read-only. If true, the directories
+ /home, /root and /run/user are made inaccessible
+ and empty for processes invoked by this unit. If set to read-only, the three directories are
+ made read-only instead. It is recommended to enable this setting for all long-running services (in particular
+ network-facing ones), to ensure they cannot get access to private user data, unless the services actually
+ require access to the user's private data. This setting is implied if DynamicUser= is
+ set. For this setting the same restrictions regarding mount propagation and privileges apply as for
+ ReadOnlyPaths= and related settings, see above.
@@ -1059,48 +1028,41 @@
/proc/sys and /sys will be made read-only to all processes of the
unit. Usually, tunable kernel variables should only be written at boot-time, with the
sysctl.d(5) mechanism. Almost
- no services need to write to these at runtime; it is hence recommended to turn this on for most
- services. Defaults to off.
+ no services need to write to these at runtime; it is hence recommended to turn this on for most services. For
+ this setting the same restrictions regarding mount propagation and privileges apply as for
+ ReadOnlyPaths= and related settings, see above. Defaults to off.
ProtectControlGroups=
- Takes a boolean argument. If true, the Linux Control Groups ("cgroups") hierarchies accessible
- through /sys/fs/cgroup will be made read-only to all processes of the unit. Except for
- container managers no services should require write access to the control groups hierarchies; it is hence
- recommended to turn this on for most services. Defaults to off.
+ Takes a boolean argument. If true, the Linux Control Groups (cgroups(7)) hierarchies
+ accessible through /sys/fs/cgroup will be made read-only to all processes of the
+ unit. Except for container managers no services should require write access to the control groups hierarchies;
+ it is hence recommended to turn this on for most services. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for ReadOnlyPaths= and related settings, see
+ above. Defaults to off.
MountFlags=
- Takes a mount propagation flag:
- shared, slave or
- private, which control whether mounts in the
- file system namespace set up for this unit's processes will
- receive or propagate mounts or unmounts. See
- mount(2)
- for details. Defaults to shared. Use
- shared to ensure that mounts and unmounts are
- propagated from the host to the container and vice versa. Use
- slave to run processes so that none of their
- mounts and unmounts will propagate to the host. Use
- private to also ensure that no mounts and
- unmounts from the host will propagate into the unit processes'
- namespace. Note that slave means that file
- systems mounted on the host might stay mounted continuously in
- the unit's namespace, and thus keep the device busy. Note that
- the file system namespace related options
- (PrivateTmp=,
- PrivateDevices=,
- ProtectSystem=,
- ProtectHome=,
- ReadOnlyPaths=,
- InaccessiblePaths= and
- ReadWritePaths=) require that mount
- and unmount propagation from the unit's file system namespace
- is disabled, and hence downgrade shared to
+ Takes a mount propagation flag: shared, slave or
+ private, which control whether mounts in the file system namespace set up for this unit's
+ processes will receive or propagate mounts or unmounts. See mount(2) for
+ details. Defaults to shared. Use shared to ensure that mounts and unmounts
+ are propagated from the host to the container and vice versa. Use slave to run processes so
+ that none of their mounts and unmounts will propagate to the host. Use private to also ensure
+ that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that
+ slave means that file systems mounted on the host might stay mounted continuously in the
+ unit's namespace, and thus keep the device busy. Note that the file system namespace related options
+ (PrivateTmp=, PrivateDevices=, ProtectSystem=,
+ ProtectHome=, ProtectKernelTunables=,
+ ProtectControlGroups=, ReadOnlyPaths=,
+ InaccessiblePaths=, ReadWritePaths=) require that mount and unmount
+ propagation from the unit's file system namespace is disabled, and hence downgrade shared to
slave.
@@ -1335,7 +1297,15 @@
Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
- above, so the contents of the sets may change between systemd versions.
+ above, so the contents of the sets may change between systemd versions.
+
+ It is recommended to combine the file system namespacing related options with
+ SystemCallFilter=~@mount, in order to prohibit the unit's processes from undoing the
+ mappings. Specifically, these are the options PrivateTmp=,
+ PrivateDevices=, ProtectSystem=, ProtectHome=,
+ ProtectKernelTunables=, ProtectControlGroups=,
+ ReadOnlyPaths=, InaccessiblePaths= and
+ ReadWritePaths=.