diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml index 67182f17dc..84f81fe38e 100644 --- a/man/systemd.exec.xml +++ b/man/systemd.exec.xml @@ -877,48 +877,34 @@ ReadOnlyPaths= InaccessiblePaths= - Sets up a new file system namespace for - executed processes. These options may be used to limit access - a process might have to the main file system hierarchy. Each - setting takes a space-separated list of paths relative to - the host's root directory (i.e. the system running the service manager). - Note that if entries contain symlinks, they are resolved from the host's root directory as well. - Entries (files or directories) listed in - ReadWritePaths= are accessible from - within the namespace with the same access rights as from - outside. Entries listed in - ReadOnlyPaths= are accessible for - reading only, writing will be refused even if the usual file - access controls would permit this. Entries listed in - InaccessiblePaths= will be made - inaccessible for processes inside the namespace, and may not - countain any other mountpoints, including those specified by - ReadWritePaths= or - ReadOnlyPaths=. - Note that restricting access with these options does not extend - to submounts of a directory that are created later on. - Non-directory paths can be specified as well. These - options may be specified more than once, in which case all - paths listed will have limited access from within the - namespace. If the empty string is assigned to this option, the - specific list is reset, and all prior assignments have no - effect. - Paths in - ReadOnlyPaths= - and - InaccessiblePaths= - may be prefixed with - -, in which case - they will be ignored when they do not - exist. Note that using this - setting will disconnect propagation of - mounts from the service to the host - (propagation in the opposite direction - continues to work). 
This means that - this setting may not be used for - services which shall be able to - install mount points in the main mount - namespace. + Sets up a new file system namespace for executed processes. These options may be used to limit + the access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths + relative to the host's root directory (i.e. the system running the service manager). Note that if paths + contain symlinks, they are resolved relative to the root directory set with + RootDirectory=. + + Paths listed in ReadWritePaths= are accessible from within the namespace with the same + access modes as from outside of it. Paths listed in ReadOnlyPaths= are accessible for + reading only; writing will be refused even if the usual file access controls would permit this. Nest + ReadWritePaths= inside ReadOnlyPaths= in order to provide writable + subdirectories within read-only directories. Use ReadWritePaths= in order to whitelist + specific paths for write access if ProtectSystem=strict is used. Paths listed in + InaccessiblePaths= will be made inaccessible for processes inside the namespace (along with + everything below them in the file system hierarchy). + + Note that restricting access with these options does not extend to submounts of a directory that are + created later on. Non-directory paths may be specified as well. These options may be specified more than once, + in which case all paths listed will have limited access from within the namespace. If the empty string is + assigned to this option, the specific list is reset, and all prior assignments have no effect. + + Paths in ReadOnlyPaths= and InaccessiblePaths= may be prefixed with + -, in which case they will be ignored when they do not exist. Note that using this setting + will disconnect propagation of mounts from the service to the host (propagation in the opposite direction + continues to work).
This means that this setting may not be used for services which shall be able to install + mount points in the main mount namespace. Note that the effect of these settings may be undone by privileged + processes. In order to set up an effective sandboxed environment for a unit, it is thus recommended to combine + these settings with either CapabilityBoundingSet=~CAP_SYS_ADMIN or + SystemCallFilter=~@mount. @@ -933,37 +919,30 @@ private /tmp and /var/tmp namespace by using the JoinsNamespaceOf= directive, see systemd.unit(5) for - details. Note that using this setting will disconnect propagation of mounts from the service to the host - (propagation in the opposite direction continues to work). This means that this setting may not be used for - services which shall be able to install mount points in the main mount namespace. This setting is implied if - DynamicUser= is set. + details. This setting is implied if DynamicUser= is set. For this setting the same + restrictions regarding mount propagation and privileges apply as for ReadOnlyPaths= and + related settings, see above. + PrivateDevices= - Takes a boolean argument. If true, sets up a - new /dev namespace for the executed processes and only adds - API pseudo devices such as /dev/null, - /dev/zero or - /dev/random (as well as the pseudo TTY - subsystem) to it, but no physical devices such as - /dev/sda. This is useful to securely turn - off physical device access by the executed process. Defaults - to false. Enabling this option will also remove - CAP_MKNOD from the capability bounding - set for the unit (see above), and set - DevicePolicy=closed (see + Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and + only adds API pseudo devices such as /dev/null, /dev/zero or + /dev/random (as well as the pseudo TTY subsystem) to it, but no physical devices such as + /dev/sda. This is useful to securely turn off physical device access by the executed + process. Defaults to false.
Enabling this option will also remove CAP_MKNOD from the + capability bounding set for the unit (see above), and set DevicePolicy=closed (see systemd.resource-control(5) - for details). Note that using this setting will disconnect - propagation of mounts from the service to the host - (propagation in the opposite direction continues to work). - This means that this setting may not be used for services - which shall be able to install mount points in the main mount - namespace. The /dev namespace will be mounted read-only and 'noexec'. - The latter may break old programs which try to set up executable - memory by using mmap(2) - of /dev/zero instead of using MAP_ANON. + for details). Note that using this setting will disconnect propagation of mounts from the service to the host + (propagation in the opposite direction continues to work). This means that this setting may not be used for + services which shall be able to install mount points in the main mount namespace. The /dev namespace will be + mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by + using mmap(2) of + /dev/zero instead of using MAP_ANON. This setting is implied if + DynamicUser= is set. For this setting the same restrictions regarding mount propagation and + privileges apply as for ReadOnlyPaths= and related settings, see above. @@ -1023,33 +1002,23 @@ operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is recommended to enable this setting for all long-running services, unless they are involved with system updates or need to modify the operating system in other ways. If this option is used, - ReadWritePaths= may be used to exclude specific directories from being made read-only. Note - that processes retaining the CAP_SYS_ADMIN capability (and with no system call filter that - prohibits mount-related system calls applied) can undo the effect of this setting.
This setting is hence - particularly useful for daemons which have this either the @mount set filtered using - SystemCallFilter=, or have the CAP_SYS_ADMIN capability removed, for - example with CapabilityBoundingSet=. Defaults to off. + ReadWritePaths= may be used to exclude specific directories from being made read-only. This + setting is implied if DynamicUser= is set. For this setting the same restrictions regarding + mount propagation and privileges apply as for ReadOnlyPaths= and related settings, see + above. Defaults to off. ProtectHome= - Takes a boolean argument or - read-only. If true, the directories - /home, /root and - /run/user - are made inaccessible and empty for processes invoked by this - unit. If set to read-only, the three - directories are made read-only instead. It is recommended to - enable this setting for all long-running services (in - particular network-facing ones), to ensure they cannot get - access to private user data, unless the services actually - require access to the user's private data. Note however that - processes retaining the CAP_SYS_ADMIN capability can undo the - effect of this setting. This setting is hence particularly - useful for daemons which have this capability removed, for - example with CapabilityBoundingSet=. - Defaults to off. + Takes a boolean argument or read-only. If true, the directories + /home, /root and /run/user are made inaccessible + and empty for processes invoked by this unit. If set to read-only, the three directories are + made read-only instead. It is recommended to enable this setting for all long-running services (in particular + network-facing ones), to ensure they cannot get access to private user data, unless the services actually + require access to the user's private data. This setting is implied if DynamicUser= is + set. For this setting the same restrictions regarding mount propagation and privileges apply as for + ReadOnlyPaths= and related settings, see above. Defaults to off.
@@ -1059,48 +1028,41 @@ /proc/sys and /sys will be made read-only to all processes of the unit. Usually, tunable kernel variables should only be written at boot-time, with the sysctl.d(5) mechanism. Almost - no services need to write to these at runtime; it is hence recommended to turn this on for most - services. Defaults to off. + no services need to write to these at runtime; it is hence recommended to turn this on for most services. For + this setting the same restrictions regarding mount propagation and privileges apply as for + ReadOnlyPaths= and related settings, see above. Defaults to off. ProtectControlGroups= - Takes a boolean argument. If true, the Linux Control Groups ("cgroups") hierarchies accessible - through /sys/fs/cgroup will be made read-only to all processes of the unit. Except for - container managers no services should require write access to the control groups hierarchies; it is hence - recommended to turn this on for most services. Defaults to off. + Takes a boolean argument. If true, the Linux Control Groups (cgroups(7)) hierarchies + accessible through /sys/fs/cgroup will be made read-only to all processes of the + unit. Except for container managers, no services should require write access to the control groups hierarchies; + it is hence recommended to turn this on for most services. For this setting the same restrictions regarding + mount propagation and privileges apply as for ReadOnlyPaths= and related settings, see + above. Defaults to off. MountFlags= - Takes a mount propagation flag: - , or - , which control whether mounts in the - file system namespace set up for this unit's processes will - receive or propagate mounts or unmounts. See - mount(2) - for details. Defaults to . Use - to ensure that mounts and unmounts are - propagated from the host to the container and vice versa. Use - to run processes so that none of their - mounts and unmounts will propagate to the host. + Takes a mount propagation flag: , or + , which control whether mounts in the file system namespace set up for this unit's + processes will receive or propagate mounts or unmounts. See mount(2) for + details. Defaults to . Use to ensure that mounts and unmounts + are propagated from the host to the container and vice versa. Use to run processes so + that none of their mounts and unmounts will propagate to the host.
Use - to also ensure that no mounts and - unmounts from the host will propagate into the unit processes' - namespace. Note that means that file - systems mounted on the host might stay mounted continuously in - the unit's namespace, and thus keep the device busy. Note that - the file system namespace related options - (PrivateTmp=, - PrivateDevices=, - ProtectSystem=, - ProtectHome=, - ReadOnlyPaths=, - InaccessiblePaths= and - ReadWritePaths=) require that mount - and unmount propagation from the unit's file system namespace - is disabled, and hence downgrade to + Takes a mount propagation flag: , or + , which control whether mounts in the file system namespace set up for this unit's + processes will receive or propagate mounts or unmounts. See mount2 for + details. Defaults to . Use to ensure that mounts and unmounts + are propagated from the host to the container and vice versa. Use to run processes so + that none of their mounts and unmounts will propagate to the host. Use to also ensure + that no mounts and unmounts from the host will propagate into the unit processes' namespace. Note that + means that file systems mounted on the host might stay mounted continuously in the + unit's namespace, and thus keep the device busy. Note that the file system namespace related options + (PrivateTmp=, PrivateDevices=, ProtectSystem=, + ProtectHome=, ProtectKernelTunables=, + ProtectControlGroups=, ReadOnlyPaths=, + InaccessiblePaths=, ReadWritePaths=) require that mount and unmount + propagation from the unit's file system namespace is disabled, and hence downgrade to . @@ -1335,7 +1297,15 @@ Note, that as new system calls are added to the kernel, additional system calls might be added to the groups - above, so the contents of the sets may change between systemd versions. + above, so the contents of the sets may change between systemd versions. 
+ + It is recommended to combine the file system namespacing related options with + SystemCallFilter=~@mount, in order to prevent the unit's processes from undoing the + mappings. Specifically, these are the options PrivateTmp=, + PrivateDevices=, ProtectSystem=, ProtectHome=, + ProtectKernelTunables=, ProtectControlGroups=, + ReadOnlyPaths=, InaccessiblePaths= and + ReadWritePaths=.
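The combination this patch recommends can be illustrated with a minimal unit-file sketch. The option names and values come from the text above. The unit description, the binary `/usr/bin/example-daemon`, and the directories under `/var/lib` are hypothetical placeholders. This sketch has not been tested against a specific systemd version.

```ini
# Hypothetical service; the binary and directory names are placeholders.
[Unit]
Description=Example daemon with a sandboxed file system view

[Service]
ExecStart=/usr/bin/example-daemon
# Make the file system hierarchy read-only for the service...
ProtectSystem=strict
# ...except for this explicitly whitelisted state directory.
ReadWritePaths=/var/lib/example
# The leading "-" ignores this entry if the path does not exist.
InaccessiblePaths=-/var/lib/other-service
# Hide /home, /root and /run/user from the service.
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
# Keep the service from undoing the mappings above.
CapabilityBoundingSet=~CAP_SYS_ADMIN
SystemCallFilter=~@mount
```

The text recommends either the capability restriction or the system call filter. Setting both is a belt-and-braces choice, not a requirement. A unit like this can be checked for syntax errors with `systemd-analyze verify` before it is started.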