mirror of https://github.com/systemd/systemd.git synced 2024-10-26 08:55:40 +03:00

Merge pull request #34783 from keszybz/man-nspawn-private-users

Change systemd-nspawn man page to strongly recommend private users
Zbigniew Jędrzejewski-Szmek 2024-10-18 18:44:05 +02:00 committed by GitHub
commit 2c23b7054f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
7 changed files with 45 additions and 40 deletions

6
NEWS
View File

@@ -14081,7 +14081,7 @@ CHANGES WITH 218:
or are not older than the specified time.
* A new, native PPPoE library has been added to sd-network,
systemd's library of light-weight networking protocols. This
systemd's library of lightweight networking protocols. This
library will be used in a future version of networkd to
enable PPPoE communication without an external pppd daemon.
@@ -14928,7 +14928,7 @@ CHANGES WITH 214:
have been added. When enabled, they will make the user data
(such as /home) inaccessible or read-only and the system
(such as /usr) read-only, for specific services. This allows
very light-weight per-service sandboxing to avoid
very lightweight per-service sandboxing to avoid
modifications of user data or system files from
services. These two new switches have been enabled for all
of systemd's long-running services, where appropriate.
@@ -15637,7 +15637,7 @@ CHANGES WITH 209:
activation files automatically into native systemd .busname
and .service units.
* sd-bus: add a light-weight vtable implementation that allows
* sd-bus: add a lightweight vtable implementation that allows
defining objects on the bus with a simple static const
vtable array of its methods, signals and properties.

View File

@@ -80,7 +80,7 @@ _With all vendor-supplied OS resources in a single directory /usr they may be sh
**Myth #4**: The /usr merges only purpose is to look pretty, and has no other benefits
**Fact**: The /usr merge makes sharing the vendor-supplied OS resources between a host and networked clients as well as a host and local light-weight containers easier and atomic. Snapshotting the OS becomes a viable option. The /usr merge also allows making the entire vendor-supplied OS resources read-only for increased security and robustness.
**Fact**: The /usr merge makes sharing the vendor-supplied OS resources between a host and networked clients as well as a host and local lightweight containers easier and atomic. Snapshotting the OS becomes a viable option. The /usr merge also allows making the entire vendor-supplied OS resources read-only for increased security and robustness.
**Myth #5**: Adopting the /usr merge in your distribution means additional work for your distribution's package maintainers

View File

@@ -651,7 +651,7 @@ node /org/freedesktop/machine1/machine/rawhide {
<para><varname>Leader</varname> is the PID of the leader process of the machine.</para>
<para><varname>Class</varname> is the class of the machine and is either the string "vm" (for real VMs
based on virtualized hardware) or "container" (for light-weight userspace virtualization sharing the
based on virtualized hardware) or "container" (for lightweight userspace virtualization sharing the
same kernel as the host).</para>
<para><varname>RootDirectory</varname> is the root directory of the container if it is known and

View File

@@ -21,7 +21,7 @@
<refnamediv>
<refname>systemd-nspawn</refname>
<refpurpose>Spawn a command or OS in a light-weight container</refpurpose>
<refpurpose>Spawn a command or OS in a lightweight container</refpurpose>
</refnamediv>
<refsynopsisdiv>
@@ -43,11 +43,11 @@
<refsect1>
<title>Description</title>
<para><command>systemd-nspawn</command> may be used to run a command or OS in a light-weight namespace
<para><command>systemd-nspawn</command> may be used to run a command or OS in a lightweight namespace
container. In many ways it is similar to <citerefentry
project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, but more powerful
since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and
the host and domain name.</para>
since it virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems, and
the host and domain names.</para>
<para><command>systemd-nspawn</command> may be invoked on any directory tree containing an operating system tree,
using the <option>--directory=</option> command line option. By using the <option>--machine=</option> option an OS
@@ -59,11 +59,14 @@
project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry> <command>systemd-nspawn</command>
may be used to boot full Linux-based operating systems in a container.</para>
<para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to read-only,
such as <filename>/sys/</filename>, <filename>/proc/sys/</filename> or <filename>/sys/fs/selinux/</filename>. The
host's network interfaces and the system clock may not be changed from within the container. Device nodes may not
be created. The host system cannot be rebooted and kernel modules may not be loaded from within the
container.</para>
<para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to
read-only, such as <filename>/sys/</filename>, <filename>/proc/sys/</filename>, or
<filename>/sys/fs/selinux/</filename>. The host's network interfaces and the system clock may not be
changed from within the container. Device nodes may not be created. The host system cannot be rebooted
and kernel modules may not be loaded from within the container. <emphasis>This sandbox can easily be
circumvented from within the container if user namespaces are not used</emphasis>. This means that
untrusted code must always be run in a user namespace, see the discussion of the
<option>--private-users=</option> option below.</para>
<para>Use a tool like <citerefentry
project='mankier'><refentrytitle>dnf</refentrytitle><manvolnum>8</manvolnum></citerefentry>, <citerefentry
@@ -100,8 +103,8 @@
template unit file, making it usually unnecessary to alter this template file directly.</para>
<para>Note that <command>systemd-nspawn</command> will mount file systems private to the container to
<filename>/dev/</filename>, <filename>/run/</filename> and similar. These will not be visible outside of the
container, and their contents will be lost when the container exits.</para>
<filename>/dev/</filename>, <filename>/run/</filename>, and similar. These will not be visible outside of
the container, and their contents will be lost when the container exits.</para>
<para>Note that running two <command>systemd-nspawn</command> containers from the same directory tree will not make
processes in them see each other. The PID namespace separation of the two containers is complete and the containers
@@ -810,17 +813,6 @@
range. In this mode, the number of UIDs/GIDs assigned to the container is 65536, and the owner
UID/GID of the root directory must be a multiple of 65536.</para></listitem>
<listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
the default.</para>
</listitem>
<listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
<option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
and hence is often a good choice if proper user namespacing with distinct UID maps is not
appropriate.</para></listitem>
<listitem><para>The special value <literal>pick</literal> turns on user namespacing. In this case
the UID/GID range is automatically chosen. As first step, the file owner UID/GID of the root
directory of the container's directory tree is read, and it is checked that no other container is
@@ -837,22 +829,35 @@
for it, and thus in the (possibly expensive) file ownership adjustment operation. However,
subsequent invocations of the container will be cheap (unless of course the picked UID/GID range is
assigned to a different use by then).</para></listitem>
<listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
the default when <command>systemd-nspawn</command> is invoked directly. (Note that the
<filename>systemd-nspawn@.service</filename> unit enables private users.) This option is not
secure and must not be used to run untrusted code.</para></listitem>
<listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
<option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
but may be useful if proper user namespacing with distinct UID maps is not possible. This option is
not secure and must not be used to run untrusted code.</para></listitem>
</orderedlist>
<para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable UID/GID range in the
container covers 16 bit. For best security, do not assign overlapping UID/GID ranges to multiple containers. It is
hence a good idea to use the upper 16 bit of the host 32-bit UIDs/GIDs as container identifier, while the lower 16
bit encode the container UID/GID used. This is in fact the behavior enforced by the
<option>--private-users=pick</option> option.</para>
<para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable
UID/GID range in the container covers 16 bits. For best security, do not assign overlapping UID/GID
ranges to multiple containers. It is hence a good idea to use the upper 16 bit of the host 32-bit
UIDs/GIDs as container identifier, while the lower 16 bits encode the container UID/GID used. This is
in fact the behavior enforced by the <option>--private-users=pick</option> option.</para>
<para>When user namespaces are used, the GID range assigned to each container is always chosen identical to the
UID range.</para>
<para>When user namespaces are used, the GID range assigned to each container is always chosen
identical to the UID range.</para>
<para>In most cases, using <option>--private-users=pick</option> is the recommended option as it enhances
container security massively and operates fully automatically in most cases.</para>
<para>In most cases, using <option>--private-users=pick</option> is the recommended option as user
namespacing is required for security, and this option massively enhances container security while
operating fully automatically in most cases.</para>
<para>Note that the picked UID/GID range is not written to <filename>/etc/passwd</filename> or
<filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently anywhere,
<filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently,
except in the file ownership of the files and directories of the container.</para>
<para>Note that when user namespacing is used file ownership on disk reflects this, and all of the container's
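
Reviewer note: the UID/GID layout recommended in the hunk above (upper 16 bits of the 32-bit host UID as container identifier, lower 16 bits as the UID inside the container, with a base that is a multiple of 65536) can be illustrated with a minimal sketch. The helper names below are hypothetical and are not part of systemd; this only shows the arithmetic the man page describes, not how systemd-nspawn implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: number of UIDs/GIDs assigned to each container. */
#define CONTAINER_UID_RANGE UINT32_C(65536)

/* Base host UID for a container: the container identifier occupies the upper 16 bits. */
static inline uint32_t container_uid_base(uint16_t container_id) {
        return (uint32_t) container_id << 16;
}

/* Translate a UID as seen inside the container into the corresponding host UID. */
static inline uint32_t container_uid_to_host(uint16_t container_id, uint16_t container_uid) {
        return container_uid_base(container_id) | container_uid;
}

/* Check the documented constraint that the base UID/GID is a multiple of 65536. */
static inline bool uid_base_is_valid(uint32_t base) {
        return base % CONTAINER_UID_RANGE == 0;
}
```

With this layout, two containers with different identifiers can never share a host UID, which is the non-overlap property the text recommends.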

View File

@@ -601,7 +601,7 @@
<command>systemd</command> (and other UIs) as a user-visible label for the unit, so this string
should identify the unit rather than describe it, despite the name. This string also shouldn't just
repeat the unit name. <literal>Apache2 Web Server</literal> is a good example. Bad examples are
<literal>high-performance light-weight HTTP server</literal> (too generic) or
<literal>high-performance lightweight HTTP server</literal> (too generic) or
<literal>Apache2</literal> (meaningless for people who do not know Apache, duplicates the unit
name). <command>systemd</command> may use this string as a noun in status messages (<literal>Starting
<replaceable>description</replaceable>...</literal>, <literal>Started

View File

@@ -320,7 +320,7 @@ static int help(void) {
return log_oom();
printf("%1$s [OPTIONS...] [PATH] [ARGUMENTS...]\n\n"
"%5$sSpawn a command or OS in a light-weight container.%6$s\n\n"
"%5$sSpawn a command or OS in a lightweight container.%6$s\n\n"
" -h --help Show this help\n"
" --version Print version string\n"
" -q --quiet Do not show status information\n"

View File

@@ -2007,7 +2007,7 @@ static int create_directory_or_subvolume(
if (r == 0)
/* Don't create a subvolume unless the root directory is one, too. We do this under
* the assumption that if the root directory is just a plain directory (i.e. very
* light-weight), we shouldn't try to split it up into subvolumes (i.e. more
* lightweight), we shouldn't try to split it up into subvolumes (i.e. more
* heavy-weight). Thus, chroot() environments and suchlike will get a full brtfs
* subvolume set up below their tree only if they specifically set up a btrfs
* subvolume for the root dir too. */