mirror of
https://github.com/systemd/systemd.git
synced 2024-10-26 08:55:40 +03:00
Merge pull request #34783 from keszybz/man-nspawn-private-users
Change systemd-nspawn man page to strongly recommend private users
commit 2c23b7054f
6
NEWS
@@ -14081,7 +14081,7 @@ CHANGES WITH 218:
 or are not older than the specified time.

 * A new, native PPPoE library has been added to sd-network,
-systemd's library of light-weight networking protocols. This
+systemd's library of lightweight networking protocols. This
 library will be used in a future version of networkd to
 enable PPPoE communication without an external pppd daemon.

@@ -14928,7 +14928,7 @@ CHANGES WITH 214:
 have been added. When enabled, they will make the user data
 (such as /home) inaccessible or read-only and the system
 (such as /usr) read-only, for specific services. This allows
-very light-weight per-service sandboxing to avoid
+very lightweight per-service sandboxing to avoid
 modifications of user data or system files from
 services. These two new switches have been enabled for all
 of systemd's long-running services, where appropriate.
@@ -15637,7 +15637,7 @@ CHANGES WITH 209:
 activation files automatically into native systemd .busname
 and .service units.

-* sd-bus: add a light-weight vtable implementation that allows
+* sd-bus: add a lightweight vtable implementation that allows
 defining objects on the bus with a simple static const
 vtable array of its methods, signals and properties.

@@ -80,7 +80,7 @@ _With all vendor-supplied OS resources in a single directory /usr they may be sh

 **Myth #4**: The /usr merge’s only purpose is to look pretty, and has no other benefits

-**Fact**: The /usr merge makes sharing the vendor-supplied OS resources between a host and networked clients as well as a host and local light-weight containers easier and atomic. Snapshotting the OS becomes a viable option. The /usr merge also allows making the entire vendor-supplied OS resources read-only for increased security and robustness.
+**Fact**: The /usr merge makes sharing the vendor-supplied OS resources between a host and networked clients as well as a host and local lightweight containers easier and atomic. Snapshotting the OS becomes a viable option. The /usr merge also allows making the entire vendor-supplied OS resources read-only for increased security and robustness.

 **Myth #5**: Adopting the /usr merge in your distribution means additional work for your distribution's package maintainers

@@ -651,7 +651,7 @@ node /org/freedesktop/machine1/machine/rawhide {
 <para><varname>Leader</varname> is the PID of the leader process of the machine.</para>

 <para><varname>Class</varname> is the class of the machine and is either the string "vm" (for real VMs
-based on virtualized hardware) or "container" (for light-weight userspace virtualization sharing the
+based on virtualized hardware) or "container" (for lightweight userspace virtualization sharing the
 same kernel as the host).</para>

 <para><varname>RootDirectory</varname> is the root directory of the container if it is known and
@@ -21,7 +21,7 @@

 <refnamediv>
 <refname>systemd-nspawn</refname>
-<refpurpose>Spawn a command or OS in a light-weight container</refpurpose>
+<refpurpose>Spawn a command or OS in a lightweight container</refpurpose>
 </refnamediv>

 <refsynopsisdiv>
@@ -43,11 +43,11 @@
 <refsect1>
 <title>Description</title>

-<para><command>systemd-nspawn</command> may be used to run a command or OS in a light-weight namespace
+<para><command>systemd-nspawn</command> may be used to run a command or OS in a lightweight namespace
 container. In many ways it is similar to <citerefentry
 project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, but more powerful
-since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and
-the host and domain name.</para>
+since it virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems, and
+the host and domain names.</para>

 <para><command>systemd-nspawn</command> may be invoked on any directory tree containing an operating system tree,
 using the <option>--directory=</option> command line option. By using the <option>--machine=</option> option an OS
@@ -59,11 +59,14 @@
 project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry> <command>systemd-nspawn</command>
 may be used to boot full Linux-based operating systems in a container.</para>

-<para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to read-only,
-such as <filename>/sys/</filename>, <filename>/proc/sys/</filename> or <filename>/sys/fs/selinux/</filename>. The
-host's network interfaces and the system clock may not be changed from within the container. Device nodes may not
-be created. The host system cannot be rebooted and kernel modules may not be loaded from within the
-container.</para>
+<para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to
+read-only, such as <filename>/sys/</filename>, <filename>/proc/sys/</filename>, or
+<filename>/sys/fs/selinux/</filename>. The host's network interfaces and the system clock may not be
+changed from within the container. Device nodes may not be created. The host system cannot be rebooted
+and kernel modules may not be loaded from within the container. <emphasis>This sandbox can easily be
+circumvented from within the container if user namespaces are not used</emphasis>. This means that
+untrusted code must always be run in a user namespace, see the discussion of the
+<option>--private-users=</option> option below.</para>

 <para>Use a tool like <citerefentry
 project='mankier'><refentrytitle>dnf</refentrytitle><manvolnum>8</manvolnum></citerefentry>, <citerefentry
@@ -100,8 +103,8 @@
 template unit file, making it usually unnecessary to alter this template file directly.</para>

 <para>Note that <command>systemd-nspawn</command> will mount file systems private to the container to
-<filename>/dev/</filename>, <filename>/run/</filename> and similar. These will not be visible outside of the
-container, and their contents will be lost when the container exits.</para>
+<filename>/dev/</filename>, <filename>/run/</filename>, and similar. These will not be visible outside of
+the container, and their contents will be lost when the container exits.</para>

 <para>Note that running two <command>systemd-nspawn</command> containers from the same directory tree will not make
 processes in them see each other. The PID namespace separation of the two containers is complete and the containers
@@ -810,17 +813,6 @@
 range. In this mode, the number of UIDs/GIDs assigned to the container is 65536, and the owner
 UID/GID of the root directory must be a multiple of 65536.</para></listitem>

-<listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
-the default.</para>
-</listitem>
-
-<listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
-an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
-<option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
-host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
-and hence is often a good choice if proper user namespacing with distinct UID maps is not
-appropriate.</para></listitem>
-
 <listitem><para>The special value <literal>pick</literal> turns on user namespacing. In this case
 the UID/GID range is automatically chosen. As first step, the file owner UID/GID of the root
 directory of the container's directory tree is read, and it is checked that no other container is
@@ -837,22 +829,35 @@
 for it, and thus in the (possibly expensive) file ownership adjustment operation. However,
 subsequent invocations of the container will be cheap (unless of course the picked UID/GID range is
 assigned to a different use by then).</para></listitem>
+
+<listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is
+the default when <command>systemd-nspawn</command> is invoked directly. (Note that the
+<filename>systemd-nspawn@.service</filename> unit enables private users.) This option is not
+secure and must not be used to run untrusted code.</para></listitem>
+
+<listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with
+an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to
+<option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all
+host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
+but may be useful if proper user namespacing with distinct UID maps is not possible. This option is
+not secure and must not be used to run untrusted code.</para></listitem>
 </orderedlist>

-<para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable UID/GID range in the
-container covers 16 bit. For best security, do not assign overlapping UID/GID ranges to multiple containers. It is
-hence a good idea to use the upper 16 bit of the host 32-bit UIDs/GIDs as container identifier, while the lower 16
-bit encode the container UID/GID used. This is in fact the behavior enforced by the
-<option>--private-users=pick</option> option.</para>
+<para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable
+UID/GID range in the container covers 16 bits. For best security, do not assign overlapping UID/GID
+ranges to multiple containers. It is hence a good idea to use the upper 16 bit of the host 32-bit
+UIDs/GIDs as container identifier, while the lower 16 bits encode the container UID/GID used. This is
+in fact the behavior enforced by the <option>--private-users=pick</option> option.</para>

-<para>When user namespaces are used, the GID range assigned to each container is always chosen identical to the
-UID range.</para>
+<para>When user namespaces are used, the GID range assigned to each container is always chosen
+identical to the UID range.</para>

-<para>In most cases, using <option>--private-users=pick</option> is the recommended option as it enhances
-container security massively and operates fully automatically in most cases.</para>
+<para>In most cases, using <option>--private-users=pick</option> is the recommended option as user
+namespacing is required for security, and this option massively enhances container security while
+operating fully automatically in most cases.</para>

 <para>Note that the picked UID/GID range is not written to <filename>/etc/passwd</filename> or
-<filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently anywhere,
+<filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently,
 except in the file ownership of the files and directories of the container.</para>

 <para>Note that when user namespacing is used file ownership on disk reflects this, and all of the container's
@@ -601,7 +601,7 @@
 <command>systemd</command> (and other UIs) as a user-visible label for the unit, so this string
 should identify the unit rather than describe it, despite the name. This string also shouldn't just
 repeat the unit name. <literal>Apache2 Web Server</literal> is a good example. Bad examples are
-<literal>high-performance light-weight HTTP server</literal> (too generic) or
+<literal>high-performance lightweight HTTP server</literal> (too generic) or
 <literal>Apache2</literal> (meaningless for people who do not know Apache, duplicates the unit
 name). <command>systemd</command> may use this string as a noun in status messages (<literal>Starting
 <replaceable>description</replaceable>...</literal>, <literal>Started
@@ -320,7 +320,7 @@ static int help(void) {
 return log_oom();

 printf("%1$s [OPTIONS...] [PATH] [ARGUMENTS...]\n\n"
-"%5$sSpawn a command or OS in a light-weight container.%6$s\n\n"
+"%5$sSpawn a command or OS in a lightweight container.%6$s\n\n"
 " -h --help Show this help\n"
 " --version Print version string\n"
 " -q --quiet Do not show status information\n"
@@ -2007,7 +2007,7 @@ static int create_directory_or_subvolume(
 if (r == 0)
 /* Don't create a subvolume unless the root directory is one, too. We do this under
 * the assumption that if the root directory is just a plain directory (i.e. very
-* light-weight), we shouldn't try to split it up into subvolumes (i.e. more
+* lightweight), we shouldn't try to split it up into subvolumes (i.e. more
 * heavy-weight). Thus, chroot() environments and suchlike will get a full brtfs
 * subvolume set up below their tree only if they specifically set up a btrfs
 * subvolume for the root dir too. */