Add some notes about security considerations when using LXC

Describe some of the issues to be aware of when configuring LXC guests with security isolation as a goal.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2025-01-11 09:17:52 +03:00 · 2013-09-10 10:58:45 +01:00 · 2013-09-10 10:58:45 +01:00 · 5e6a85c765
commit 5e6a85c765
parent a48838ad2e
1 changed files with 103 additions and 0 deletions
--- a/docs/drvlxc.html.in
+++ b/docs/drvlxc.html.in
@@ -168,6 +168,109 @@ Further block or character devices will be made available to containers
 depending on their configuration.
 </p>
 <h2><a name="security">Security considerations</a></h2>
 <p>
 The libvirt LXC driver is fairly flexible in how it can be configured,
 and as such does not enforce a requirement for strict security
 separation between a container and the host. This allows it to be used
 in scenarios where only resource control capabilities are important,
 and resource sharing is desired. Applications that require secure
 isolation between a container and the host must therefore supply a
 configuration that enables it explicitly.
 </p>
 <h3><a name="securenetworking">Network isolation</a></h3>
 <p>
 If the guest configuration does not list any network interfaces,
 the <code>network</code> namespace will not be activated, and thus
 the container will see all the host's network interfaces. This
 allows applications in the container to bind to, and connect from,
 TCP/UDP addresses and ports belonging to the host OS. It also allows
 them to access UNIX domain sockets in the abstract namespace that are
 associated with the host OS. If access to UNIX domain sockets in the
 abstract namespace is not wanted, then applications should set the
 <code>&lt;privnet/&gt;</code> flag in the
 <code>&lt;features&gt;....&lt;/features&gt;</code> element.
 </p>
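 <p>
 As a minimal sketch of the <code>&lt;privnet/&gt;</code> setting described
 above, a guest that defines no network interfaces might carry a fragment
 along these lines in its domain XML. The rest of the guest definition is
 omitted here.
 </p>
 <pre>
 &lt;features&gt;
   &lt;privnet/&gt;
 &lt;/features&gt;
 </pre>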
 <h3><a name="securefs">Filesystem isolation</a></h3>
 <p>
 If the guest configuration does not list any filesystems, then
 the container will be set up with a root filesystem that matches
 the host's root filesystem. As noted earlier, only a few locations
 such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code>
 will be altered. This means that, in the absence of restrictions
 from sVirt, a process running as user/group N:M inside the container
 will be able to access almost exactly the same files as a process
 running as user/group N:M in the host.
 </p>
 <p>
 There are multiple options for restricting this. It is possible to
 simply map the existing root filesystem through to the container in
 read-only mode. Alternatively, a completely separate root filesystem
 can be configured for the guest. In both cases, further sub-mounts
 can be applied to customize the content that is made visible. Note
 that in the absence of sVirt controls, it is still possible for the
 root user in a container to unmount any sub-mounts applied. The user
 namespace feature can also be used to restrict access to files based
 on the UID/GID mappings.
 </p>
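 <p>
 For illustration, mapping the existing host root filesystem through to
 the container in read-only mode could look roughly like the following
 entry inside the guest's <code>&lt;devices&gt;</code> element. This is a
 sketch, not a complete or hardened configuration.
 </p>
 <pre>
 &lt;filesystem type='mount'&gt;
   &lt;source dir='/'/&gt;
   &lt;target dir='/'/&gt;
   &lt;readonly/&gt;
 &lt;/filesystem&gt;
 </pre>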
 <p>
 Sharing the host filesystem tree also allows applications to access
 UNIX domain sockets associated with the host OS that live in the
 filesystem namespace. It should be noted that a number of init
 systems, including at least <code>systemd</code> and <code>upstart</code>,
 have UNIX domain sockets which are used to control their operation.
 Thus, if the directory/filesystem holding their UNIX domain socket is
 exposed to the container, it will be possible for a user in the container
 to invoke operations on the init service in the same way it could if
 outside the container. This also applies to other applications in the
 host which use UNIX domain sockets in the filesystem, such as D-Bus,
 libvirtd, and many more. If this is not desired, then applications
 should either specify the UID/GID mapping in the configuration to
 enable user namespaces and thus block access to the UNIX domain socket
 based on permissions, or should ensure the relevant directories have
 a bind mount to hide them. This is particularly important for the
 <code>/run</code> or <code>/var/run</code> directories.
 </p>
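 <p>
 One possible way to hide <code>/run</code> with a bind mount is sketched
 below, again as an entry inside <code>&lt;devices&gt;</code>. The source
 path <code>/srv/containers/demo/run</code> is a hypothetical, empty host
 directory chosen purely for illustration and would need to be created
 by the administrator.
 </p>
 <pre>
 &lt;filesystem type='mount'&gt;
   &lt;!-- hypothetical empty host directory, not a libvirt default --&gt;
   &lt;source dir='/srv/containers/demo/run'/&gt;
   &lt;target dir='/run'/&gt;
 &lt;/filesystem&gt;
 </pre>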
 <h3><a name="secureusers">User and group isolation</a></h3>
 <p>
 If the guest configuration does not list any ID mapping, then the
 user and group IDs used inside the container will match those used
 outside the container. In addition, the capabilities associated with
 a process in the container will confer the same privileges they would
 for a process in the host. This has obvious implications for security,
 since a root user inside the container will be able to access any
 file owned by root that is visible to the container, and perform more
 or less any privileged kernel operation. In the absence of additional
 protection from sVirt, this means that the root user inside a container
 is effectively as powerful as the root user in the host. There is no
 security isolation of the root user.
 </p>
 <p>
 The ID mapping facility was introduced to allow for stricter control
 over the privileges of users inside the container. It allows apps to
 define rules such as "user ID 0 in the container maps to user ID 1000
 in the host". In addition, the privileges associated with capabilities
 are somewhat reduced so that they cannot be used to escape from the
 container environment. A full description of user namespaces is outside
 the scope of this document; however, LWN has
 <a href="https://lwn.net/Articles/532593/">a good write-up on the topic</a>.
 From the libvirt point of view, the key thing to remember is that defining
 an ID mapping for users and groups in the container XML configuration
 causes libvirt to activate the user namespace feature.
 </p>
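 <p>
 As a sketch of the "user ID 0 in the container maps to user ID 1000 in
 the host" rule mentioned above, an ID mapping in the guest XML might look
 like the following. The <code>count</code> of 10 is an arbitrary value
 picked for illustration; a real guest would size the range to the IDs it
 actually needs.
 </p>
 <pre>
 &lt;idmap&gt;
   &lt;!-- container IDs 0-9 map to host IDs 1000-1009 (illustrative range) --&gt;
   &lt;uid start='0' target='1000' count='10'/&gt;
   &lt;gid start='0' target='1000' count='10'/&gt;
 &lt;/idmap&gt;
 </pre>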
 <h2><a name="activation">Systemd Socket Activation Integration</a></h2>
 <p>