mirror of https://github.com/systemd/systemd.git synced 2025-01-21 22:04:01 +03:00

Merge pull request #29721 from poettering/systemd-project

New capsule@.service feature
Zbigniew Jędrzejewski-Szmek 2024-03-26 13:19:33 +01:00 committed by GitHub
commit c38e4e2fda
28 changed files with 717 additions and 55 deletions

6
TODO
View File

@ -356,12 +356,6 @@ Features:
policy from currently booted kernel/event log, to close gap for first boot
for pre-built images
* add a new systemd-project@.service that is very similar to user@.service but
uses DynamicUser=1 and no PAMName= to invoke an unprivileged somewhat
light-weight service manager. Use HOME=/var/lib/systemd/projects/%i as home
dir. Similar for $XDG_RUNTIME_DIR. Start project@%i.target. Use LogField= to
add a field identifying the project.
* in sd-boot and sd-stub measure the SMBIOS vendor strings to some PCR (at
least some subset of them that look like systemd stuff), because apparently
some firmware does not, but systemd honours it. avoid duplicate measurement

View File

@ -449,6 +449,7 @@
<xi:include href="user-system-options.xml" xpointer="system" />
<xi:include href="user-system-options.xml" xpointer="host" />
<xi:include href="user-system-options.xml" xpointer="machine" />
<xi:include href="user-system-options.xml" xpointer="capsule" />
<varlistentry>
<term><option>-l</option></term>

118
man/capsule@.service.xml Normal file
View File

@ -0,0 +1,118 @@
<?xml version="1.0"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
<refentry id="capsule_.service">
<refentryinfo>
<title>capsule@.service</title>
<productname>systemd</productname>
</refentryinfo>
<refmeta>
<refentrytitle>capsule@.service</refentrytitle>
<manvolnum>5</manvolnum>
</refmeta>
<refnamediv>
<refname>capsule@.service</refname>
<refpurpose>System unit for the capsule service manager</refpurpose>
</refnamediv>
<refsynopsisdiv>
<para><filename>capsule@<replaceable>NAME</replaceable>.service</filename></para>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>Service managers for capsules run in
<filename>capsule@<replaceable>NAME</replaceable>.service</filename> system units, with the capsule name as the
instance identifier. Capsules are a way to run additional instances of the service manager, under dynamic
user IDs, i.e. UIDs that are allocated when the capsule service manager is started, and released when it
is stopped.</para>
<para>In many ways <filename>capsule@.service</filename> is similar to the per-user
<filename>user@.service</filename> service manager, but there are a few important distinctions:</para>
<itemizedlist>
<listitem><para>The capsule service manager utilizes <varname>DynamicUser=</varname> (see
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>) to
allocate a new UID dynamically on invocation. The user name is automatically generated from the capsule
name, by prefixing <literal>c-</literal>. The UID is released when the service is terminated. The user
service manager on the other hand operates under a statically allocated user ID that must be
pre-existing, before the user service manager is invoked.</para></listitem>
<listitem><para>User service managers register themselves with <citerefentry
project='man-pages'><refentrytitle>pam</refentrytitle><manvolnum>8</manvolnum></citerefentry>, capsule
service managers do not.</para></listitem>
<listitem><para>User service managers typically read their configuration from a
<varname>$HOME</varname> directory below <filename>/home/</filename>, capsule service managers from a
<varname>$HOME</varname> directory below <filename>/var/lib/capsules/</filename>.</para></listitem>
<listitem><para>User service managers are collectively contained in the <filename>user.slice</filename>
unit, capsule service managers in <filename>capsule.slice</filename>. Also see
<citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry>.</para></listitem>
<listitem><para>User service managers start the user unit <filename>default.target</filename>
initially. Capsule service managers invoke the user unit <filename>capsule@.target</filename>
instead.</para></listitem>
</itemizedlist>
<para>The capsule service manager and the capsule's bus broker can be reached via the
<option>--capsule=</option> switch to
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd-run</refentrytitle><manvolnum>1</manvolnum></citerefentry> and
<citerefentry><refentrytitle>busctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>.</para>
<para>New capsules can be started via a simple <command>systemctl start
capsule@<replaceable>NAME</replaceable>.service</command> command, and stopped via <command>systemctl
stop capsule@<replaceable>NAME</replaceable>.service</command>. Starting a capsule will implicitly create
a home directory <filename>/var/lib/capsules/<replaceable>NAME</replaceable>/</filename>, if missing. A
runtime directory is created as <filename>/run/capsules/<replaceable>NAME</replaceable>/</filename>. To
remove these resources use <command>systemctl clean capsule@<replaceable>NAME</replaceable>.service</command>,
for example with the <option>--what=all</option> switch.</para>
<para>The <filename>capsule@.service</filename> unit invokes a <command>systemd --user</command>
service manager process. This means unit files are looked for according to the same rules as for regular user
service managers, for example in
<filename>/var/lib/capsules/<replaceable>NAME</replaceable>/.config/systemd/user/</filename>.</para>
<para>Capsule names may be chosen freely by the user, however, they must be suitable as UNIX filenames
(i.e. 255 characters max, and contain no <literal>/</literal>), and when prefixed with
<literal>c-</literal> be suitable as a user name matching strict POSIX rules, see <ulink
url="https://systemd.io/USER_NAMES">User/Group Name Syntax</ulink> for details.</para>
</refsect1>
<refsect1>
<title>Examples</title>
<example>
<title>Create a new capsule, invoke two programs in it (one interactively), terminate it, and clean everything up</title>
<programlisting># systemctl start capsule@tatze.service
# systemd-run --capsule=tatze --unit=sleeptest.service sleep 999
# systemctl --capsule=tatze status sleeptest.service
# systemd-run -t --capsule=tatze bash
# systemctl --capsule=tatze stop sleeptest.service
# systemctl stop capsule@tatze.service
# systemctl clean --what=all capsule@tatze.service</programlisting>
</example>
</refsect1>
<refsect1>
<title>See Also</title>
<para>
<citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>user@.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd-run</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>busctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry project='man-pages'><refentrytitle>pam</refentrytitle><manvolnum>8</manvolnum></citerefentry>
</para>
</refsect1>
</refentry>
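
The man page above names several per-capsule locations. As a rough illustration, the following C sketch derives them from a capsule name. The helper names are hypothetical and not part of systemd; the path layout is taken from the man page text and from the `pin_capsule_socket()` calls elsewhere in this commit. No validation of the name is done here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a systemd function): formats "<prefix><name><suffix>"
 * into buf. Returns 0 on success, -1 if the result does not fit. */
static int capsule_format(char *buf, size_t size, const char *prefix, const char *name, const char *suffix) {
        assert(buf);
        assert(prefix);
        assert(name);
        assert(suffix);

        int n = snprintf(buf, size, "%s%s%s", prefix, name, suffix);
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* System unit that runs the capsule's service manager */
static int capsule_unit_name(const char *name, char *buf, size_t size) {
        return capsule_format(buf, size, "capsule@", name, ".service");
}

/* $HOME of the capsule, created implicitly on start */
static int capsule_home_dir(const char *name, char *buf, size_t size) {
        return capsule_format(buf, size, "/var/lib/capsules/", name, "");
}

/* Runtime directory of the capsule */
static int capsule_runtime_dir(const char *name, char *buf, size_t size) {
        return capsule_format(buf, size, "/run/capsules/", name, "");
}

/* Private socket of the capsule's service manager */
static int capsule_private_socket(const char *name, char *buf, size_t size) {
        return capsule_format(buf, size, "/run/capsules/", name, "/systemd/private");
}

/* Socket of the capsule's bus broker */
static int capsule_bus_socket(const char *name, char *buf, size_t size) {
        return capsule_format(buf, size, "/run/capsules/", name, "/bus");
}
```

For the capsule `tatze` from the example section, these helpers would yield `capsule@tatze.service`, `/var/lib/capsules/tatze`, `/run/capsules/tatze`, `/run/capsules/tatze/systemd/private` and `/run/capsules/tatze/bus`.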

View File

@ -8,6 +8,7 @@ manpages = [
['bootctl', '1', [], ''],
['bootup', '7', [], ''],
['busctl', '1', [], ''],
['capsule@.service', '5', [], ''],
['coredump.conf', '5', ['coredump.conf.d'], 'ENABLE_COREDUMP'],
['coredumpctl', '1', [], 'ENABLE_COREDUMP'],
['crypttab', '5', [], 'HAVE_LIBCRYPTSETUP'],

View File

@ -2813,6 +2813,7 @@ EOF
<xi:include href="user-system-options.xml" xpointer="host" />
<xi:include href="user-system-options.xml" xpointer="machine" />
<xi:include href="user-system-options.xml" xpointer="capsule" />
<xi:include href="standard-options.xml" xpointer="no-pager" />
<xi:include href="standard-options.xml" xpointer="legend" />

View File

@ -517,6 +517,7 @@
<xi:include href="user-system-options.xml" xpointer="system" />
<xi:include href="user-system-options.xml" xpointer="host" />
<xi:include href="user-system-options.xml" xpointer="machine" />
<xi:include href="user-system-options.xml" xpointer="capsule" />
<xi:include href="standard-options.xml" xpointer="help" />
<xi:include href="standard-options.xml" xpointer="version" />

View File

@ -97,9 +97,10 @@
<filename>umount.target</filename>,
<filename>usb-gadget.target</filename>,
<!-- slices --><filename>-.slice</filename>,
<filename>capsule.slice</filename>,
<filename>machine.slice</filename>,
<filename>system.slice</filename>,
<filename>user.slice</filename>,
<filename>machine.slice</filename>,
<!-- the rest --><filename>-.mount</filename>,
<filename>dbus.service</filename>,
<filename>dbus.socket</filename>,
@ -1305,18 +1306,39 @@
<varlistentry>
<term><filename>-.slice</filename></term>
<listitem>
<para>The root slice is the root of the slice hierarchy. It usually does not contain
units directly, but may be used to set defaults for the whole tree.</para>
<para>The root slice is the root of the slice hierarchy. It usually does not contain units
directly, but may be used to set defaults for the whole tree.</para>
<xi:include href="version-info.xml" xpointer="v206"/>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>machine.slice</filename></term>
<listitem>
<para>By default, all virtual machines and containers registered with
<command>systemd-machined</command> are found in this slice. This is pulled in by
<filename>systemd-machined.service</filename>.</para>
<xi:include href="version-info.xml" xpointer="v206"/>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>capsule.slice</filename></term>
<listitem>
<para>By default, all capsules encapsulated in <filename>capsule@.service</filename> are found in
this slice.</para>
<xi:include href="version-info.xml" xpointer="v255"/>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>system.slice</filename></term>
<listitem>
<para>By default, all system services started by
<command>systemd</command> are found in this slice.</para>
<para>By default, all system services started by <command>systemd</command> are found in this
slice.</para>
<xi:include href="version-info.xml" xpointer="v206"/>
</listitem>
@ -1334,17 +1356,6 @@
</listitem>
</varlistentry>
<varlistentry>
<term><filename>machine.slice</filename></term>
<listitem>
<para>By default, all virtual machines and containers
registered with <command>systemd-machined</command> are
found in this slice. This is pulled in by
<filename>systemd-machined.service</filename>.</para>
<xi:include href="version-info.xml" xpointer="v206"/>
</listitem>
</varlistentry>
</variablelist>
</refsect2>
</refsect1>
@ -1362,16 +1373,31 @@
<varlistentry>
<term><filename>default.target</filename></term>
<listitem>
<para>This is the main target of the user session, started by default. Various services that
compose the normal user session should be pulled into this target. In this regard,
<filename>default.target</filename> is similar to <filename>multi-user.target</filename> in the
system instance, but it is a real unit, not an alias.</para>
<para>This is the main target of the user service manager, started by default when the service
manager is invoked. Various services that compose the normal user session should be pulled into
this target. In this regard, <filename>default.target</filename> is similar to
<filename>multi-user.target</filename> in the system instance, but it is a real unit, not an
alias.</para>
<xi:include href="version-info.xml" xpointer="v242"/>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term><filename>capsule@.target</filename></term>
<listitem>
<para>This is the main target of capsule service managers, started by default, instantiated with
the capsule name. This may be used to define different sets of units that are started for
different capsules via generic unit definitions. For details about capsules see
<citerefentry><refentrytitle>capsule@.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
<xi:include href="version-info.xml" xpointer="v255"/>
</listitem>
</varlistentry>
</variablelist>
<para>In addition, the following units are available which have definitions similar to their
system counterparts:
<filename>exit.target</filename>,

View File

@ -55,4 +55,15 @@
implied.</para>
</listitem>
</varlistentry>
<varlistentry id='capsule'>
<term><option>-C</option></term>
<term><option>--capsule=</option></term>
<listitem id='capsule-text'>
<para>Execute operation on a capsule. Specify a capsule name to connect to. See
<citerefentry><refentrytitle>capsule@.service</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
details about capsules.</para>
</listitem>
</varlistentry>
</variablelist>

View File

@ -188,6 +188,7 @@ TasksMax=33%</programlisting>
<member><citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>capsule@.service</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
<member><citerefentry project='man-pages'><refentrytitle>pam</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
</simplelist></para>
</refsect1>

View File

@ -28,6 +28,7 @@
#include "parse-util.h"
#include "path-util.h"
#include "pretty-print.h"
#include "capsule-util.h"
#include "runtime-scope.h"
#include "set.h"
#include "sort-util.h"
@ -72,6 +73,7 @@ static int json_transform_message(sd_bus_message *m, JsonVariant **ret);
static int acquire_bus(bool set_monitor, sd_bus **ret) {
_cleanup_(sd_bus_close_unrefp) sd_bus *bus = NULL;
_cleanup_close_ int pin_fd = -EBADF;
int r;
r = sd_bus_new(&bus);
@ -138,10 +140,13 @@ static int acquire_bus(bool set_monitor, sd_bus **ret) {
r = bus_set_address_machine(bus, arg_runtime_scope, arg_host);
break;
case BUS_TRANSPORT_CAPSULE:
r = bus_set_address_capsule_bus(bus, arg_host, &pin_fd);
break;
default:
assert_not_reached();
}
if (r < 0)
return bus_log_address_error(r, arg_transport);
@ -2385,6 +2390,7 @@ static int parse_argv(int argc, char *argv[]) {
{ "match", required_argument, NULL, ARG_MATCH },
{ "host", required_argument, NULL, 'H' },
{ "machine", required_argument, NULL, 'M' },
{ "capsule", required_argument, NULL, 'C' },
{ "size", required_argument, NULL, ARG_SIZE },
{ "list", no_argument, NULL, ARG_LIST },
{ "quiet", no_argument, NULL, 'q' },
@ -2406,7 +2412,7 @@ static int parse_argv(int argc, char *argv[]) {
assert(argc >= 0);
assert(argv);
while ((c = getopt_long(argc, argv, "hH:M:qjl", options, NULL)) >= 0)
while ((c = getopt_long(argc, argv, "hH:M:C:J:qjl", options, NULL)) >= 0)
switch (c) {
@ -2490,6 +2496,17 @@ static int parse_argv(int argc, char *argv[]) {
arg_host = optarg;
break;
case 'C':
r = capsule_name_is_valid(optarg);
if (r < 0)
return log_error_errno(r, "Unable to validate capsule name '%s': %m", optarg);
if (r == 0)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Invalid capsule name: %s", optarg);
arg_host = optarg;
arg_transport = BUS_TRANSPORT_CAPSULE;
break;
case 'q':
arg_quiet = true;
break;

View File

@ -254,6 +254,9 @@ struct sd_bus {
char *address;
unsigned address_index;
uid_t connect_as_uid;
gid_t connect_as_gid;
int last_connect_error;
enum bus_auth auth;

View File

@ -503,11 +503,38 @@ static int bus_socket_write_auth(sd_bus *b) {
if (b->prefer_writev)
k = writev(b->output_fd, b->auth_iovec + b->auth_index, ELEMENTSOF(b->auth_iovec) - b->auth_index);
else {
CMSG_BUFFER_TYPE(CMSG_SPACE(sizeof(struct ucred))) control = {};
struct msghdr mh = {
.msg_iov = b->auth_iovec + b->auth_index,
.msg_iovlen = ELEMENTSOF(b->auth_iovec) - b->auth_index,
};
if (uid_is_valid(b->connect_as_uid) || gid_is_valid(b->connect_as_gid)) {
/* If we shall connect under some specific UID/GID, then synthesize an
* SCM_CREDENTIALS record accordingly. After all we want to adopt this UID/GID both
* for SO_PEERCRED (where we have to fork()) and SCM_CREDENTIALS (where we can just
* fake it via sendmsg()) */
struct ucred ucred = {
.pid = getpid_cached(),
.uid = uid_is_valid(b->connect_as_uid) ? b->connect_as_uid : getuid(),
.gid = gid_is_valid(b->connect_as_gid) ? b->connect_as_gid : getgid(),
};
mh.msg_control = &control;
mh.msg_controllen = sizeof(control);
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);
*cmsg = (struct cmsghdr) {
.cmsg_level = SOL_SOCKET,
.cmsg_type = SCM_CREDENTIALS,
.cmsg_len = CMSG_LEN(sizeof(struct ucred)),
};
memcpy(CMSG_DATA(cmsg), &ucred, sizeof(struct ucred));
}
k = sendmsg(b->output_fd, &mh, MSG_DONTWAIT|MSG_NOSIGNAL);
if (k < 0 && errno == ENOTSOCK) {
b->prefer_writev = true;
@ -949,6 +976,66 @@ static int bind_description(sd_bus *b, int fd, int family) {
return 0;
}
static int connect_as(int fd, const struct sockaddr *sa, socklen_t salen, uid_t uid, gid_t gid) {
_cleanup_(close_pairp) int pfd[2] = EBADF_PAIR;
ssize_t n;
int r;
/* Shortcut if we are not supposed to drop privileges */
if (!uid_is_valid(uid) && !gid_is_valid(gid))
return RET_NERRNO(connect(fd, sa, salen));
/* This changes identity to the specified uid/gid and issues connect() as that. This is useful to
* make sure SO_PEERCRED reports the selected UID/GID rather than the usual one of the caller. */
if (pipe2(pfd, O_CLOEXEC) < 0)
return -errno;
r = safe_fork("(sd-setresuid)", FORK_RESET_SIGNALS|FORK_DEATHSIG_SIGKILL|FORK_WAIT, /* ret_pid= */ NULL);
if (r < 0)
return r;
if (r == 0) {
/* child */
pfd[0] = safe_close(pfd[0]);
r = RET_NERRNO(setgroups(0, NULL));
if (r < 0)
goto child_finish;
if (gid_is_valid(gid)) {
r = RET_NERRNO(setresgid(gid, gid, gid));
if (r < 0)
goto child_finish;
}
if (uid_is_valid(uid)) {
r = RET_NERRNO(setresuid(uid, uid, uid));
if (r < 0)
goto child_finish;
}
r = RET_NERRNO(connect(fd, sa, salen));
if (r < 0)
goto child_finish;
r = 0;
child_finish:
n = write(pfd[1], &r, sizeof(r));
if (n != sizeof(r))
_exit(EXIT_FAILURE);
_exit(EXIT_SUCCESS);
}
n = read(pfd[0], &r, sizeof(r));
if (n != sizeof(r))
return -EIO;
return r;
}
int bus_socket_connect(sd_bus *b) {
bool inotify_done = false;
int r;
@ -980,8 +1067,9 @@ int bus_socket_connect(sd_bus *b) {
b->output_fd = b->input_fd;
bus_socket_setup(b);
if (connect(b->input_fd, &b->sockaddr.sa, b->sockaddr_size) < 0) {
if (errno == EINPROGRESS) {
r = connect_as(b->input_fd, &b->sockaddr.sa, b->sockaddr_size, b->connect_as_uid, b->connect_as_gid);
if (r < 0) {
if (r == -EINPROGRESS) {
/* If we have any inotify watches open, close them now, we don't need them anymore, as
* we have successfully initiated a connection */
@ -994,7 +1082,7 @@ int bus_socket_connect(sd_bus *b) {
return 1;
}
if (IN_SET(errno, ENOENT, ECONNREFUSED) && /* ENOENT → unix socket doesn't exist at all; ECONNREFUSED → unix socket stale */
if (IN_SET(r, -ENOENT, -ECONNREFUSED) && /* ENOENT → unix socket doesn't exist at all; ECONNREFUSED → unix socket stale */
b->watch_bind &&
b->sockaddr.sa.sa_family == AF_UNIX &&
b->sockaddr.un.sun_path[0] != 0) {
@ -1022,7 +1110,7 @@ int bus_socket_connect(sd_bus *b) {
inotify_done = true;
} else
return -errno;
return r;
} else
break;
}
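
The `connect_as()` helper above forks, changes identity in the child, calls `connect()` there, and passes the result back over a pipe, so that `SO_PEERCRED` on the peer side reports the chosen UID/GID. A minimal standalone sketch of that pattern follows. It is not the systemd implementation: it uses plain `fork()` instead of `safe_fork()`, the names `connect_in_child()` and `sketch_unix_address()` are hypothetical, and as a simplification supplementary groups are only dropped when running as root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fills in a filesystem AF_UNIX address.
 * Returns the address length, or 0 if the path is too long. */
static socklen_t sketch_unix_address(const char *path, struct sockaddr_un *ret) {
        size_t l = strlen(path);

        if (l >= sizeof(ret->sun_path))
                return 0;

        memset(ret, 0, sizeof(*ret));
        ret->sun_family = AF_UNIX;
        memcpy(ret->sun_path, path, l + 1);
        return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + l + 1);
}

/* Sketch: connect fd to sa from a short-lived child that first switches to uid/gid.
 * (uid_t) -1 / (gid_t) -1 mean "leave unchanged". Returns 0 or a negative errno value. */
static int connect_in_child(int fd, const struct sockaddr *sa, socklen_t salen, uid_t uid, gid_t gid) {
        int pfd[2], r = 0;
        ssize_t n;
        pid_t pid;

        assert(fd >= 0);
        assert(sa);

        /* Shortcut if no identity change was requested */
        if (uid == (uid_t) -1 && gid == (gid_t) -1)
                return connect(fd, sa, salen) < 0 ? -errno : 0;

        if (pipe(pfd) < 0)
                return -errno;

        pid = fork();
        if (pid < 0) {
                r = -errno;
                close(pfd[0]);
                close(pfd[1]);
                return r;
        }

        if (pid == 0) {
                /* Child: the socket fd is shared with the parent, so a connect() here
                 * also connects the parent's fd. */
                close(pfd[0]);

                /* Simplification for this sketch: only root can drop supplementary groups */
                if (geteuid() == 0 && setgroups(0, NULL) < 0)
                        r = -errno;
                else if (gid != (gid_t) -1 && setresgid(gid, gid, gid) < 0)
                        r = -errno;
                else if (uid != (uid_t) -1 && setresuid(uid, uid, uid) < 0)
                        r = -errno;
                else if (connect(fd, sa, salen) < 0)
                        r = -errno;

                /* Report the outcome to the parent */
                n = write(pfd[1], &r, sizeof(r));
                _exit(n == (ssize_t) sizeof(r) ? EXIT_SUCCESS : EXIT_FAILURE);
        }

        /* Parent: read the child's result, then reap the child */
        close(pfd[1]);
        n = read(pfd[0], &r, sizeof(r));
        close(pfd[0]);

        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
                ;

        return n == (ssize_t) sizeof(r) ? r : -EIO;
}
```

The group, GID, UID order matters: once the UID has been dropped, the process may no longer be allowed to change its groups.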

View File

@ -259,6 +259,8 @@ _public_ int sd_bus_new(sd_bus **ret) {
.ucred = UCRED_INVALID,
.pidfd = -EBADF,
.runtime_scope = _RUNTIME_SCOPE_INVALID,
.connect_as_uid = UID_INVALID,
.connect_as_gid = GID_INVALID,
};
/* We guarantee that wqueue always has space for at least one entry */
@ -716,7 +718,7 @@ static void skip_address_key(const char **p) {
}
static int parse_unix_address(sd_bus *b, const char **p, char **guid) {
_cleanup_free_ char *path = NULL, *abstract = NULL;
_cleanup_free_ char *path = NULL, *abstract = NULL, *uids = NULL, *gids = NULL;
size_t l;
int r;
@ -744,6 +746,18 @@ static int parse_unix_address(sd_bus *b, const char **p, char **guid) {
else if (r > 0)
continue;
r = parse_address_key(p, "uid", &uids);
if (r < 0)
return r;
else if (r > 0)
continue;
r = parse_address_key(p, "gid", &gids);
if (r < 0)
return r;
else if (r > 0)
continue;
skip_address_key(p);
}
@ -780,6 +794,17 @@ static int parse_unix_address(sd_bus *b, const char **p, char **guid) {
b->sockaddr_size = offsetof(struct sockaddr_un, sun_path) + 1 + l;
}
if (uids) {
r = parse_uid(uids, &b->connect_as_uid);
if (r < 0)
return r;
}
if (gids) {
r = parse_gid(gids, &b->connect_as_gid);
if (r < 0)
return r;
}
b->is_local = true;
return 0;
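
The hunks above teach `parse_unix_address()` two new address keys, `uid=` and `gid=`, as in `unix:path=...,uid=61234,gid=61234`. The following standalone sketch shows one way such a key could be pulled out of the comma-separated parameter list. It is an illustration only: `address_get_id()` is a hypothetical name, the input is the part after the `unix:` prefix, D-Bus address escaping is ignored, and the range check is simpler than what systemd's `parse_uid()` performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up "key=value" in a comma-separated parameter list and
 * parses the value as a decimal ID. Returns 1 if found and parsed, 0 if the key is
 * absent, and -EINVAL if the value is malformed. */
static int address_get_id(const char *params, const char *key, unsigned long *ret) {
        size_t kl;

        assert(params);
        assert(key);
        assert(ret);

        kl = strlen(key);

        for (const char *p = params; *p; ) {
                size_t l = strcspn(p, ",");     /* length of the current "key=value" item */

                if (l > kl && strncmp(p, key, kl) == 0 && p[kl] == '=') {
                        const char *v = p + kl + 1;
                        char *end;
                        unsigned long u;

                        /* Reject empty values, signs and leading whitespace */
                        if (*v < '0' || *v > '9')
                                return -EINVAL;

                        errno = 0;
                        u = strtoul(v, &end, 10);

                        /* The number must span the whole value; 0xFFFFFFFF is the
                         * "invalid" UID/GID and anything above does not fit 32 bits */
                        if (errno != 0 || end != p + l || u >= 0xFFFFFFFFUL)
                                return -EINVAL;

                        *ret = u;
                        return 1;
                }

                /* Advance to the next item */
                p += l;
                if (*p == ',')
                        p++;
        }

        return 0;
}
```

Keys are only matched at the start of an item, so a `guid=` parameter is not mistaken for `uid=`.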

View File

@ -17,11 +17,14 @@
#include "bus-unit-util.h"
#include "bus-wait-for-jobs.h"
#include "calendarspec.h"
#include "capsule-util.h"
#include "chase.h"
#include "env-util.h"
#include "escape.h"
#include "exit-status.h"
#include "fd-util.h"
#include "format-util.h"
#include "fs-util.h"
#include "hostname-util.h"
#include "main-func.h"
#include "parse-argument.h"
@ -35,6 +38,7 @@
#include "special.h"
#include "strv.h"
#include "terminal-util.h"
#include "uid-classification.h"
#include "unit-def.h"
#include "unit-name.h"
#include "user-util.h"
@ -265,6 +269,7 @@ static int parse_argv(int argc, char *argv[]) {
{ "version", no_argument, NULL, ARG_VERSION },
{ "user", no_argument, NULL, ARG_USER },
{ "system", no_argument, NULL, ARG_SYSTEM },
{ "capsule", required_argument, NULL, 'C' },
{ "scope", no_argument, NULL, ARG_SCOPE },
{ "unit", required_argument, NULL, 'u' },
{ "description", required_argument, NULL, ARG_DESCRIPTION },
@ -317,7 +322,7 @@ static int parse_argv(int argc, char *argv[]) {
/* Resetting to 0 forces the invocation of an internal initialization routine of getopt_long()
* that checks for GNU extensions in optstring ('-' or '+' at the beginning). */
optind = 0;
while ((c = getopt_long(argc, argv, "+hrH:M:E:p:tPqGdSu:", options, NULL)) >= 0)
while ((c = getopt_long(argc, argv, "+hrC:H:M:E:p:tPqGdSu:", options, NULL)) >= 0)
switch (c) {
@ -339,6 +344,18 @@ static int parse_argv(int argc, char *argv[]) {
arg_runtime_scope = RUNTIME_SCOPE_SYSTEM;
break;
case 'C':
r = capsule_name_is_valid(optarg);
if (r < 0)
return log_error_errno(r, "Unable to validate capsule name '%s': %m", optarg);
if (r == 0)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Invalid capsule name: %s", optarg);
arg_host = optarg;
arg_transport = BUS_TRANSPORT_CAPSULE;
arg_runtime_scope = RUNTIME_SCOPE_USER;
break;
case ARG_SCOPE:
arg_scope = true;
break;
@ -1598,6 +1615,28 @@ static void set_window_title(PTYForward *f) {
(void) pty_forward_set_title_prefix(f, dot);
}
static int chown_to_capsule(const char *path, const char *capsule) {
_cleanup_free_ char *p = NULL;
int r;
assert(path);
assert(capsule);
p = path_join("/run/capsules/", capsule);
if (!p)
return -ENOMEM;
struct stat st;
r = chase_and_stat(p, /* root= */ NULL, CHASE_SAFE|CHASE_PROHIBIT_SYMLINKS, /* ret_path= */ NULL, &st);
if (r < 0)
return r;
if (uid_is_system(st.st_uid) || gid_is_system(st.st_gid)) /* paranoid safety check */
return -EPERM;
return chmod_and_chown(path, 0600, st.st_uid, st.st_gid);
}
static int start_transient_service(sd_bus *bus) {
_cleanup_(sd_bus_message_unrefp) sd_bus_message *m = NULL, *reply = NULL;
_cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL;
@ -1610,7 +1649,7 @@ static int start_transient_service(sd_bus *bus) {
if (arg_stdio == ARG_STDIO_PTY) {
if (arg_transport == BUS_TRANSPORT_LOCAL) {
if (IN_SET(arg_transport, BUS_TRANSPORT_LOCAL, BUS_TRANSPORT_CAPSULE)) {
master = posix_openpt(O_RDWR|O_NOCTTY|O_CLOEXEC|O_NONBLOCK);
if (master < 0)
return log_error_errno(errno, "Failed to acquire pseudo tty: %m");
@ -1619,6 +1658,14 @@ static int start_transient_service(sd_bus *bus) {
if (r < 0)
return log_error_errno(r, "Failed to determine tty name: %m");
if (arg_transport == BUS_TRANSPORT_CAPSULE) {
/* If we are in capsule mode, we must give the capsule UID/GID access to the PTY we just allocated first. */
r = chown_to_capsule(pty_path, arg_host);
if (r < 0)
return log_error_errno(r, "Failed to chown tty to capsule UID/GID: %m");
}
if (unlockpt(master) < 0)
return log_error_errno(errno, "Failed to unlock tty: %m");
@ -2306,7 +2353,7 @@ static int run(int argc, char* argv[]) {
* limited direct connection */
if (arg_wait ||
arg_stdio != ARG_STDIO_NONE ||
(arg_runtime_scope == RUNTIME_SCOPE_USER && arg_transport != BUS_TRANSPORT_LOCAL))
(arg_runtime_scope == RUNTIME_SCOPE_USER && !IN_SET(arg_transport, BUS_TRANSPORT_LOCAL, BUS_TRANSPORT_CAPSULE)))
r = bus_connect_transport(arg_transport, arg_host, arg_runtime_scope, &bus);
else
r = bus_connect_transport_systemd(arg_transport, arg_host, arg_runtime_scope, &bus);
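
`chown_to_capsule()` above determines the capsule's UID/GID from the owner of `/run/capsules/NAME`, refuses system IDs, and then hands the PTY over. A simplified standalone sketch of that idea follows. All names are hypothetical, the system ID boundary of 999 is an assumption (the real boundary is a build-time setting), and `lstat()` only guards against a symlink in the final path component, whereas the original uses `chase_and_stat()` with `CHASE_SAFE|CHASE_PROHIBIT_SYMLINKS` for the whole path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption for this sketch: system users and groups end at 999 */
#define SKETCH_SYSTEM_ID_MAX 999UL

static bool sketch_id_is_system(unsigned long id) {
        return id <= SKETCH_SYSTEM_ID_MAX;
}

/* Looks up the owner of a capsule runtime directory. Returns 0 and fills in
 * ret_uid/ret_gid, or a negative errno value. */
static int sketch_capsule_owner(const char *dir, uid_t *ret_uid, gid_t *ret_gid) {
        struct stat st;

        assert(dir);
        assert(ret_uid);
        assert(ret_gid);

        if (lstat(dir, &st) < 0)
                return -errno;
        if (!S_ISDIR(st.st_mode))
                return -ENOTDIR;

        /* Paranoid check: never hand resources to root or other system accounts */
        if (sketch_id_is_system(st.st_uid) || sketch_id_is_system(st.st_gid))
                return -EPERM;

        *ret_uid = st.st_uid;
        *ret_gid = st.st_gid;
        return 0;
}

/* Gives path (for example a freshly allocated PTY) to whoever owns dir, mode 0600 */
static int sketch_chown_to_owner_of(const char *path, const char *dir) {
        uid_t uid;
        gid_t gid;
        int r;

        assert(path);

        r = sketch_capsule_owner(dir, &uid, &gid);
        if (r < 0)
                return r;

        if (chmod(path, 0600) < 0)
                return -errno;
        if (chown(path, uid, gid) < 0)
                return -errno;

        return 0;
}
```

The ownership transfer has to happen before `unlockpt()`, as in the hunk above, so that the PTY is never usable under the wrong owner.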

View File

@ -18,13 +18,17 @@
#include "bus-internal.h"
#include "bus-label.h"
#include "bus-util.h"
#include "capsule-util.h"
#include "chase.h"
#include "daemon-util.h"
#include "data-fd-util.h"
#include "fd-util.h"
#include "format-util.h"
#include "memstream-util.h"
#include "path-util.h"
#include "socket-util.h"
#include "stdio-util.h"
#include "uid-classification.h"
static int name_owner_change_callback(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
sd_event *e = ASSERT_PTR(userdata);
@ -268,6 +272,140 @@ int bus_connect_user_systemd(sd_bus **ret_bus) {
return 0;
}
static int pin_capsule_socket(const char *capsule, const char *suffix, uid_t *ret_uid, gid_t *ret_gid) {
_cleanup_close_ int inode_fd = -EBADF;
_cleanup_free_ char *p = NULL;
struct stat st;
int r;
assert(capsule);
assert(suffix);
p = path_join("/run/capsules", capsule, suffix);
if (!p)
return -ENOMEM;
/* We enter territory owned by the user, hence let's be paranoid about symlinks and ownership */
r = chase(p, /* root= */ NULL, CHASE_SAFE|CHASE_PROHIBIT_SYMLINKS, /* ret_path= */ NULL, &inode_fd);
if (r < 0)
return r;
if (fstat(inode_fd, &st) < 0)
return -errno;
/* Paranoid safety check */
if (uid_is_system(st.st_uid) || gid_is_system(st.st_gid))
return -EPERM;
*ret_uid = st.st_uid;
*ret_gid = st.st_gid;
return TAKE_FD(inode_fd);
}
int bus_connect_capsule_systemd(const char *capsule, sd_bus **ret_bus) {
_cleanup_(sd_bus_close_unrefp) sd_bus *bus = NULL;
_cleanup_close_ int inode_fd = -EBADF;
_cleanup_free_ char *pp = NULL;
uid_t uid;
gid_t gid;
int r;
assert(capsule);
assert(ret_bus);
r = capsule_name_is_valid(capsule);
if (r < 0)
return r;
if (r == 0)
return -EINVAL;
/* Connects to a capsule's user bus. We need to do so under the capsule's UID/GID, otherwise the
* service manager might refuse our connection. Hence fake it. */
inode_fd = pin_capsule_socket(capsule, "systemd/private", &uid, &gid);
if (inode_fd < 0)
return inode_fd;
pp = bus_address_escape(FORMAT_PROC_FD_PATH(inode_fd));
if (!pp)
return -ENOMEM;
r = sd_bus_new(&bus);
if (r < 0)
return r;
if (asprintf(&bus->address, "unix:path=%s,uid=" UID_FMT ",gid=" GID_FMT, pp, uid, gid) < 0)
return -ENOMEM;
r = sd_bus_start(bus);
if (r < 0)
return r;
*ret_bus = TAKE_PTR(bus);
return 0;
}
int bus_set_address_capsule_bus(sd_bus *bus, const char *capsule, int *ret_pin_fd) {
_cleanup_free_ char *pp = NULL;
_cleanup_close_ int inode_fd = -EBADF;
uid_t uid;
gid_t gid;
int r;
assert(bus);
assert(capsule);
assert(ret_pin_fd);
r = capsule_name_is_valid(capsule);
if (r < 0)
return r;
if (r == 0)
return -EINVAL;
inode_fd = pin_capsule_socket(capsule, "bus", &uid, &gid);
if (inode_fd < 0)
return inode_fd;
pp = bus_address_escape(FORMAT_PROC_FD_PATH(inode_fd));
if (!pp)
return -ENOMEM;
if (asprintf(&bus->address, "unix:path=%s,uid=" UID_FMT ",gid=" GID_FMT, pp, uid, gid) < 0)
return -ENOMEM;
*ret_pin_fd = TAKE_FD(inode_fd); /* This fd must be kept pinned until the connection has been established */
return 0;
}
int bus_connect_capsule_bus(const char *capsule, sd_bus **ret_bus) {
_cleanup_(sd_bus_close_unrefp) sd_bus *bus = NULL;
_cleanup_close_ int inode_fd = -EBADF;
int r;
assert(capsule);
assert(ret_bus);
r = sd_bus_new(&bus);
if (r < 0)
return r;
r = bus_set_address_capsule_bus(bus, capsule, &inode_fd);
if (r < 0)
return r;
r = sd_bus_set_bus_client(bus, true);
if (r < 0)
return r;
r = sd_bus_start(bus);
if (r < 0)
return r;
*ret_bus = TAKE_PTR(bus);
return 0;
}
int bus_connect_transport(
BusTransport transport,
const char *host,
@ -281,12 +419,10 @@ int bus_connect_transport(
assert(transport < _BUS_TRANSPORT_MAX);
assert(ret);
assert_return((transport == BUS_TRANSPORT_LOCAL) == !host, -EINVAL);
assert_return(transport != BUS_TRANSPORT_REMOTE || runtime_scope == RUNTIME_SCOPE_SYSTEM, -EOPNOTSUPP);
switch (transport) {
case BUS_TRANSPORT_LOCAL:
assert_return(!host, -EINVAL);
switch (runtime_scope) {
@ -308,11 +444,12 @@ int bus_connect_transport(
break;
case BUS_TRANSPORT_REMOTE:
assert_return(runtime_scope == RUNTIME_SCOPE_SYSTEM, -EOPNOTSUPP);
r = sd_bus_open_system_remote(&bus, host);
break;
case BUS_TRANSPORT_MACHINE:
switch (runtime_scope) {
case RUNTIME_SCOPE_USER:
@ -329,6 +466,12 @@ int bus_connect_transport(
break;
case BUS_TRANSPORT_CAPSULE:
assert_return(runtime_scope == RUNTIME_SCOPE_USER, -EINVAL);
r = bus_connect_capsule_bus(host, &bus);
break;
default:
assert_not_reached();
}
@ -343,28 +486,32 @@ int bus_connect_transport(
return 0;
}
int bus_connect_transport_systemd(BusTransport transport, const char *host, RuntimeScope runtime_scope, sd_bus **bus) {
int bus_connect_transport_systemd(
BusTransport transport,
const char *host,
RuntimeScope runtime_scope,
sd_bus **ret_bus) {
assert(transport >= 0);
assert(transport < _BUS_TRANSPORT_MAX);
assert(bus);
assert_return((transport == BUS_TRANSPORT_LOCAL) == !host, -EINVAL);
assert_return(transport == BUS_TRANSPORT_LOCAL || runtime_scope == RUNTIME_SCOPE_SYSTEM, -EOPNOTSUPP);
assert(ret_bus);
switch (transport) {
case BUS_TRANSPORT_LOCAL:
assert_return(!host, -EINVAL);
switch (runtime_scope) {
case RUNTIME_SCOPE_USER:
return bus_connect_user_systemd(bus);
return bus_connect_user_systemd(ret_bus);
case RUNTIME_SCOPE_SYSTEM:
if (sd_booted() <= 0)
/* Print a friendly message when the local system is actually not running systemd as PID 1. */
return log_error_errno(SYNTHETIC_ERRNO(EHOSTDOWN),
"System has not been booted with systemd as init system (PID 1). Can't operate.");
return bus_connect_system_systemd(bus);
return bus_connect_system_systemd(ret_bus);
default:
assert_not_reached();
@ -373,10 +520,16 @@ int bus_connect_transport_systemd(BusTransport transport, const char *host, Runt
break;
case BUS_TRANSPORT_REMOTE:
return sd_bus_open_system_remote(bus, host);
assert_return(runtime_scope == RUNTIME_SCOPE_SYSTEM, -EOPNOTSUPP);
return sd_bus_open_system_remote(ret_bus, host);
case BUS_TRANSPORT_MACHINE:
return sd_bus_open_system_machine(bus, host);
assert_return(runtime_scope == RUNTIME_SCOPE_SYSTEM, -EOPNOTSUPP);
return sd_bus_open_system_machine(ret_bus, host);
case BUS_TRANSPORT_CAPSULE:
assert_return(runtime_scope == RUNTIME_SCOPE_USER, -EINVAL);
return bus_connect_capsule_systemd(host, ret_bus);
default:
assert_not_reached();

View File

@ -21,6 +21,7 @@ typedef enum BusTransport {
BUS_TRANSPORT_LOCAL,
BUS_TRANSPORT_REMOTE,
BUS_TRANSPORT_MACHINE,
BUS_TRANSPORT_CAPSULE,
_BUS_TRANSPORT_MAX,
_BUS_TRANSPORT_INVALID = -EINVAL,
} BusTransport;
@ -36,8 +37,12 @@ bool bus_error_is_unknown_service(const sd_bus_error *error);
int bus_check_peercred(sd_bus *c);
int bus_set_address_capsule_bus(sd_bus *bus, const char *capsule, int *ret_pin_fd);
int bus_connect_system_systemd(sd_bus **ret_bus);
int bus_connect_user_systemd(sd_bus **ret_bus);
int bus_connect_capsule_systemd(const char *capsule, sd_bus **ret_bus);
int bus_connect_capsule_bus(const char *capsule, sd_bus **ret_bus);
int bus_connect_transport(BusTransport transport, const char *host, RuntimeScope runtime_scope, sd_bus **bus);
int bus_connect_transport_systemd(BusTransport transport, const char *host, RuntimeScope runtime_scope, sd_bus **bus);

17
src/shared/capsule-util.c Normal file

@ -0,0 +1,17 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include "capsule-util.h"
#include "path-util.h"
#include "user-util.h"
int capsule_name_is_valid(const char *name) {
if (!filename_is_valid(name))
return false;
_cleanup_free_ char *prefixed = strjoin("c-", name);
if (!prefixed)
return -ENOMEM;
return valid_user_group_name(prefixed, /* flags= */ 0);
}
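
For readers without the systemd tree at hand, the check above can be approximated in standalone C. This is only a sketch under stated assumptions: `name_is_plain_filename()` and `user_name_is_strict()` are hypothetical stand-ins for `filename_is_valid()` and `valid_user_group_name()`, and their simplified rules (no slashes, not `.` or `..`; first character a letter or underscore, at most 31 characters) are assumptions rather than the exact rules systemd applies. The point it illustrates is that a capsule name is judged by whether `c-` plus the name would make an acceptable user name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for filename_is_valid(): non-empty, no '/', not "." or "..". */
static bool name_is_plain_filename(const char *s) {
        if (!s || !*s)
                return false;
        if (strcmp(s, ".") == 0 || strcmp(s, "..") == 0)
                return false;
        return strchr(s, '/') == NULL;
}

/* Hypothetical stand-in for valid_user_group_name(). Assumed rule:
 * [A-Za-z_][A-Za-z0-9_-]*, at most 31 characters. */
static bool user_name_is_strict(const char *s) {
        assert(s);

        size_t n = strlen(s);
        if (n == 0 || n > 31)
                return false;

        for (size_t i = 0; i < n; i++) {
                char c = s[i];
                bool first_ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
                bool rest_ok = (c >= '0' && c <= '9') || c == '-';

                /* Digits and '-' are only accepted after the first character. */
                if (!first_ok && !(i > 0 && rest_ok))
                        return false;
        }
        return true;
}

/* Returns 1 if the name looks valid, 0 otherwise. Unlike the original, this
 * sketch uses a fixed buffer, so it has no -ENOMEM path. */
static int capsule_name_is_valid_sketch(const char *name) {
        char prefixed[64];

        if (!name_is_plain_filename(name))
                return 0;

        /* Prefix with "c-", matching User=c-%i in capsule@.service. */
        int k = snprintf(prefixed, sizeof prefixed, "c-%s", name);
        if (k < 0 || (size_t) k >= sizeof prefixed)
                return 0;

        return user_name_is_strict(prefixed) ? 1 : 0;
}
```

Under these assumed rules, `foobar` is accepted, while an empty string, `..`, `foo/bar`, and `foo bar` are rejected.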


@ -0,0 +1,4 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
int capsule_name_is_valid(const char *name);


@ -140,6 +140,7 @@ shared_sources = files(
'pkcs11-util.c',
'plymouth-util.c',
'pretty-print.c',
'capsule-util.c',
'ptyfwd.c',
'qrcode-util.c',
'quota-util.c',


@ -255,14 +255,29 @@ static const char** make_extra_args(const char *extra_args[static 4]) {
if (arg_runtime_scope != RUNTIME_SCOPE_SYSTEM)
extra_args[n++] = "--user";
-if (arg_transport == BUS_TRANSPORT_REMOTE) {
+switch (arg_transport) {
+case BUS_TRANSPORT_REMOTE:
extra_args[n++] = "-H";
extra_args[n++] = arg_host;
-} else if (arg_transport == BUS_TRANSPORT_MACHINE) {
+break;
+case BUS_TRANSPORT_MACHINE:
extra_args[n++] = "-M";
extra_args[n++] = arg_host;
-} else
-assert(arg_transport == BUS_TRANSPORT_LOCAL);
+break;
+case BUS_TRANSPORT_CAPSULE:
+extra_args[n++] = "-C";
+extra_args[n++] = arg_host;
+break;
+case BUS_TRANSPORT_LOCAL:
+break;
+default:
+assert_not_reached();
+}
extra_args[n] = NULL;
return extra_args;


@ -42,7 +42,7 @@ int acquire_bus(BusFocus focus, sd_bus **ret) {
return log_error_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "--global is not supported for this operation.");
/* We only go directly to the manager, if we are using a local transport */
-if (arg_transport != BUS_TRANSPORT_LOCAL)
+if (!IN_SET(arg_transport, BUS_TRANSPORT_LOCAL, BUS_TRANSPORT_CAPSULE))
focus = BUS_FULL;
if (getenv_bool("SYSTEMCTL_FORCE_BUS") > 0)


@ -18,6 +18,7 @@
#include "path-util.h"
#include "pretty-print.h"
#include "process-util.h"
#include "capsule-util.h"
#include "reboot-util.h"
#include "rlimit-util.h"
#include "sigbus.h"
@ -63,6 +64,7 @@
#include "systemctl.h"
#include "terminal-util.h"
#include "time-util.h"
#include "user-util.h"
#include "verbs.h"
#include "virt.h"
@ -262,6 +264,7 @@ static int systemctl_help(void) {
" --version Show package version\n"
" --system Connect to system manager\n"
" --user Connect to user service manager\n"
" -C --capsule=NAME Connect to service manager of specified capsule\n"
" -H --host=[USER@]HOST Operate on remote host\n"
" -M --machine=CONTAINER Operate on a local container\n"
" -t --type=TYPE List units of a particular type\n"
@ -490,6 +493,7 @@ static int systemctl_parse_argv(int argc, char *argv[]) {
{ "user", no_argument, NULL, ARG_USER },
{ "system", no_argument, NULL, ARG_SYSTEM },
{ "global", no_argument, NULL, ARG_GLOBAL },
{ "capsule", required_argument, NULL, 'C' },
{ "wait", no_argument, NULL, ARG_WAIT },
{ "no-block", no_argument, NULL, ARG_NO_BLOCK },
{ "legend", required_argument, NULL, ARG_LEGEND },
@ -544,7 +548,7 @@ static int systemctl_parse_argv(int argc, char *argv[]) {
/* We default to allowing interactive authorization only in systemctl (not in the legacy commands) */
arg_ask_password = true;
-while ((c = getopt_long(argc, argv, "ht:p:P:alqfs:H:M:n:o:iTr.::", options, NULL)) >= 0)
+while ((c = getopt_long(argc, argv, "hC:t:p:P:alqfs:H:M:n:o:iTr.::", options, NULL)) >= 0)
switch (c) {
@ -679,6 +683,18 @@ static int systemctl_parse_argv(int argc, char *argv[]) {
arg_runtime_scope = RUNTIME_SCOPE_GLOBAL;
break;
case 'C':
r = capsule_name_is_valid(optarg);
if (r < 0)
return log_error_errno(r, "Unable to validate capsule name '%s': %m", optarg);
if (r == 0)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Invalid capsule name: %s", optarg);
arg_host = optarg;
arg_transport = BUS_TRANSPORT_CAPSULE;
arg_runtime_scope = RUNTIME_SCOPE_USER;
break;
case ARG_WAIT:
arg_wait = true;
break;


@ -0,0 +1,53 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: LGPL-2.1-or-later
# shellcheck disable=SC2235
set -eux
set -o pipefail
at_exit() {
set +e
systemctl --no-block stop capsule@foobar.service
rm -rf /run/capsules/foobar
rm -rf /var/lib/capsules/foobar
rm -f /run/systemd/system/capsule@.service.d/99-asan.conf
}
trap at_exit EXIT
# Appease ASan, since the capsule@.service uses DynamicUser=yes
systemctl edit --runtime --stdin capsule@.service --drop-in=99-asan.conf <<EOF
[Service]
EnvironmentFile=-/usr/lib/systemd/systemd-asan-env
EOF
(! test -f /run/capsules/foobar )
(! test -f /var/lib/capsules/foobar )
(! id -u c-foobar )
systemctl start capsule@foobar.service
test -d /run/capsules/foobar
test -d /var/lib/capsules/foobar
id -u c-foobar
systemctl status capsule@foobar.service
busctl -C foobar
systemctl -C foobar
systemd-run -C foobar -u sleepinfinity /bin/sleep infinity
systemctl -C foobar status sleepinfinity
systemctl -C foobar stop sleepinfinity
(! systemctl clean capsule@foobar.service )
systemctl stop capsule@foobar.service
systemctl clean capsule@foobar.service --what=all
(! test -f /run/capsules/foobar )
(! test -f /var/lib/capsules/foobar )
(! id -u c-foobar )

13
units/capsule.slice Normal file

@ -0,0 +1,13 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Capsule Slice
Documentation=man:systemd.special(7)
Before=slices.target

33
units/capsule@.service.in Normal file

@ -0,0 +1,33 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Capsule Service Manager for %i
Documentation=man:capsule@.service(5)
After=dbus.service systemd-oomd.service
[Service]
User=c-%i
DynamicUser=yes
Type=notify-reload
ExecStart={{LIBEXECDIR}}/systemd --user --unit=capsule@%i.target
Environment=HOME=/var/lib/capsules/%i
Environment=XDG_RUNTIME_DIR=/run/capsules/%i
StateDirectory=capsules/%i
RuntimeDirectory=capsules/%i
LogExtraFields=CAPSULE=%i
Slice=capsule.slice
KillMode=mixed
Delegate=pids memory cpu
DelegateSubgroup=init.scope
TasksMax=infinity
TimeoutStopSec={{ DEFAULT_USER_TIMEOUT_SEC*4//3 }}s
KeyringMode=inherit
OOMScoreAdjust=100
MemoryPressureWatch=skip
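
The unit above derives three per-capsule values from the instance name substituted for `%i`: the dynamic user `c-%i`, `HOME=/var/lib/capsules/%i`, and `XDG_RUNTIME_DIR=/run/capsules/%i`. A minimal sketch of that mapping in standalone C follows. `struct capsule_paths`, `fmt_checked()`, and `capsule_paths_fill()` are hypothetical names introduced for illustration only; they are not part of systemd, and the sketch performs no name validation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the per-instance values; not a systemd type. */
struct capsule_paths {
        char user[64];          /* User=c-%i */
        char home[128];         /* HOME=/var/lib/capsules/%i */
        char runtime_dir[128];  /* XDG_RUNTIME_DIR=/run/capsules/%i */
};

/* Writes prefix + name into buf; returns 0 on success, -1 on truncation or error. */
static int fmt_checked(char *buf, size_t size, const char *prefix, const char *name) {
        assert(buf);

        int k = snprintf(buf, size, "%s%s", prefix, name);
        return (k < 0 || (size_t) k >= size) ? -1 : 0;
}

/* Fills in the values the unit file would compute for the given instance name. */
static int capsule_paths_fill(const char *name, struct capsule_paths *p) {
        if (!name || !*name || !p)
                return -1;

        if (fmt_checked(p->user, sizeof p->user, "c-", name) < 0)
                return -1;
        if (fmt_checked(p->home, sizeof p->home, "/var/lib/capsules/", name) < 0)
                return -1;
        if (fmt_checked(p->runtime_dir, sizeof p->runtime_dir, "/run/capsules/", name) < 0)
                return -1;

        return 0;
}
```

For the instance name `foobar` used in the test script, this yields the user `c-foobar` and the directories `/var/lib/capsules/foobar` and `/run/capsules/foobar`, the same paths the test checks with `test -d` and `id -u`.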


@ -746,6 +746,8 @@ units = [
{ 'file' : 'user-runtime-dir@.service.in' },
{ 'file' : 'user.slice' },
{ 'file' : 'user@.service.in' },
{ 'file' : 'capsule@.service.in' },
{ 'file' : 'capsule.slice' },
{
'file' : 'var-lib-machines.mount',
'conditions' : ['ENABLE_MACHINED'],


@ -0,0 +1,15 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Main Capsule Target for %i
Documentation=man:systemd.special(7)
Requires=basic.target
After=basic.target
AllowIsolate=yes


@ -11,6 +11,7 @@ units = [
'graphical-session.target',
'paths.target',
'printer.target',
'capsule@.target',
'session.slice',
'shutdown.target',
'smartcard.target',