mirror of https://github.com/systemd/systemd.git synced 2025-01-05 13:18:06 +03:00
Commit Graph

9158 Commits

Author SHA1 Message Date
Lennart Poettering
ccaa76ac48
image-discovery: add per-user scope (#35510) 2024-12-20 22:12:35 +01:00
Lennart Poettering
1c0ade2e1f discover-image: introduce per-user image directories
We nowadays support unprivileged invocation of systemd-nspawn +
systemd-vmspawn, but there was no support for discovering suitable disk
images (i.e. no per-user counterpart of /var/lib/machines). Add this
now, and hook it up everywhere.

Instead of hardcoding machined's, importd's, portabled's, sysupdated's
image discovery to RUNTIME_SCOPE_SYSTEM I introduced a field that makes
the scope variable, even if this field is always initialized to
RUNTIME_SCOPE_SYSTEM for now. I think these four services should
eventually be updated to support a per-user concept too; this is
preparation for that, even though it doesn't outright add support for
it.

This is for the largest part not user visible, except for in nspawn,
vmspawn and the dissect tool. For the latter I added a pair of
--user/--system switches to select the discovery scope.
2024-12-20 18:04:01 +01:00
Lennart Poettering
4103bf9f2f man: document the new per-user credstore paths
(And some other minor tweaks)
2024-12-20 17:52:07 +01:00
Antonio Alvarez Feijoo
e9f781a5a4
debug-generator: add a kernel cmdline option to pause the boot process
Introduce the `systemd.break=` kernel command line option to allow stopping the
boot process at a certain point and spawn a debug shell. After exiting this
shell, the system will resume booting.

It accepts the following values:
- `pre-udev`: before starting to process kernel uevents (initrd and host).
- `pre-basic`: before leaving early boot and regular services start (initrd and
host).
- `pre-mount`: before the root filesystem is mounted (initrd).
- `pre-switch-root`: before switching root (initrd).
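
The list of accepted values can be sketched as a small lookup table. This is only an illustration and not the debug-generator implementation; the helper names `requested_breakpoints` and `applies` are hypothetical, and the sketch assumes each `systemd.break=` word carries a single value.

```python
# Hypothetical helper, not the debug-generator code.
# Maps each documented breakpoint to the environments where it applies.
BREAKPOINTS = {
    "pre-udev": {"initrd", "host"},
    "pre-basic": {"initrd", "host"},
    "pre-mount": {"initrd"},
    "pre-switch-root": {"initrd"},
}


def requested_breakpoints(cmdline):
    """Return the value of every systemd.break= word on a kernel command line."""
    prefix = "systemd.break="
    return [word[len(prefix):] for word in cmdline.split() if word.startswith(prefix)]


def applies(value, environment):
    """True if `value` is a documented breakpoint for `environment`."""
    return environment in BREAKPOINTS.get(value, set())


if __name__ == "__main__":
    for value in requested_breakpoints("root=/dev/sda1 systemd.break=pre-mount quiet"):
        print(value, "initrd:", applies(value, "initrd"), "host:", applies(value, "host"))
```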
2024-12-20 08:51:23 +01:00
Antonio Alvarez Feijoo
cb3801a4c9
man/debug-generator: add a section for kernel command line options 2024-12-20 08:48:23 +01:00
Yu Watanabe
5e837858e7
analyze: add "chid" verb to display CHIDs of the local system (#35175)
We already have the code for it, expose it in systemd-analyze, because
it's useful.
2024-12-20 11:47:03 +09:00
Matthias Lisin
6e3f32cc56
man/sysupdate.features: fix typos 2024-12-19 12:39:32 +01:00
Matthias Lisin
f441831c9e
man/sysupdate.d: fix wrong PathRelativeTo value 2024-12-19 12:39:31 +01:00
Matthias Lisin
4bc06da775
man: fix args order for udevadm info cmd 2024-12-19 12:39:31 +01:00
Lennart Poettering
8f114904fc analyze: add verb for showing system's CHIDs
We have the code already, expose it in systemd-analyze too.

This should make it easier to debug the CHID use in the UKIs with
onboard tooling.
2024-12-18 17:38:42 +01:00
Daan De Meyer
a48803fd84 man: Document generator sandbox environment 2024-12-19 00:36:52 +09:00
Lennart Poettering
7a8556b901
confext/sysext: add initrd-specific units (#35426)
In the rootfs these need to run after /var/lib/ has been set up. In the
initrd we want them to run as soon as possible so that they can be used
to customize setting up the rootfs.
2024-12-18 10:33:38 +01:00
Lennart Poettering
00a415fc8f tree-wide: remove support for kernels lacking ambient caps
Let's bump the kernel baseline a bit to 4.3 and thus require ambient
caps.

This allows us to remove support for a variety of special casing, most
importantly the ExecStart=!! hack.
2024-12-17 17:34:46 +01:00
Yu Watanabe
e76fcd0e40 core: make ProtectHostname= optionally take a hostname
Closes #35623.
2024-12-16 23:55:44 +09:00
Federico Giovanardi
7fd45eec37 udev: add option to trigger parent devices despite filters
This commit adds the `-i` option to `udevadm trigger`, which forces it to
match parent devices even if they're excluded by filters.
The rationale is that some embedded devices have a huge number of
platform devices (~4k for MX8). They exist because they're defined
in the device tree, but there isn't any action or udev rule associated
with them.

So at boot a significant amount of time is spent triggering and processing
rules for devices that don't produce any effect, and we would like to filter
them out by calling:

```
udevadm trigger --type=device --action=add -s block -s tty
```

instead of the normal

```
udevadm trigger --type=device --action=add
```

so we can use filters to select only the subsystems for which we know that
we have rules in place that do something useful.

On the other hand, actions/rules are not triggered until the parent is
triggered (which is part of another subsystem), so the additional option
allows udev to complete the coldplug with only the devices we care about.

Example on iMX8:

.Without the new option
```
root@dev:~# udevadm trigger --dry-run  -s block --action=add -v
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p2
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p3
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p4
```

.With the new option
```
root@dev:~# udevadm trigger --dry-run -i -s block --action=add -v
/sys/devices/platform
/sys/devices/platform/bus@5b000000
/sys/devices/platform/bus@5b000000/5b010000.mmc
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p2
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p3
/sys/devices/platform/bus@5b000000/5b010000.mmc/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p4
```
Boot time reduction with this in place is ~ 1 second.
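
The difference between the two listings above can be modelled as "add every ancestor that is itself a device". The following is a rough sketch of that idea, not udev's code: `with_parents` is a hypothetical name, and the set of known devices is passed in explicitly, since intermediate directories such as `mmc_host` or `block` are not devices and do not show up in the output.

```python
import posixpath


def with_parents(matched, devices):
    """Return the matched device paths plus every ancestor that is itself a device.

    `matched` are the paths selected by the subsystem filter; `devices` is the
    set of all sysfs paths that are real devices (an assumption of this sketch).
    """
    selected = set(matched)
    for path in matched:
        parent = posixpath.dirname(path)
        while parent not in ("", "/"):
            if parent in devices:
                selected.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(selected)


if __name__ == "__main__":
    mmc = "/sys/devices/platform/bus@5b000000/5b010000.mmc"
    devices = {
        "/sys/devices/platform",
        "/sys/devices/platform/bus@5b000000",
        mmc,
        mmc + "/mmc_host/mmc0",
        mmc + "/mmc_host/mmc0/mmc0:0001",
        mmc + "/mmc_host/mmc0/mmc0:0001/block/mmcblk0",
    }
    for path in with_parents([mmc + "/mmc_host/mmc0/mmc0:0001/block/mmcblk0"], devices):
        print(path)
```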
2024-12-16 15:43:52 +01:00
David Härdeman
130698dc20 logind: allow wall messages to be controlled via config file
Right now, the sending of wall messages on reboot/shutdown/etc can be
controlled via DBus properties. This patch adds support for changing the
default via the logind.conf file as well.

Note that the DBus setting is lost if logind is restarted or reloaded,
but it was already the case before this patch that the setting is lost
upon restart.
2024-12-14 10:54:58 +09:00
Yu Watanabe
10a768443a network: introduce MPLSRouting= to enable MPLS routing
Closing #35487.
2024-12-13 15:36:45 +00:00
Luca Boccassi
ed803ee195
journalctl: make --setup-keys honor --output=json and --quiet (#35507)
Closes #35503.
Closes #35504.
2024-12-13 13:40:09 +00:00
Luca Boccassi
6dfd290031
core: Add PrivateUsers=full (#35183)
Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UID/GIDs to the nobody user.

However, there are use cases where users have UIDs/GIDs > 65536 and need
to do a similar identity mapping. Moreover, in some of those cases,
users want a full identity mapping from 0 -> UID_MAX.

To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.

Note to differentiate ourselves from the init user namespace, we need to
set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```

as the init user namespace uses `0 0 UINT32_MAX` and some applications
- like systemd itself - determine whether they are in a non-init user
namespace based on the uid_map/gid_map files.

Note that systemd will remove this heuristic in running_in_userns() in
version 258 (https://github.com/systemd/systemd/pull/35382) and use the
namespace inode instead. But some users may be running a container image
with an older systemd < 258, so we keep this hack until version 259 for
version N-1 compatibility.
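
To make the arithmetic concrete, here is a small sketch showing that the two-line map covers the same number of IDs as the init namespace's single line while remaining distinguishable from it. The function names are hypothetical, and `looks_like_init_userns` is a simplified stand-in for the heuristic mentioned above, not the actual running_in_userns() logic.

```python
UINT32_MAX = 2**32 - 1


def full_identity_map():
    """uid_map/gid_map lines as described above: two adjacent identity ranges."""
    return ["0 0 1", f"1 1 {UINT32_MAX - 1}"]


def parse_map(lines):
    """Turn 'inside outside count' lines into tuples of integers."""
    return [tuple(int(field) for field in line.split()) for line in lines]


def covered_ids(lines):
    """Total number of IDs mapped by all lines."""
    return sum(count for _inside, _outside, count in parse_map(lines))


def looks_like_init_userns(lines):
    """Simplified stand-in for the heuristic: a single `0 0 UINT32_MAX` line."""
    return parse_map(lines) == [(0, 0, UINT32_MAX)]


if __name__ == "__main__":
    init_map = [f"0 0 {UINT32_MAX}"]
    print(full_identity_map())
    print(covered_ids(full_identity_map()) == covered_ids(init_map))
    print(looks_like_init_userns(full_identity_map()))
```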

In addition to mapping the whole UID/GID space, we also set
/proc/pid/setgroups to "allow". While we usually set "deny" to avoid
security issues with dropping supplementary groups
(https://lwn.net/Articles/626665/), this ends up breaking dbus-broker
when running /sbin/init in full OS containers.

Fixes: #35168
Fixes: #35425
2024-12-13 12:25:13 +00:00
Ryan Wilson
2665425176 core: Set /proc/pid/setgroups to allow for PrivateUsers=full
When trying to run dbus-broker in a systemd unit with PrivateUsers=full,
we see dbus-broker fails with EPERM at `util_audit_drop_permissions`.

The root cause is dbus-broker calls the setgroups() system call and this
is disallowed via systemd's implementation of PrivateUsers= by setting
/proc/pid/setgroups = deny. This is done to remediate potential privilege
escalation vulnerabilities in user namespaces where an attacker can remove
supplementary groups and gain access to resources where those groups are
restricted.

However, for OS-like containers, setgroups() is a pretty common API and
disabling it is not feasible. So we allow setgroups() by setting
/proc/pid/setgroups to allow in PrivateUsers=full. Note security conscious
users can still use SystemCallFilter= to disable setgroups() if they want
to specifically prevent this system call.
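
For debugging, the effective policy of a process can be read back from procfs. A minimal sketch, assuming a Linux /proc layout; `setgroups_policy` and its `proc_root` parameter are hypothetical and exist only to keep the example testable.

```python
from pathlib import Path


def setgroups_policy(pid="self", proc_root="/proc"):
    """Return the content of /proc/<pid>/setgroups ("allow" or "deny").

    Returns None if the file does not exist (e.g. no such process).
    """
    try:
        return (Path(proc_root) / str(pid) / "setgroups").read_text().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(setgroups_policy())
```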

Fixes: #35425
2024-12-12 11:36:10 +00:00
Yu Watanabe
831bbaf5cd creds: support --transcode=help and --with-key=help 2024-12-12 15:25:34 +09:00
Carlo Teubner
dfbd4d8bc5 systemd-cryptenroll.xml: fix typo 2024-12-11 23:10:59 +00:00
cvlc12
693038fce4
man: update example in systemd-measure.xml (#35506)
In the example from systemd-measure(1), do not bind to PCR 7 in
addition to the PCR policy.

As long as this is still done by default, see #35280.
2024-12-12 06:09:11 +09:00
Mike Yuan
3ae314afdc Revert "run: disable --expand-environment by default for --scope"
This reverts commit 8167c56bfa.

We've announced the breaking change during v254-v257. Let's actually
apply it for v258.
2024-12-12 06:05:30 +09:00
Yu Watanabe
e53be91e5d
libfido2-util: show also verity features when listing FIDO2 devices (#35295)
This way, users don't have to check those features using an external
program, or wait for later failure when trying to enroll using an
unsupported feature.

E.g.:

```
# systemd-cryptenroll --fido2-device list
PATH         MANUFACTURER PRODUCT               RK  CLIENTPIN UP  UV
/dev/hidraw2 Yubico       YubiKey OTP+FIDO+CCID yes no        yes no
```
2024-12-12 05:11:46 +09:00
Lennart Poettering
3c702e8210 condition: add new ConditionKernelModuleLoaded=
This introduces a new unit condition check that matches if a specific
kmod module is loaded. This should be generally useful, but there's one
usecase in particular: we can optimize modprobe@.service with this and
avoid forking out a bunch of modprobe requests during boot for the same
kmods.

Checking if a kernel module is loaded is more complicated than just
checking if /sys/module/$MODULE/ exists, since kernel modules typically
take a while to initialize and we must check that this is complete (by
checking if the sysfs attr "initstate" is "live").
2024-12-12 05:03:52 +09:00
andrejpodzimek
ae2f3af639 Fixing VLAN ranges in man systemd.network.
Otherwise it doesn't hold that VLANs 100-400 are allowed (because 201-299 are disallowed).
2024-12-12 03:52:00 +09:00
Katariina Lounento
3ca09aa4dd man: document unprivileged is not for reading properties
Document the fact that read-only properties may not have the flag
SD_BUS_VTABLE_UNPRIVILEGED as that is not obvious especially given the
flag is accepted for writable properties.

Based on the check in `add_object_vtable_internal` called by
`sd_bus_add_object_vtable` (as of the current tip of the main branch
f7f5ba0192):

    case _SD_BUS_VTABLE_PROPERTY: {
            [...]
            if ([...] ||
                [...]
                (v->flags & SD_BUS_VTABLE_UNPRIVILEGED && v->type == _SD_BUS_VTABLE_PROPERTY)) {
                    r = -EINVAL;
                    goto fail;
            }

(where `_SD_BUS_VTABLE_PROPERTY` means read-only property whereas
`_SD_BUS_VTABLE_WRITABLE_PROPERTY` maps to writable property).

This was implemented in the commit
adacb9575a ("bus: introduce "trusted" bus
concept and encode access control in object vtables") where
`SD_BUS_VTABLE_UNPRIVILEGED` was introduced:

    Writable properties are also subject to SD_BUS_VTABLE_UNPRIVILEGED
    and SD_BUS_VTABLE_CAPABILITY() for controlling write access to them.
    Note however that read access is unrestricted, as PropertiesChanged
    messages might send out the values anyway as an unrestricted
    broadcast.
2024-12-11 18:32:46 +01:00
Yu Watanabe
f8bfe16b06 journalctl: do not override explicitly specified -b or -n with -e or -k
Fixes #35248.
2024-12-11 18:12:13 +09:00
Antonio Alvarez Feijoo
62b7b70bb7
man/systemd-cryptenroll: sort --fido2-credential-algorithm after --fido2-device
And also fix a typo.
2024-12-11 07:32:04 +01:00
Yu Watanabe
5c9da83004 journalctl: allow to dump generated key in json format
Closes #35503.
2024-12-11 11:18:06 +09:00
Yu Watanabe
77064620d7 Revert "coredumpctl: Don't treat no coredumps as failure"
This reverts commit dfe79b9ed2.
2024-12-11 11:14:37 +09:00
Yu Watanabe
627d1a9ac1
core: Add ProtectHostname=private (#35447)
This PR allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host. This is useful for OS-like
containers running as units where they should have freedom to change
their container hostname if they want, but not the host's hostname.

Fixes: #30348
2024-12-11 10:17:25 +09:00
Daan De Meyer
dfe79b9ed2 coredumpctl: Don't treat no coredumps as failure
Having to deal with a process that fails or doesn't fail depending on
whether there are coredumps or not is incredibly annoying for users.
2024-12-10 21:03:20 +01:00
Ryan Wilson
219a6dbbf3 core: Fix time namespace in RestrictNamespaces=
RestrictNamespaces= would accept "time" but would not actually apply
seccomp filters, e.g. systemd-run -p RestrictNamespaces=time unshare -T true
should fail but it succeeded.

This commit actually enables time namespace seccomp filtering.
2024-12-10 20:55:26 +01:00
Zbigniew Jędrzejewski-Szmek
7b2ebd7040 cryptenroll: show which devices support "hmac secret"
We'd silently skip devices which don't have the feature in the list.
This looked wrong esp. if no devices were suitable. Instead, list them
and show which ones are usable.

$ build/systemd-cryptenroll --fido2-device=list
PATH          MANUFACTURER PRODUCT                HMAC SECRET
/dev/hidraw7  Yubico       YubiKey OTP+FIDO+CCID  ✓
/dev/hidraw10 Yubico       Security Key by Yubico ✗
/dev/hidraw5  Yubico       Security Key by Yubico ✗
/dev/hidraw9  Yubico       Yubikey 4 OTP+U2F+CCID ✗
2024-12-10 10:58:58 +01:00
Zbigniew Jędrzejewski-Szmek
4b034cc128 systemd-cryptenroll: use pager for --help, add --no-pager option 2024-12-09 16:04:25 +01:00
Ryan Wilson
cf48bde7ae core: Add ProtectHostname=private
This allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host.

Fixes: #30348
2024-12-06 13:34:04 -08:00
Ryan Wilson
6746f28854 core: Migrate ProtectHostname to use enum vs boolean
Migrating ProtectHostname to enum will set the stage for adding more
properties like ProtectHostname=private in future commits.

In addition, we add the ProtectHostnameEx property to the D-Bus API, which
uses a string instead of a boolean.
2024-12-06 13:33:49 -08:00
Ryan Wilson
705cc82938 core: Add PrivateUsers=full
Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UID/GIDs to the nobody user.

However, there are use cases where users have UIDs/GIDs > 65536 and need
to do a similar identity mapping. Moreover, in some of those cases, users
want a full identity mapping from 0 -> UID_MAX.

Note to differentiate ourselves from the init user namespace, we need to
set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```

as the init user namespace uses `0 0 UINT32_MAX` and some applications -
like systemd itself - determine whether they are in a non-init user namespace
based on the uid_map/gid_map files. Note that systemd will remove this
heuristic in running_in_userns() in version 258 and use the namespace inode
instead. But some users may be running a container image with an older
systemd < 258, so we keep this hack until version 259.

To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.

Fixes: #35168
2024-12-05 10:34:32 -08:00
Septatrix
5857f31c2c man: clarify wording regarding MONITOR_* envs 2024-12-06 03:01:19 +09:00
Antonio Alvarez Feijoo
61cf8472e7 man: remove references to invalid rd.systemd.image_policy option
The option with the `rd.` prefix is not implemented, the image policy is not
applied in the initrd.
2024-12-03 19:36:41 +01:00
Luca Boccassi
d21b42b463 sysext: add initrd-specific unit
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
sysexts to customize kernel modules, firmware, etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
2024-12-01 12:17:21 +00:00
Luca Boccassi
e813252378 confext: add initrd-specific unit
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
confexts to customize fstab/veritytab/crypttab/etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
2024-12-01 12:16:54 +00:00
SuhailAhmedVelorum
27369124e8 Typo fix in man/systemd.resource-control 2024-11-28 17:23:58 +00:00
Lennart Poettering
92033d8fba man: split systemd.conf(5) into multiple sections
No changes in wording, let's just make a very long man page a bit more
digestible by adding sections, and then reordering settings to fit into
them.
2024-11-27 21:51:32 +09:00
Zbigniew Jędrzejewski-Szmek
ef20d06da6
ukify: Switch to JSON HWID description format (#35208)
Fixes #35176
2024-11-27 09:50:41 +01:00
Yu Watanabe
f29a07f3fc man: several more assorted fixes
Continuation of 4ebbb5bfe8.
Closes #35307.
2024-11-26 17:28:14 +01:00
Winterhuman
5bed97dd57
man/systemd-system.conf: Correct "struct" to "strict" (#35364) 2024-11-26 22:41:49 +09:00
Yu Watanabe
1ea1a79aa1 Revert "Revert "man: use MIT-0 license for example codes in daemon(7)""
This reverts commit 7a9d0abe4d.
2024-11-26 12:26:10 +01:00