IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Right now, the sending of wall messages on reboot/shutdown/etc can be
controlled via DBus properties. This patch adds support for changing the
default via the logind.conf file as well.
Note that the DBus setting is lost if logind is restarted or reloaded,
but it was already the case before this patch that the setting is lost
upon restart.
Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UID/GIDs to the nobody user.
However, there are use cases where users have UIDs/GIDs > 65536 and need
to do a similar identity mapping. Moreover, in some of those cases,
users want a full identity mapping from 0 -> UID_MAX.
To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.
Note to differentiate ourselves from the init user namespace, we need to
set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```
as the init user namedspace uses `0 0 UINT32_MAX` and some applications
- like systemd itself - determine if its a non-init user namespace based
on uid_map/gid_map files.
Note systemd will remove this heuristic in running_in_userns() in
version 258 (https://github.com/systemd/systemd/pull/35382) and uses
namespace inode. But some users may be running a container image with
older systemd < 258 so we keep this hack until version 259 for version
N-1 compatibility.
In addition to mapping the whole UID/GID space, we also set
/proc/pid/setgroups to "allow". While we usually set "deny" to avoid
security issues with dropping supplementary groups
(https://lwn.net/Articles/626665/), this ends up breaking dbus-broker
when running /sbin/init in full OS containers.
Fixes: #35168Fixes: #35425
When trying to run dbus-broker in a systemd unit with PrivateUsers=full,
we see dbus-broker fails with EPERM at `util_audit_drop_permissions`.
The root cause is dbus-broker calls the setgroups() system call and this
is disallowed via systemd's implementation of PrivateUsers= by setting
/proc/pid/setgroups = deny. This is done to remediate potential privilege
escalation vulnerabilities in user namespaces where an attacker can remove
supplementary groups and gain access to resources where those groups are
restricted.
However, for OS-like containers, setgroups() is a pretty common API and
disabling it is not feasible. So we allow setgroups() by setting
/proc/pid/setgroups to allow in PrivateUsers=full. Note security conscious
users can still use SystemCallFilter= to disable setgroups() if they want
to specifically prevent this system call.
Fixes: #35425
This way, users don't have to check those features using an external
program, or wait for later failure when trying to enroll using an
unsupported feature.
E.g.:
```
# systemd-cryptenroll --fido2-device list
PATH MANUFACTURER PRODUCT RK CLIENTPIN UP UV
/dev/hidraw2 Yubico YubiKey OTP+FIDO+CCID yes no yes no
```
This introduces a new unit condition check: that matches if a specific
kmod module is allowed. This should be generally useful, but there's one
usecase in particular: we can optimize modprobe@.service with this and
avoid forking out a bunch of modprobe requests during boot for the same
kmods.
Checking if a kernel module is loaded is more complicated than just
checking if /sys/module/$MODULE/ exists, since kernel modules typically
take a while to initialize and we must check that this is complete (by
checking if the sysfs attr "initstate" is "live").
Document the fact that read-only properties may not have the flag
SD_BUS_VTABLE_UNPRIVILEGED as that is not obvious especially given the
flag is accepted for writable properties.
Based on the check in `add_object_vtable_internal` called by
`sd_bus_add_object_vtable` (as of the current tip of the main branch
f7f5ba019206cacd486b0892fec76f70f525e04d):
case _SD_BUS_VTABLE_PROPERTY: {
[...]
if ([...] ||
[...]
(v->flags & SD_BUS_VTABLE_UNPRIVILEGED && v->type == _SD_BUS_VTABLE_PROPERTY)) {
r = -EINVAL;
goto fail;
}
(where `_SD_BUS_VTABLE_PROPERTY` means read-only property whereas
`_SD_BUS_VTABLE_WRITABLE_PROPERTY` maps to writable property).
This was implemented in the commit
adacb9575a09981fcf11279f2f661e3fc21e58ff ("bus: introduce "trusted" bus
concept and encode access control in object vtables") where
`SD_BUS_VTABLE_UNPRIVILEGED` was introduced:
Writable properties are also subject to SD_BUS_VTABLE_UNPRIVILEGED
and SD_BUS_VTABLE_CAPABILITY() for controlling write access to them.
Note however that read access is unrestricted, as PropertiesChanged
messages might send out the values anyway as an unrestricted
broadcast.
This PR allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host. This is useful for OS-like
containers running as units where they should have freedom to change
their container hostname if they want, but not the host's hostname.
Fixes: #30348
RestrictNamespaces= would accept "time" but would not actually apply
seccomp filters e.g. systemd-run -p RestrictNamespaces=time unshare -T true
should fail but it succeeded.
This commit actually enables time namespace seccomp filtering.
We'd silently skip devices which don't have the feature in the list.
This looked wrong esp. if no devices were suitable. Instead, list them
and show which ones are usable.
$ build/systemd-cryptenroll --fido2-device=list
PATH MANUFACTURER PRODUCT HMAC SECRET
/dev/hidraw7 Yubico YubiKey OTP+FIDO+CCID ✓
/dev/hidraw10 Yubico Security Key by Yubico ✗
/dev/hidraw5 Yubico Security Key by Yubico ✗
/dev/hidraw9 Yubico Yubikey 4 OTP+U2F+CCID ✗
This allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host.
Fixes: #30348
Migrating ProtectHostname to enum will set the stage for adding more
properties like ProtectHostname=private in future commits.
In addition, we add PrivateHostnameEx property to dbus API which uses
string instead of boolean.
Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UID/GIDs to the nobody user.
However, there are use cases where users have UIDs/GIDs > 65536 and need
to do a similar identity mapping. Moreover, in some of those cases, users
want a full identity mapping from 0 -> UID_MAX.
Note to differentiate ourselves from the init user namespace, we need to
set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```
as the init user namedspace uses `0 0 UINT32_MAX` and some applications -
like systemd itself - determine if its a non-init user namespace based on
uid_map/gid_map files. Note systemd will remove this heuristic in
running_in_userns() in version 258 and uses namespace inode. But some users
may be running a container image with older systemd < 258 so we keep this
hack until version 259.
To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.
Fixes: #35168
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
sysexts to customize kernel modules, firmware, etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
confexts to customize fstab/veritytab/crypttab/etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
Some ambiguity (e.g., same-named man pages in multiple volumes)
makes it impossible to fully automate this, but the following
Python snippet (run inside the man/ directory of the systemd repo)
helped to generate the sed command lines (which were subsequently
manually reviewed, run and the false positives reverted):
from pathlib import Path
import lxml
from lxml import etree as ET
man2vol: dict[str, str] = {}
man2citerefs: dict[str, list] = {}
for file in Path(".").glob("*.xml"):
tree = ET.parse(file, lxml.etree.XMLParser(recover=True))
meta = tree.find("refmeta")
if meta is not None:
title = meta.findtext("refentrytitle")
if title is not None:
vol = meta.findtext("manvolnum")
if vol is not None:
man2vol[title] = vol
citerefs = list(tree.iter("citerefentry"))
if citerefs:
man2citerefs[title] = citerefs
for man, refs in man2citerefs.items():
for ref in refs:
title = ref.findtext("refentrytitle")
if title is not None:
has = ref.findtext("manvolnum")
try:
should_have = man2vol[title]
except KeyError: # Non-systemd man page reference? Ignore.
continue
if has != should_have:
print(
f"sed -i '\\|<citerefentry><refentrytitle>{title}"
f"</refentrytitle><manvolnum>{has}</manvolnum>"
f"</citerefentry>|s|<manvolnum>{has}</manvolnum>|"
f"<manvolnum>{should_have}</manvolnum>|' {man}.xml"
)