IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
To be able to run systemd in a Type=notify transient unit, the notify
socket can't be bind mounted to /run/systemd/notify as systemd in the
transient unit wants to use that as its own notify socket which conflicts
with systemd on the host.
Instead, for sandboxed units, let's bind mount the notify socket to
/run/host/notify as documented in the container interface. Since we don't
guarantee a stable location for the notify socket and insist users use
$NOTIFY_SOCKET to get its path, this is safe to do.
Recently, PrivateUsers=identity was added to support mapping the first
65536 UIDs/GIDs from parent to the child namespace and mapping the other
UID/GIDs to the nobody user.
However, there are use cases where users have UIDs/GIDs > 65536 and need
to do a similar identity mapping. Moreover, in some of those cases,
users want a full identity mapping from 0 -> UID_MAX.
To support this, we add PrivateUsers=full that does identity mapping for
all available UID/GIDs.
Note to differentiate ourselves from the init user namespace, we need to
set up the uid_map/gid_map like:
```
0 0 1
1 1 UINT32_MAX - 1
```
as the init user namedspace uses `0 0 UINT32_MAX` and some applications
- like systemd itself - determine if its a non-init user namespace based
on uid_map/gid_map files.
Note systemd will remove this heuristic in running_in_userns() in
version 258 (https://github.com/systemd/systemd/pull/35382) and uses
namespace inode. But some users may be running a container image with
older systemd < 258 so we keep this hack until version 259 for version
N-1 compatibility.
In addition to mapping the whole UID/GID space, we also set
/proc/pid/setgroups to "allow". While we usually set "deny" to avoid
security issues with dropping supplementary groups
(https://lwn.net/Articles/626665/), this ends up breaking dbus-broker
when running /sbin/init in full OS containers.
Fixes: #35168Fixes: #35425
Initialize the start of the system-wide idle time with the time logind was
initialized and not with the start of the Unix epoch. This means that systemd
will not repport a unreasonable long idle time (around 54 years at the time of
writing this), especially at in the early boot, while no login manager session,
e.g,. gdm, had a chance to provide a more accurate start of the idle period.
Fixes#35163
The MountEntry logic was refactored to store the verity
settings, and updated for ExtensionImages=, but not for
MountImages=.
Follow-up for a1a40297dbfa5bcd926d1a19320deb73c033c6f5
When trying to run dbus-broker in a systemd unit with PrivateUsers=full,
we see dbus-broker fails with EPERM at `util_audit_drop_permissions`.
The root cause is dbus-broker calls the setgroups() system call and this
is disallowed via systemd's implementation of PrivateUsers= by setting
/proc/pid/setgroups = deny. This is done to remediate potential privilege
escalation vulnerabilities in user namespaces where an attacker can remove
supplementary groups and gain access to resources where those groups are
restricted.
However, for OS-like containers, setgroups() is a pretty common API and
disabling it is not feasible. So we allow setgroups() by setting
/proc/pid/setgroups to allow in PrivateUsers=full. Note security conscious
users can still use SystemCallFilter= to disable setgroups() if they want
to specifically prevent this system call.
Fixes: #35425
The MountEntry logic was refactored to store the verity
settings, and updated for ExtensionImages=, but not for
MountImages=.
Follow-up for a1a40297dbfa5bcd926d1a19320deb73c033c6f5
This effectively reverts 0bcf1679007e71d1d37666c10ab1f8d46de8d570.
The handling is not necessary anymore after 61242b1f0f9cac399deb67c88c3b62d38218dba3.
This function was listed in the public sd-varlink.h header, but not
actually made public. Fix that. It's quite useful, the comment in it
describes the usecase nicely.
Fixes: #35554
This way, users don't have to check those features using an external
program, or wait for later failure when trying to enroll using an
unsupported feature.
E.g.:
```
# systemd-cryptenroll --fido2-device list
PATH MANUFACTURER PRODUCT RK CLIENTPIN UP UV
/dev/hidraw2 Yubico YubiKey OTP+FIDO+CCID yes no yes no
```
This introduces a new unit condition check: that matches if a specific
kmod module is allowed. This should be generally useful, but there's one
usecase in particular: we can optimize modprobe@.service with this and
avoid forking out a bunch of modprobe requests during boot for the same
kmods.
Checking if a kernel module is loaded is more complicated than just
checking if /sys/module/$MODULE/ exists, since kernel modules typically
take a while to initialize and we must check that this is complete (by
checking if the sysfs attr "initstate" is "live").
Now that we have an explicit userns check we can drop the heuristic for
it, given that it's kinda wrong (because mapping the full host UID range
into a userns is actually a thing people do).
Hence, just delete the code and only keep the userns inode check in
place.
Now that we have a reliable pidns check I don't think we really should
look for cgroupns anymore, it's too weak a check. I mean, if I myself
would implement a desktop app sandbox (like flatpak) I'd always enable
cgroupns, simply to hide the host cgroup hierarchy.
Hence drop the check.
I suggested adding this 4 years ago here:
https://github.com/systemd/systemd/pull/17902#issuecomment-745548306
- Rename to script_get_shebang_interpreter and return
-EMEDIUMTYPE if the executable is not a script.
We nowadays utilize the scheme of making ret param
of getters optional, and use them directly as checkers.
- Don't unnecessarily read the whole line, but check
only the shebang first.