IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This PR allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host. This is useful for OS-like
containers running as units where they should have freedom to change
their container hostname if they want, but not the host's hostname.
Fixes: #30348
RestrictNamespaces= would accept "time" but would not actually apply
seccomp filters e.g. systemd-run -p RestrictNamespaces=time unshare -T true
should fail but it succeeded.
This commit actually enables time namespace seccomp filtering.
Previously if one service specified the same unit as their
success and failure handler we bailed out of resolving the triggering unit
even though it is still unique.
These new tests are flaky, so disable them temporarily, until after
the release, to avoid pushing out new flakiness to consumers. They
will be re-enabled immediately after.
This allows an option for systemd exec units to enable UTS namespaces
but not restrict changing hostname via seccomp. Thus, units can change
hostname without affecting the host.
Fixes: #30348
Follow-up for c26948c6da
We only want to get rid of cred mount here, and RuntimeDirectory=
is documented to be retained for SERVICE_EXITED state.
Fixes#35427
- make udevd generate debugging logs for loopback and DM devices,
- insert 'udevadm wait' at several places to make the device processed
by udevd,
- cleanup generated integritysetup service before moving to next
algorithm,
- drop unnecessary exit on command failure,
- also test data splitting mode for all algorithms.
Otherwise the root inode will typically have what mkdtemp sets up, which
is something like 0700, which is weird and somewhat broken when trying
to look into containers from unpriv users.
This is a more comprehensive fix compared to #35273. Also adds a minimal
test only.
Based on Luca's #35273 but generalizes the code a bit.
In v258 we really should get rid of the old heuristics around userns and
cgroupns detection, but given we are late in the v257 cycle this keeps
them in.
This tmpfiles.d wants to write to sysfs, which is read-only in containers,
so systemd-tmpfiles --create fails in TEST-22-TMPFILES when ran in nspawn
if the selinux policy package is instealled. Mask it, as it's not our
config file, we don't need it in the test.
The indoe number of root pid namespace is hardcoded in the kernel to
0xEFFFFFFC since 3.8, so check the inode number of our pid namespace
if all else fails. If it's not 0xEFFFFFFC then we are in a pid
namespace, hence a container environment.
Fixes https://github.com/systemd/systemd/issues/35249
[Reworked by Lennart, to make use of namespace_is_init()]
Currently, get_fixed_user() employs USER_CREDS_SUPPRESS_PLACEHOLDER,
meaning home path is set to NULL if it's empty or root. However,
the path is also used for applying WorkingDirectory=~, and we'd
spuriously use the invoking user's home as fallback even if
User= is changed in that case.
Let's instead delegate such suppression to build_environment(),
so that home is proper initialized for usage at other steps.
shell doesn't actually suffer from such problem, but it's changed
too for consistency.
Alternative to #34789
libnvme 1.11 appears to require a kernel built with NVME TLS
kconfigs, and fails hard if it is not, as the expected
privileged keyring '.nvme' is not present. We cannot just
create it from userspace, as privileged keyrings can only
be created by the kernel itself (those starting with '.').
Skip the test if the library exactly matches this version.
https://github.com/linux-nvme/nvme-cli/issues/2573
Fixes https://github.com/systemd/systemd/issues/35130
Let's gather generic key/certificate operations in a new tool
systemd-keyutil instead of spreading them across various special
purpose tools.
Fixes#35087
Currently in mkosi and ukify we use sbsigntools to do secure boot
signing. This has multiple issues:
- sbsigntools is practically unmaintained, sbvarsign is completely
broken with the latest gnu-efi when built without -fshort-wchar and
upstream has completely ignored my bug report about this.
- sbsigntools only supports openssl engines and not the new providers
API.
- sbsigntools doesn't allow us to cache hardware token pins in the
kernel keyring like we do nowadays when we sign stuff ourselves in
systemd-repart or systemd-measure
There are alternative tools like sbctl and pesign but these do not
support caching hardware token pins in the kernel keyring either.
To get around the issues with sbsigntools, let's introduce our own
tool systemd-sbsign to do secure boot signing. This allows us to
take advantage of our own openssl infra so that hardware token pins
are cached in the kernel keyring as expected and we get openssl
provider support as well.
Currently in mkosi and ukify we use sbsigntools to do secure boot
signing. This has multiple issues:
- sbsigntools is practically unmaintained, sbvarsign is completely
broken with the latest gnu-efi when built without -fshort-wchar and
upstream has completely ignored my bug report about this.
- sbsigntools only supports openssl engines and not the new providers
API.
- sbsigntools doesn't allow us to cache hardware token pins in the
kernel keyring like we do nowadays when we sign stuff ourselves in
systemd-repart or systemd-measure
There are alternative tools like sbctl and pesign but these do not
support caching hardware token pins in the kernel keyring either.
To get around the issues with sbsigntools, let's introduce our own
tool systemd-sbsign to do secure boot signing. This allows us to
take advantage of our own openssl infra so that hardware token pins
are cached in the kernel keyring as expected and we get openssl
provider support as well.
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.
Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.
We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.
When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.
Note we disallow Type=forking services from using PrivatePIDs=yes since the
init proess inside the PID namespace must not exit for other processes in
the namespace to exist.
Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.
Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
This allows a single tmpfiles snippet with lines to symlink directories
from /usr/share/factory to be shared across many different configurations
while making sure symlinks only get created if the source actually exists.