IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This patch adds seccomp support to the riscv64 architecture. seccomp
support is available in the riscv64 kernel since version 5.5, and it
has just been added to the libseccomp library.
riscv64 uses generic syscalls like aarch64, so I used that architecture
as a reference to find which code has to be modified.
With this patch, the testsuite passes successfully, including the
test-seccomp test. The system boots and works fine with kernel 5.4 (i.e.
without seccomp support) and kernel 5.5 (i.e. with seccomp support). I
have also verified that the "SystemCallFilter=~socket" option prevents a
service to use the ping utility when running on kernel 5.5.
On new kernels (>= 5.8) unprivileged users may create the 0:0 character
device node. Which is great, as we can use that as inaccessible device
nodes if we run unprivileged. Hence, change how we find the right
inaccessible device inodes: when the user asks for a block device node,
but we have none, try the char device node first. If that doesn't exist,
fall back to the socket node as before.
This means that:
1. in the best case we'll return a node if the right device node type
2. otherwise we hopefully at least can return a device node if one asked
for even if the type doesn't match (i.e. we return char instead of
the requested block device node)
3. in the worst case (old kernels…) we'll return a socket node
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
Commands like build/man/man journald.conf.d would show the installed
man page (or an error if the page cannot be found in the global search
path), and not the one in the build directory. If the man page is
a redirect, or the .html is a symlink, resolve it, build the target,
and show that.
For some reason this failed in koji build on s390x:
--- command ---
16:12:46 PATH='/builddir/build/BUILD/systemd-stable-246.1/s390x-redhat-linux-gnu:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin' SYSTEMD_LANGUAGE_FALLBACK_MAP='/builddir/build/BUILD/systemd-stable-246.1/src/locale/language-fallback-map' SYSTEMD_KBD_MODEL_MAP='/builddir/build/BUILD/systemd-stable-246.1/src/locale/kbd-model-map' /builddir/build/BUILD/systemd-stable-246.1/s390x-redhat-linux-gnu/test-acl-util
--- stdout ---
-rw-r-----. 1 mockbuild mock 0 Aug 7 16:12 /tmp/test-empty.7RzmEc
other::---
--- stderr ---
Assertion 'r >= 0' failed at src/test/test-acl-util.c:42, function test_add_acls_for_user(). Aborting.
Follow the same model established for RootImage and RootImageOptions,
and allow to either append a single list of options or tuples of
partition_number:options.
let's make sure we collect the right error code from errno, otherwise
we'll see EPERM (i.e. error 1) for all errors readv() returns (since it
returns -1 on error), including EAGAIN.
This is definitely backport material.
A fix-up for 3691bcf3c5.
Fixes: #16699
The concept is flawed, and mostly useless. Let's finally remove it.
It has been deprecated since 90a2ec10f2 (6
years ago) and we started to warn since
55dadc5c57 (1.5 years ago).
Let's get rid of it altogether.
Previously, we'd create them from user-runtime-dir@.service. That has
one benefit: since this service runs privileged, we can create the full
set of device nodes. It has one major drawback though: it security-wise
problematic to create files/directories in directories as privileged
user in directories owned by unprivileged users, since they can use
symlinks to redirect what we want to do. As a general rule we hence
avoid this logic: only unpriv code should populate unpriv directories.
Hence, let's move this code to an appropriate place in the service
manager. This means we lose the inaccessible block device node, but
since there's already a fallback in place, this shouldn't be too bad.
This has the major benefit that the entire payload of the container can
access these files there. Previously, we'd set them only as env vars,
but that meant only PID 1 could read them directly or other privileged
payload code with access to /run/1/environ.
Let's make /run/host the sole place we pass stuff from host to container
in and place the "inaccessible" nodes in /run/host too.
In contrast to the previous two commits this is a minor compat break, but
not a relevant one I think. Previously the container manager would place
these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the
container would try to add them too when missing. Container manager and
PID 1 in the container would thus manage the same dir together.
With this change the container manager now passes an immutable directory
to the container and leaves /run/systemd entirely untouched, and managed
exclusively by PID 1 inside the container, which is nice to have clear
separation on who manages what.
In order to make sure systemd then usses the /run/host/inaccesible/
nodes this commit changes PID 1 to look for that dir and if it exists
will symlink it to /run/systemd/inaccessible.
Now, this will work fine if new nspawn and new pid 1 in the container
work together. as then the symlink is created and the difference between
the two dirs won't matter.
For the case where an old nspawn invokes a new PID 1: in this case
things work as they always worked: the dir is managed together.
For the case where different container manager invokes a new PID 1: in
this case the nodes aren't typically passed in, and PID 1 in the
container will try to create them and will likely fail partially (though
gracefully) when trying to create char/block device nodes. THis is fine
though as there are fallbacks in place for that case.
For the case where a new nspawn invokes an old PID1: this is were the
(minor) incompatibily happens: in this case new nspawn will place the
nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the
container won't look for them there. Since the nodes are also not
pre-created in /run/systed/inaccessible/ PID 1 will try to create them
there as if a different container manager sets them up. This is of
course not sexy, but is not a total loss, since as mentioned fallbacks
are in place anyway. Hence I think it's OK to accept this minor
incompatibility.
The sd_notify() socket that nspawn binds that the payload can use to
talk to it was previously stored in /run/systemd/nspawn/notify, which is
weird (as in the previous commit) since this makes /run/systemd
something that is cooperatively maintained by systemd inside the
container and nspawn outside of it.
We now have a better place where container managers can put the stuff
they want to pass to the payload: /run/host/, hence let's make use of
that.
This is not a compat breakage, since the sd_notify() protocol is based
on the $NOTIFY_SOCKET env var, where we place the new socket path.
Previously we'd use a directory /run/systemd/nspawn/incoming for
accepting mounts to propagate from the host. This is a bit weird, since
we have a shared namespace: /run/systemd/ contains both stuff managed by
the surround nspawn as well as from the systemd inside.
We now have the /run/host/ hierarchy that has special stuff we want to
pass from host to container. Let's make use of that here, and move this
directory here too.
This is not a compat breakage, since the payload never interfaces with
that directory natively: it's only nspawn and machined that need to
agree on it.
arg_system == true and getpid() == 1 hold under the very same condition
this early in the main() function (this only changes later when we start
parsing command lines, where arg_system = true is set if users invoke us
in test mode even when getpid() != 1.
Hence, let's simplify things, and merge a couple of if branches and not
pretend they were orthogonal.
Timestamps for unit start/stop are recorded with microsecond granularity,
but status and show truncate to second granularity by default.
Add a --timestamp=pretty|us|utc option to allow including the microseconds
or to use the UTC TZ to all timestamps printed by systemctl.
Apparently both Fedora and suse default to btrfs now, it should hence be
good enough for us too.
This enables a bunch of really nice things for us, most importanly we
can resize home directories freely (i.e. both grow *and* shrink) while
online. It also allows us to add nice subvolume based home directory
snapshotting later on.
Also, whenever we mention the three supported types, alaways mention
them in alphabetical order, which is also our new order of preference.
The description didn't really explain how the distribution mechanism
works exactly and the relationship of leaf and slice units.
Update the documentation and also explicitly explain the expected
behaviour as it is created by the memory_recursiveprot cgroup2 mount
option.