IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Up until now, the behaviour in systemd has (mostly) been to silently
ignore failures to action unit directives that refer to an unavailble
controller. The addition of AssertControlGroupController and its
conditional counterpart allow explicit specification of the desired
behaviour when such a situation occurs.
As for how this can happen, it is possible that a particular controller
is not available in the cgroup hierarchy. One possible reason for this
is that, in the running kernel, the controller simply doesn't exist --
for example, the CPU controller in cgroup v2 has only recently been
merged and was out of tree until then. Another possibility is that the
controller exists, but has been forcibly disabled by `cgroup_disable=`
on the kernel command line.
In future this will also support whatever comes out of issue #7624,
`DefaultXAccounting=never`, or similar.
Otherwise, setting udev_log=debug in /etc/udev/udev.conf has no effects since
systemd-udevd is built with LOG_REALM=LOG_REALM_UDEV.
However using LOG_REALM_UDEV (for libudev_core) reveals another similar bug for
udevadm which should also define LOG_REALM_UDEV.
This code is executed before we parse command line/configuration
parameters, hence let's not use arg_system to figure our how to clean up
things, but instead PID == 1. Let's move that check inside of the
function, to make things a bit more robust abstract from the outside.
Also, let's add a log message about this, that was so far missing.
Let's place them in initialize_runtime(), where they appear to fit best.
Effectively this is just a move a little bit down, swapping places with
log_execution_mode(), which should require neither call to be done
first.
Note that changes the conditionalization a bit for these calls, from
(PID == 1) to (arg_system && arg_action == ACTION_RUN). At this point this is pretty much the same
however, as we don't allow PID 1 without ACTION_RUN and without
arg_system set, safety_checks() ensures that.
It's part of finalizing our runtime parameters, hence let's move this
into load_configuration() after we loaded everything else. This is safe,
since we don't use it between the location where it was and where we
place it now yet.
This just sets up some variables the loaded configuration will then
modify. Let's invoke it hence right before loading the configuration.
This moves the initialization just a tiny bit later, but that shouldn't
matter, since we never access it in-between.
Let's make sure that if we are PID 1 we are invoked in ACTION_RUN mode,
and in arg_system mode, as well as the opposite.
Everything else is untested and probably not worth supporting hence
let's bail out early if people try anyway.
Let's shorten main() a bit, and split out everything that loads our
configuration and runtime parameters into a function of its own.
No changes in behaviour.
The kernel needs two numbers, but for the user it's most convenient to provide the
user name and have that resolved to uid and gid.
Right now the primary group of the specified user is always used. That's the most
common case anyway. In the future we can extend the --owner option to allow a group
after a colon.
[I added this before realizing that this will not be enough to be used for user
runtime directory. But this seems useful on its own, so I'm keeping this commit.]
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
Dec 14 14:10:54 krowka systemd[1]: System is tainted: overflowgid-not-65534
-- Subject: The system is configured in a way that might cause problems
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The following "tags" are possible:
-- - "split-usr" — /usr is a separate file system and was not mounted when systemd
-- was booted
-- - "cgroups-missing" — the kernel was compiled without cgroup support or access
-- to expected interface files is resticted
-- - "var-run-bad" — /var/run is not a symlink to /run
-- - "overflowuid-not-65534" — the kernel user ID used for "unknown" users (with
-- NFS or user namespaces) is not 65534
-- - "overflowgid-not-65534" — the kernel group ID used for "unknown" users (with
-- NFS or user namespaces) is not 65534
-- Current system is tagged as overflowgid-not-65534.
We have a check and warning at compile time. The user cannot do anything about
this at runtime, and all other taints are about checks that happen at runtime
and are specific to that system (and at least potentially correctable).
(The logic in the compilation-time check was updated to treat "nogroup" as OK,
but not the runtime check. But I think it's better to remove the runtime check
for this altogether, so this becomes moot.)
Followup to previous commit. Suggested by @poettering.
Reindented the `verbs[]` tables to match the apparent previous
whitespace rules (indent to one flag, allow multiple flags to overflow?).
A lot of code references the `running_in_chroot()` function; while
I didn't dig I'm pretty certain this arose to deal with situations
like RPM package builds in `mock` - there we don't want the `%post`s
to `systemctl start` for example.
And actually this exact same use case arises for
[rpm-ostree](https://github.com/projectatomic/rpm-ostree/)
where we implement offline upgrades by default; the `%post`s are
always run in a new chroot using [bwrap](https://github.com/projectatomic/bubblewrap).
And here's the problem: bwrap creates proper mount roots, so it
passes `running_in_chroot()`, and then if a script tries to do
`systemctl start` we get:
`System has not been booted with systemd as init system (PID 1)`
but that's an *error*, unlike the `running_in_chroot()` case where we ignore.
Further complicating things is there are real world RPM packages
like `glusterfs` which end up invoking `systemctl start`.
A while ago, the `SYSTEMD_IGNORE_CHROOT` environment variable was
added for the inverse case of running in a chroot, but still wanting
to use systemd as PID 1 (presumably some broken initramfs setups?).
Let's introduce a `SYSTEMD_OFFLINE` environment variable for cases like
mock/rpm-ostree so we can force on the "ignore everything except preset" logic.
This way we'll still not start services even if mock switches to use nspawn or
bwrap or something else that isn't a chroot.
We also cleanly supercede the `SYSTEMD_IGNORE_CHROOT=1` which is now spelled
`SYSTEMD_OFFLINE=0`. (Suggested by @poettering)
Also I made things slightly nicer here and we now print the ignored operation.
This is useful to debug things, but also to hook up external post-up
scripts with resolved.
Eventually this code might be useful to implement a
resolvconf(8)-compatible interface for compatibility purposes. Since the
semantics don't map entirely cleanly as first step we add a native
interface for pushing DNS configuration into resolved, that exposes the
correct semantics, before adding any compatibility interface.
See: #7202
memset() is weird anyway, since it expects an "int" as second parameter,
which it then uses as a byte, i.e. as uint8_t or something like that.
But by passing -1 to it, things get particularly weird, as that relies
on sign expansion to do the right thing.
In similar fashion to the previous change, sync() operations can stall
endlessly if cache is unable to be written out. In order to avoid an
unbounded hang, the sync takes place within a child process. Every 10
seconds (SYNC_TIMEOUT_USEC), the value of /proc/meminfo "Dirty" is checked
to verify it is smaller than the last iteration. If the sync is not making
progress for 3 successive iterations (SYNC_PROGRESS_ATTEMPTS), a SIGKILL is
sent to the sync process and the shutdown continues.
Remount, and subsequent umount, attempts can hang for inaccessible network
based mount points. This can leave a system in a hard hang state that
requires a hard reset in order to recover. This change moves the remount,
and umount attempts into separate child processes. The remount and umount
operations will block for up to 90 seconds (DEFAULT_TIMEOUT_USEC). Should
those waits fail, the parent will issue a SIGKILL to the child and continue
with the shutdown efforts.
In addition, instead of only reporting some additional errors on the final
attempt, failures are reported as they occur.
With Type=notify services, EXTEND_TIMEOUT_USEC= messages will delay any startup/
runtime/shutdown timeouts.
A service that hasn't timed out, i.e, start time < TimeStartSec,
runtime < RuntimeMaxSec and stop time < TimeoutStopSec, may by sending
EXTEND_TIMEOUT_USEC=, allow the service to continue beyond the limit for
the execution phase (i.e TimeStartSec, RunTimeMaxSec and TimeoutStopSec).
EXTEND_TIMEOUT_USEC= must continue to be sent (in the same way as
WATCHDOG=1) within the time interval specified to continue to reprevent
the timeout from occuring.
Watchdog timeouts are also extended if a EXTEND_TIMEOUT_USEC is greater
than the remaining time on the watchdog counter.
Fixes#5868.
These helper calls are potentially called often, and allocate FILE*
objects internally for a very short period of time, let's turn off
locking for them too.
Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of doing individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this in some way a revert of 4b61c87511.
This changes dns_name_between() to deal properly with checking whether B
is between A and C if A and C are equal. Previously we simply returned
-EINVAL in this case, refusing checking. With this change we correct
behaviour: if A and C are equal, then B is "between" both if it is
different from them. That's logical, since we do < and > comparisons, not
<= and >=, and that means that anything "right of A" and "left of C"
lies in between with wrap-around at the ends. And if A and C are equal
that means everything lies between, except for A itself.
This fixes handling of domains using NSEC3 "white lies", for example the
.it TLD.
Fixes: #7421
fputs() writes only first 2048 bytes and fails
to write to /proc when values are larger than that.
This patch adds a new flag to WriteStringFileFlags
that make it possible to disable the buffer under
specific cases.
This commit updates networkd behavior to check if the hostname option
received via DHCP is too long for Linux limit, and in case shorten it.
An overlong hostname will be truncated to the first dot or to
`HOST_MAX_LEN`, whatever comes earlier.
Add a new option `--network-namespace-path` to systemd-nspawn to allow
users to specify an arbitrary network namespace, e.g. `/run/netns/foo`.
Then systemd-nspawn will open the netns file, pass the fd to
outer_child, and enter the namespace represented by the fd before
running inner_child.
```
$ sudo ip netns add foo
$ mount | grep /run/netns/foo
nsfs on /run/netns/foo type nsfs (rw)
...
$ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \
/bin/readlink -f /proc/self/ns/net
/proc/1/ns/net:[4026532009]
```
Note that the option `--network-namespace-path=` cannot be used together
with other network-related options such as `--private-network` so that
the options do not conflict with each other.
Fixes https://github.com/systemd/systemd/issues/7361
Using strlen() to declare a buffer results in a variable-length array,
even if the compiler likely optimizes it to be a compile time constant.
When building with -Wvla, certain versions of gcc complain about such
buffers. Compiling with -Wvla has the advantage of preventing variably
length array, which defeat static asserts that are implemented by
declaring an array of negative length.
While the compiler likely optimizes strlen(x) for string literals,
it is not a constant expression.
Hence,
char buffer[strlen("OPTION_000") + 1];
declares a variable-length array. STRLEN() can be used instead
when a constant espression is needed.
It's not entirely identical to strlen(), as STRLEN("a\0") counts 2.
Also, it only works with string literals and the macro enforces
that the argument is a literal.
Let's rename escaped_name to disk_path since this is an actual content
that pointer refers to. It is either path to encrypted block device
or path to encrypted image file.
Also drop redundant function disk_major_minor(). src is always set, and
it always points to either encrypted block device path (or symlink to
such device) or to encrypted image. In case it is set to device path
there is no need to reset it to /dev/block/major:minor symlink since
those paths are equivalent.
Some ask-password agents (e.g. clevis-luks-askpass) use Id option from
/run/systemd/ask-password/ask* file in order to obtain the password for
the device.
Id option should be in the following format,
e.g. Id=subsystem:data. Where data part is supposed to identify object
that ask-password query is done for. Since
e51b9486d1 this field has format
Id=cryptsetup:/dev/block/major:minor when systemd-cryptsetup is
unlocking encrypted block device. However, crypttab also supports
encrypted image files in which case we usually set data part of Id to
"vol on mountpoint". This is unexpected and actually breaks network
based device encryption as implemented by clevis.
Example:
$ cat /etc/crypttab
clevis-unlocked /clevis-test-disk-image none luks,_netdev
$ systemctl start 'systemd-cryptsetup@clevis\x2dunlocked.service'
$ grep Id /run/systemd/ask-password/ask*
Before:
$ Id=cryptsetup:clevis-unlocked on /clevis-test-disk-image-mnt
After:
$ Id=cryptsetup:/clevis-test-disk-image
RFC 8080 describes how to use EdDSA keys and signatures in DNSSEC. It
uses the curves Ed25519 and Ed448. Libgcrypt 1.8.1 does not support
Ed448, so only the Ed25519 is supported at the moment. Once Libgcrypt
supports Ed448, support for it can be trivially added to resolve.
The routing policy rule setup logic is moved to the routes setup phase (rather than the addresses setup phase as it is now). Additionally, a call to `link_check_ready` is added to the routing policy rules setup handler. This prevents a race condition with the routes setup handler.
Also give each async handler its own message counter to prevent race conditions when logging successes.
Fixes: #7614
Currently, we accept SERVFAIL after downgrading fully, cache it and move
on. Let's extend this a bit: after downgrading fully, if the SERVFAIL
logic continues to be an issue, then use a different DNS server if there
are any.
Fixes: #7147
On my system the boot and EFI partitions are protected, hence "bootctl
status" can't find the ESP, and then the tool continues with arg_path ==
NULL, which it really should not. Handle these cases, and simply
suppress all output that needs arg_path.
This renames find_esp() to find_esp_and_warn() and tries to normalize its
behaviour:
1. Change the error that is returned when we can't find the ESP to
ENOKEY (from ENOENT). This way the error code can only mean one
thing: that our search loop didn't find a good candidate.
2. Really log about all errors, except for ENOKEY and EACCES, and
document the letter cases.
3. Normalize parameters to the call: separate out the path parameter in
two: an input path and an output path. That way the memory management
is clear: we will access the input parameter only for reading, and
only write out the output parameter, using malloc() memory.
Before the calling convention were quire surprising for internal API
code, as the path parameter had to be malloc() memory and might and
might not have changed.
4. Rename bootctl's find_esp_warn() to acquire_esp(), and make it a
simple wrapper around find_esp_warn(), that basically just adds the
friendly logging for the ENOKEY case. This rework removes double
logging in a number of error cases, as we no longer log here in
anything but ENOKEY, and leave that entirely to find_esp_warn().
5. find_esp_and_warn() now takes a bool flag parameter
"unprivileged_mode", which disables logging in the EACCES case, and
skips privileged validation of the path. This makes the function less
magic, and doesn't hide this internal silencing automatism from the
caller anymore.
With all that in place "bootctl list" and "bootctl status" work properly
(or as good as they can) when I invoke the tools whithout privileges on
my system where /boot is not world-readable