IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We already have the terminal open, hence pass the fd we got to
ask_password_tty(), so that it doesn't have to reopen it a second time.
This is mostly an optimization, but it has the nice benefit of making us
independent from RLIMIT_NOFILE issues and so on, as we don't need to
allocate another fd needlessly.
This new helper not only removes a file from a directory but also
ensures its space on disk is deallocated, by either punching a hole over
the full file or truncating the file afterwards if the file's link
counter is 0. This is useful in "vacuuming" algorithms to ensure that
client's can't keep the disk space the vacuuming is supposed to recover
pinned simply by keeping an fd open to it.
This is similar to string_hash_ops but operates one file system paths
specifically. It will ensure that "/foo//bar" and "///foo/bar" are
considered to be the same path for hashmap purposes.
This makes use of the existing path_compare() API, and adds a matching
hashing function for it.
Note that relative and absolute paths will hash to different values,
however whether the path is suffixed with a slash or not is not
detected. This matches the existing path_compare() behaviour, and
follows the logic that on Linux there can't be two different objects at
path /foo/bar and /foo/bar/ either.
This adds some paranoia code that moves some of the fds we allocate for
longer periods of times to fds > 2 if they are allocated below this
boundary. This is a paranoid safety thing, in order to avoid that
external code might end up erroneously use our fds under the assumption
they were valid stdin/stdout/stderr. Think: some app closes
stdin/stdout/stderr and then invokes 'fprintf(stderr, …' which causes
writes on our fds.
This both adds the helper to do the moving as well as ports over a
number of users to this new logic. Since we don't want to litter all our
code with invocations of this I tried to strictly focus on fds we keep
open for long periods of times only and only in code that is frequently
loaded into foreign programs (under the assumptions that in our own
codebase we are smart enough to always keep stdin/stdout/stderr
allocated to avoid this pitfall). Specifically this means all code used
by NSS and our sd-xyz API:
1. our logging APIs
2. sd-event
3. sd-bus
4. sd-resolve
5. sd-netlink
This changed was inspired by this:
https://github.com/systemd/systemd/issues/8075#issuecomment-363689755
This shows that apparently IRL there are programs that do close
stdin/stdout/stderr, and we should accomodate for that.
Note that this won't fix any bugs, this just makes sure that buggy
programs are less likely to interfere with out own code.
This is preparation for emulating the "usage_usec" keyed attribute of
the "cpu.stat" property of the root cgroup from data in /proc. Similar,
for emulating the "memory.current" attribute.
It's not good if the paths are in different order. With --user, we expect
more paths, but it must be a strict superset, and the order for the ones
that appear in both sets must be the same.
$ diff -u <(build/systemd-analyze --global unit-paths) <(build/systemd-analyze --user unit-paths)|colordiff
--- /proc/self/fd/14 2018-02-08 14:11:45.425353107 +0100
+++ /proc/self/fd/15 2018-02-08 14:11:45.426353116 +0100
@@ -1,6 +1,17 @@
+/home/zbyszek/.config/systemd/system.control
+/run/user/1000/systemd/system.control
+/run/user/1000/systemd/transient
+/run/user/1000/systemd/generator.early
+/home/zbyszek/.config/systemd/user
/etc/systemd/user
+/run/user/1000/systemd/user
/run/systemd/user
+/run/user/1000/systemd/generator
+/home/zbyszek/.local/share/systemd/user
+/home/zbyszek/.local/share/flatpak/exports/share/systemd/user
+/var/lib/flatpak/exports/share/systemd/user
/usr/local/share/systemd/user
/usr/share/systemd/user
/usr/local/lib/systemd/user
/usr/lib/systemd/user
+/run/user/1000/systemd/generator.late
A test is added so that we don't regress on this.
The function `strv_join_quoted()` is now not used, and has a bug
in the buffer size calculation when the strings needs to escaped,
as reported in #8056.
So, let's remove the function.
Closes#8056.
Red is used for highligting, the same as grep does. Except when the line is
highlighted red already, because it has high priority, in which case plain ansi
highlight is used for the matched substring.
Coloring is implemented for short and cat outputs, and not for other types.
I guess we could also add it for verbose output in the future.
Case sensitive or case insensitive matching can be requested using
--case-sensitive[=yes|no].
Unless specified, matching is case sensitive if the pattern contains any
uppercase letters, and case insensitive otherwise. This matches what
forward-search does in emacs, and recently also --ignore-case in less. This
works surprisingly well, because usually when one is wants to do case-sensitive
matching, the pattern is usually camel-cased. In the less frequent case when
case-sensitive matching is required with an all-lowercase pattern,
--case-sensitive can be used to override the automatic logic.
Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit
interested in SIGCHLD events for them. This scheme allowed a specific
PID to be watched by exactly 0, 1 or 2 units.
With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it
optionally also keyed by the negated PID, in which case it points to a
NULL terminated array of additional Unit objects also interested. This
scheme means arbitrary numbers of Units may now watch the same PID.
Runtime and memory behaviour should not be impact by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour
stays the same, but for the uncommon case (a PID watched by more than
one unit) we only pay with a single additional memory allocation for the
array.
Why this all? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can
belong to more than one unit these days:
1. sd_notify() with MAINPID= can be used to attach a process from a
different cgroup to multiple units.
2. Similar, the PIDFile= setting in unit files can be used for similar
setups,
3. By creating a scope unit a main process of a service may join a
different unit, too.
4. On cgroupsv1 we frequently end up watching all processes remaining in
a scope, and if a process opens lots of scopes one after the other it
might thus end up being watch by many of them.
This patch hence removes the 2-unit-per-PID limit. It also makes a
couple of other changes, some of them quite relevant:
- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
ambiguity will prefer returning the Unit the process belongs to based on
cgroup membership, and only check the watch-pids hashmap if that
fails. This change in logic is probably more in line with what people
expect and makes things more stable as each process can belong to
exactly one cgroup only.
- Every SIGCHLD event is now dispatched to all units interested in its
PID. Previously, there was some magic conditionalization: the SIGCHLD
would only be dispatched to the unit if it was only interested in a
single PID only, or the PID belonged to the control or main PID or we
didn't dispatch a signle SIGCHLD to the unit in the current event loop
iteration yet. These rules were quite arbitrary and also redundant as
the the per-unit handlers would filter the PIDs anyway a second time.
With this change we'll hence relax the rules: all we do now is
dispatch every SIGCHLD event exactly once to each unit interested in
it, and it's up to the unit to then use or ignore this. We use a
generation counter in the unit to ensure that we only invoke the unit
handler once for each event, protecting us from confusion if a unit is
both associated with a specific PID through cgroup membership and
through the "watch_pids" logic. It also protects us from being
confused if the "watch_pids" hashmap is altered while we are
dispatching to it (which is a very likely case).
- sd_notify() message dispatching has been reworked to be very similar
to SIGCHLD handling now. A generation counter is used for dispatching
as well.
This also adds a new test that validates that "watch_pid" registration
and unregstration works correctly.
The new flag returns the O_PATH fd of the final component, which may be
converted into a proper fd by open()ing it again through the
/proc/self/fd/xyz path.
Together with O_SAFE this provides us with a somewhat safe way to open()
files in directories potentially owned by unprivileged code, where we
want to refuse operation if any symlink tricks are played pointing to
privileged files.
When the flag is specified we won't transition to a privilege-owned
file or directory from an unprivileged-owned one. This is useful when
privileged code wants to load data from a file unprivileged users have
write access to, and validates the ownership, but want's to make sure
that no symlink games are played to read a root-owned system file
belonging to a different context.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
This adds a "watch-bind" feature to sd-bus connections. If set and the
AF_UNIX socket we are connecting to doesn't exist yet, we'll establish
an inotify watch instead, and wait for the socket to appear. In other
words, a missing AF_UNIX just makes connecting slower.
This is useful for daemons such as networkd or resolved that shall be
able to run during early-boot, before dbus-daemon is up, and want to
connect to dbus-daemon as soon as it becomes ready.
Let's rework touch_file() so that it works correctly on sockets, fifos,
and device nodes: let's open an O_PATH file descriptor first and operate
based on that, if we can. This is usually the better option as it this
means we can open AF_UNIX nodes in the file system, and update their
timestamps and ownership correctly. It also means we can correctly touch
symlinks and block/character devices without triggering their drivers.
Moreover, by operating on an O_PATH fd we can make sure that we
operate on the same inode the whole time, and it can't be swapped out in
the middle.
While we are at it, rework the call so that we try to adjust as much as
we can before returning on error. This is a good idea as we call the
function quite often without checking its result, and hence it's best to
leave the files around in the most "correct" fashion possible.
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.
All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
This reduces the meson man=false target count to 1281.
v2:
- link test-engine with libshared instead of libsystemd_static
Previous version built fine on F27, but fails on F26 with the following error:
/usr/bin/ld: /tmp/ccr8HRGw.ltrans6.ltrans.o: undefined reference to symbol '__start_BUS_ERROR_MAP@@SD_SHARED'
/home/zbyszek/fedora/systemd/systemd-9d5aae75c64f5583a110f03b94816aacc03bbf4d/x86_64-redhat-linux-gnu/src/shared/libsystemd-shared-236.so: error adding symbols: DSO missing from command line
v3:
- add libudev_basic
gcrypt_util_sources had to be moved because otherwise they appeared twice
in libshared.so halfproducts, causing an error.
-fvisibility=default is added to libbasic, libshared_static so that the symbols
appear properly in the exported symbol list in libshared.
The advantage is that files are not compiled twice. When configured with -Dman=false,
the ninja target list is reduced from 1588 to 1347 targets. The difference in compilation
time is small (<10%). I think this is because of -O0 and ccache and multiple cores, and
in different settings the compilation time could be reduced. The main advantage is that
errors and warnings are not reported twice.
This adds a simple condition/assert/match to the service manager, to
udev's .link handling and to networkd, for matching the kernel version
string.
In this version we only do fnmatch() based globbing, but we might want
to extend that to version comparisons later on, if we like, by slightly
extending the syntax with ">=", "<=", ">", "<" and "==" expressions.
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
We already use the "_static" suffix for libshared_static ("shared" is the name
of the library, "static" is the format) and other libs, so let's rename for
consistency.
Also change libsystemd_static_sources to libsystemd_sources, since the same
list is used for both and shorter is better.
Up until now, the behaviour in systemd has (mostly) been to silently
ignore failures to action unit directives that refer to an unavailble
controller. The addition of AssertControlGroupController and its
conditional counterpart allow explicit specification of the desired
behaviour when such a situation occurs.
As for how this can happen, it is possible that a particular controller
is not available in the cgroup hierarchy. One possible reason for this
is that, in the running kernel, the controller simply doesn't exist --
for example, the CPU controller in cgroup v2 has only recently been
merged and was out of tree until then. Another possibility is that the
controller exists, but has been forcibly disabled by `cgroup_disable=`
on the kernel command line.
In future this will also support whatever comes out of issue #7624,
`DefaultXAccounting=never`, or similar.
This changes dns_name_between() to deal properly with checking whether B
is between A and C if A and C are equal. Previously we simply returned
-EINVAL in this case, refusing checking. With this change we correct
behaviour: if A and C are equal, then B is "between" both if it is
different from them. That's logical, since we do < and > comparisons, not
<= and >=, and that means that anything "right of A" and "left of C"
lies in between with wrap-around at the ends. And if A and C are equal
that means everything lies between, except for A itself.
This fixes handling of domains using NSEC3 "white lies", for example the
.it TLD.
Fixes: #7421
We currently look for "nobody" and "nfsnobody" when testing groups, both
of which do not exist on Ubuntu, our main testing environment. Let's
extend the tests slightly to also use "nogroup" if it exists.
We already synthesize records for both "root" and "nobody" in
nss-systemd. Let's do the same in our own NSS wrappers that are supposed
to bypass NSS if possible. Previously this was done for "root" only, but
let's clean this up, and do the same for "nobody" too, so that we
synthesize records the same way everywhere, regardless whether in NSS or
internally.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
%h is a special specifier because we look at $HOME (unless running suid, but
let's say that this case does not apply to tmpfiles, since the code is
completely unready to be run suid). For all other specifiers we query the user
db and use those values directly. I'm not sure if this exception is good, but
let's just "document" status quo for now. If this is changes, it should be in
a separate PR.
It's failing on artful s390x and i386:
Running /tmp/autopkgtest.Pexzdu/build.lfO/debian/build-deb/systemd-tmpfiles on 'f /tmp/test-systemd-tmpfiles.c236s1uq/arg - - - - %m'
expect: '01234567890123456789012345678901'
actual: 'e84bc78d162e472a8ac9759f5f1e4e0e'
--- stderr ---
Traceback (most recent call last):
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 129, in <module>
test_valid_specifiers(user=False)
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 89, in test_valid_specifiers
test_content('f {} - - - - %m', '{}'.format(id128.get_machine().hex), user=user)
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 84, in test_content
assert content == expected
AssertionError
-------
Let's skip the test for now until this is resolved properly on the autopkgtest
side.
sd_path_home() returns ENXIO when a variable (such as $XDG_RUNTIME_DIR) is not
defined. Previously we used ENOKEY for unresolvable specifiers. To avoid having
two codes, or translating ENXIO to ENOKEY, I replaced ENOKEY use with ENXIO.
v2:
- use sd_path_home and change to ENXIO everywhere
The code intentionally ignored unknown specifiers, treating them as text. This
needs to change because otherwise we can never add a new specifier in a backwards
compatible way. So just treat an unknown (potential) specifier as an error.
In principle this is a break of backwards compatibility, but the previous
behaviour was pretty much useless, since the expanded value could change every
time we add new specifiers, which we do all the time.
As a compromise for backwards compatibility, only fail on alphanumerical
characters. This should cover the most cases where an unescaped percent
character is used, like size=5% and such, which behave the same as before with
this patch. OTOH, this means that we will not be able to use non-alphanumerical
specifiers without breaking backwards compatibility again. I think that's an
acceptable compromise.
v2:
- add NEWS entry
v3:
- only fail on alphanumerical
This adds a new flavour of strextend(), called
strextend_with_separator(), which takes an optional separator string. If
specified, the separator is inserted between each appended string, as
well as before the first one, but only if the original string was
non-empty.
This new call is particularly useful when appending new options to mount
option strings and suchlike, which need to be comma-separated, and
initially start out from an empty string.
Let's optimize things a bit, and instead of having to strip whitespace
first before decoding base64, let's do that implicitly while doing so.
Given that base64 was designed the way it was designed specifically to
be tolerant to whitespace changes, it's a good idea to do this
automatically and implicitly.
In this way, individual errors in files can be treated differently than a
failure of the whole service.
A test is added to check that the expected value is returned.
Some parts are commented out, because it is not. This will be fixed in
a subsequent commit.
Now the function returns an empty string when given an empty string.
Not sure if this is the best option (maybe this should be an error?),
but at least the behaviour is well defined.
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that
path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.
Tests are added for this change in behaviour.
Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.
v2:
- use strip_trailing_chars()
v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
The code in install-printf.c and unit-printf.c for these is pretty much
the same and very generic. Let's move this all over to the generic
specifier.c, and share the implementations.
The test was written so far under the assumption that if two mounts are
placed onto the same location the "upper" mount is listed later in
/proc/self/mountinfo. This appears not to be guaranteed however, as
running the tests in a normal nspawn shows.
This patch fixes that: it reverses the hashmap of mounts we build:
instead of keying by path, we key by mnt_id, and if we notice that
path_get_mnt_id() doesn't match what a line in /proc/self/mountinfo
says, we use the returned ID to check if maybe another line agrees.
Fixes: #7431
This is a simple wrapper around name_to_handle_at_loop() and
fd_fdinfo_mnt_id() to query the mnt ID of a path. It uses
name_to_handle_at() where it can, and falls back to to
fd_fdinfo_mnt_id() where that doesn't work.
This is a best-effort thing of course, since neither name_to_handle_at()
nor the fdinfo logic work on all kernels.
First of all, let's rename it to read_etc_hostname(), to make clearer
what kind of configuration it actually reads: the file format defined in
/etc/hostname and nothing else.
Secondly: let's port this to use read_line(), i.e. the new way to read
lines from a file in a safe, bounded way.
Thirdly: let's strip leading/trailing whitespace from what we are
reading. Given that we are already pretty lenient what we read (comments
and empty lines), let's be permissive regarding whitespace too.
Fourthly: let's actually validate the hostname when reading it. So far
we tried to make it valid, but that's not always possible (for example,
we can't make an empty hostname valid, ever).
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
All this function does is place some data in an in-memory read-only fd,
that may be read back to get the original data back.
Doing this in a way that works everywhere, given the different kernels
we support as well as different privilege levels is surprisingly
complex.
Linux doesn't have faccess(), hence let's emulate it. Linux has access()
and faccessat() but neither allows checking the access rights of an fd
passed in directly.
This path to ping is compatible with both debian-like and usr-merged
distros. This keeps the test simple, and should thus pass everywhere.
Fixes: #7267
The environment variables we've serialized can quite possibly contain
characters outside the set allowed by env_assignment_is_valid(). In
fact, my environment seems to contain a couple of these:
* TERMCAP set by screen contains a '\x7f' character
* BASH_FUNC_module%% variable has a '%' character in name
Strict check of environment variables name and value certainly makes sense for
unit files, but not so much for deserialization of values we already had
in our environment.
We generally use the casing "Namespace" for the word, and that's visible
in a number of user-facing interfaces, including "RestrictNamespace=" or
"JoinsNamespaceOf=". Let's make sure to use the same casing internally
too.
As discussed in #7024
More specifically, it should return > 0 only for conditions specified in
probe_flags. We only set KMOD_PROBE_APPLY_BLACKLIST in probe_flags, so the
code was correct, but add an assert to clarify this.
The configuration option was called -Dresolve, but the internal define
was …RESOLVED. This options governs more than just resolved itself, so
let's settle on the version without "d".
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
Don't miscount number of "../" to generate, if we "." is included in an
input path.
Also, refuse if we encounter "../" since we can't possibly follow that
up properly, without file system access.
Some other modernizations.
Also, tests for DynamicUser= should really run for system mode, as we
allocate from a system resource.
(This also increases the test timeout to 2min. If one of our tests
really hangs then waiting for 2min longer doesn't hurt either. The old
2s is really short, given that we run in potentially slow VM
environments for this test. This becomes noticable when the slow "find"
command this adds is triggered)
Let's clean up the interaction of StateDirectory= (and friends) to
DynamicUser=1: instead of creating these directories directly below
/var/lib, place them in /var/lib/private instead if DynamicUser=1 is
set, making that directory 0700 and owned by root:root. This way, if a
dynamic UID is later reused, access to the old run's state directory is
prohibited for that user. Then, use file system namespacing inside the
service to make /var/lib/private a readable tmpfs, hiding all state
directories that are not listed in StateDirectory=, and making access to
the actual state directory possible. Mount all directories listed in
StateDirectory= to the same places inside the service (which means
they'll now be mounted into the tmpfs instance). Finally, add a symlink
from the state directory name in /var/lib/ to the one in
/var/lib/private, so that both the host and the service can access the
path under the same location.
Here's an example: let's say a service runs with StateDirectory=foo.
When DynamicUser=0 is set, it will get the following setup, and no
difference between what the unit and what the host sees:
/var/lib/foo (created as directory)
Now, if DynamicUser=1 is set, we'll instead get this on the host:
/var/lib/private (created as directory with mode 0700, root:root)
/var/lib/private/foo (created as directory)
/var/lib/foo → private/foo (created as symlink)
And from inside the unit:
/var/lib/private (a tmpfs mount with mode 0755, root:root)
/var/lib/private/foo (bind mounted from the host)
/var/lib/foo → private/foo (the same symlink as above)
This takes inspiration from how container trees are protected below
/var/lib/machines: they generally reuse UIDs/GIDs of the host, but
because /var/lib/machines itself is set to 0700 host users cannot access
files in the container tree even if the UIDs/GIDs are reused. However,
for this commit we add one further trick: inside and outside of the unit
/var/lib/private is a different thing: outside it is a plain,
inaccessible directory, and inside it is a world-readable tmpfs mount
with only the whitelisted subdirs below it, bind mounte din. This
means, from the outside the dir acts as an access barrier, but from the
inside it does not. And the symlink created in /var/lib/foo itself
points across the barrier in both cases, so that root and the unit's
user always have access to these dirs without knowing the details of
this mounting magic.
This logic resolves a major shortcoming of DynamicUser=1 units:
previously they couldn't safely store persistant data. With this change
they can have their own private state, log and data directories, which
they can write to, but which are protected from UID recycling.
With this change, if RootDirectory= or RootImage= are used it is ensured
that the specified state/log/cache directories are always mounted in
from the host. This change of semantics I think is much preferable since
this means the root directory/image logic can be used easily for
read-only resource bundling (as all writable data resides outside of the
image). Note that this is a change of behaviour, but given that we
haven't released any systemd version with StateDirectory= and friends
implemented this should be a safe change to make (in particular as
previously it wasn't clear what would actually happen when used in
combination). Moreover, by making this change we can later add a "+"
modifier to these setings too working similar to the same modifier in
ReadOnlyPaths= and friends, making specified paths relative to the
container itself.
read_line() is much like getline(), and returns a line read from a
FILE*, of arbitrary sizes. In contrast to gets() it will grow the buffer
dynamically, and in contrast to getline() it will place a user-specified
boundary on the line.
I think this matches the spirit of "indirect" well: the unit
*might* be active, even though it is not "installed" in the
sense of symlinks created based on the [Install] section.
The changes to test-install-root touch the same lines as in the previous
commit; the change in each case is from
assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_ENABLED)
to
assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_DISABLED)
to
assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_INDIRECT)
in the last two commits.
When a unit has a symlink that makes an alias in the filesystem,
but that name is not specified in [Install], it is confusing
is the unit is shown as "enabled". Look only for names specified
in Alias=.
Fixes#6338.
v2:
- Fix indentation.
- Fix checking for normal enablement, when the symlink name is the same as the
unit name. This case wasn't handled properly in v1.
v3:
- Rework the patch to also handle templates properly:
A template templ@.service with DefaultInstance=foo will be considered
enabled only when templ@foo.service symlink is found. Symlinks with
other instance names do not count, which matches the logic for aliases
to normal units. Tests are updated.
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
Now generators are only run in systemd --test mode, where this makes
most sense (how are you going to test what would happen otherwise?).
Fixes#6842.
v2:
- rename test_run to test_run_flags
Current behavior is that %X where X is an unidentified specifier, then the result is
the same %X string. This was not the case when the string ended with a stray %, where
the character would have not been output. Lets add that missing character.
Fixes: #6374
glibc appears to propagate different errors in different ways, let's fix
this up, so that our own code doesn't get confused by this.
See #6752 + #6737 for details.
Fixes: #6755
Let's lock the personality to the currently set one, if nothing is
specifically specified. But do so with a grain of salt, and never
default to any exotic personality here, but only PER_LINUX or
PER_LINUX32.
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.