IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since 3536f49e8f, manager_new() in
user mode requires XDG_RUNTIME_DIR is set. So, in this commit,
setup_fake_runtime_directory() is added in the beginning of test.
Fixes an issue comment in #6384.
With previous commits, test-daemon is one of the slowest tests.
Under normal circumstances, the notifications go nowhere anyway,
because the test process does not have privileges.
The timeout can be specified as an argument. This is useful to
e.g. test handling of the notifications, which is much easier
with a longer timeout.
Also, if we fail to set the watchdog, run through the rest of the test
without waiting. I think it's useful to still start the commands to
test the error paths, but we can do it quickly.
If an error is encountered in any of the Exec* lines, WorkingDirectory,
SELinuxContext, ApparmorProfile, SmackProcessLabel, Service (in .socket
units), User, or Group, refuse to load the unit. If the config stanza
has support, ignore the failure if '-' is present.
For those configuration directives, even if we started the unit, it's
pretty likely that it'll do something unexpected (like write files
in a wrong place, or with a wrong context, or run with wrong permissions,
etc). It seems better to refuse to start the unit and have the admin
clean up the configuration without giving the service a chance to mess
up stuff.
Note that all "security" options that restrict what the unit can do
(Capabilities, AmbientCapabilities, Restrict*, SystemCallFilter, Limit*,
PrivateDevices, Protect*, etc) are _not_ treated like this. Such options are
only supplementary, and are not always available depending on the architecture
and compilation options, so unit authors have to make sure that the service
runs correctly without them anyway.
Fixes#6237, #6277.
https://tools.ietf.org/html/rfc5891#section-4.2.3.1 says that
> The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third
> and fourth character positions and MUST NOT start or end with a "-" (hyphen).
This means that libidn2 refuses to encode such names.
Let's just resolve them without trying to use IDN.
If the input is older than "1970-01-01 UTC", then `parse_timestamp()`
fails and returns -EINVAL. However, if the input is e.g. `-100years`,
then the function succeeds and sets `usec = 0`.
This commit makes the function also succeed for old dates and set
`usec = 0`.
Fixes#6290.
test_readlink_and_make_absolute switches to a temp directory, and then
removes it.
test_get_files_in_directory calls opendir(".") from a directory that has
been removed from the filesystem.
This call sequence triggers a bug in Gentoo's sandbox library. This
library attempts to resolve the "." to an absolute path, and aborts when
it ultimately fails to do so.
Re-ordering the calls works around the issue until the sandbox library
can be fixed to more gracefully deal with this.
Bug: https://bugs.gentoo.org/590084
This extends 2d79a0bbb9 to the kernel
command line parsing.
The parsing is changed a bit to only understand "0" as infinity. If units are
specified, parse normally, e.g. "0s" is just 0. This makes it possible to
provide a zero timeout if necessary.
Simple test is added.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
Also called "ANSI-C Quoting" in info:(bash) ANSI-C Quoting.
The escaping rules are a POSIX proposal, and are described in
http://austingroupbugs.net/view.php?id=249. There's a lot of back-and-forth on
the details of escaping of control characters, but we'll be only using a small
subset of the syntax that is common to all proposals and is widely supported.
Unfortunately dash and fish and maybe some other shells do not support it (see
the man page patch for a list).
This allows environment variables to be safely exported using show-environment
and imported into the shell. Shells which do not support this syntax will have
to do something like
export $(systemctl show-environment|grep -v '=\$')
or whatever is appropriate in their case. I think csh and fish do not support
the A=B syntax anyway, so the change is moot for them.
Fixes#5536.
v2:
- also escape newlines (which currently disallowed in shell values, so this
doesn't really matter), and tabs (as $'\t'), and ! (as $'!'). This way quoted
output can be included directly in both interactive and noninteractive bash.
This adds two options that are useful for user units. In particular, it
is useful to check ConditionUser=!0 to not start for the root user.
Closes: #5187
*scanf functions set errno on i/o error. For sscanf, this doesn't really apply,
so (based on the man page), it seems that errno is unlikely to be ever set to a
useful value. So just ignore errno. The error message includes the string that
was parsed, so it should be always pretty clear why parsing failed.
On the other hand, detect trailing characters and minus prefix that weren't
converted properly. This matches what our safe_ato* functions do. Add tests to
elucidate various edge cases.
All those uses were correct, but I think it's better to be explicit.
Using implicit errno is too error prone, and with this change we can require
(in the sense of a style guideline) that the code is always specified.
Helpful query: git grep -n -P 'log_[^s][a-z]+\(.*%m'
test-login.c is largely rewritten to use _cleanup_ and give more meaningful
messages (function names are used instead of creative terms like "active
session", so that when something unexpected is returned, it's much easier to
see what function is responsible).
The monitoring part is only activated if '-m' is passed on the command line.
It runs against the information from /run/systemd/ in the live system, but that
should be OK: logind/sd-login interface is supposed to be stable and both
backwards and forwards compatible.
If not running in a login session, some tests are skipped.
Those two changes together mean that it's possible to run test-login in the
test suite.
Tests for sd_pid_get_{unit,user_unit,slice} are added.
The condition is on "word", hence we give word instead of rvalue.
An assert would be triggered if !utf8_is_valid(word) is true and
rvalue == NULL, since log_syntax_invalid_utf8 calls utf8_escape_invalid
which calls assert(str).
A test case has been added to test with valid and invalid utf8.
This test is mostly a compilation test that checks that various defines in
sd-bus-vtable.h are valid C++. The code is executed, but the results are not
checked (apart from sd-bus functions not returning an error). test-bus-objects
contains pretty extensive tests for this functionality.
The C++ version is only added to meson, since it's simpler there.
Because of the .cc extension, meson will compile the executable with c++.
This test is necessary to properly check the macros in sd-bus-vtable.h. Just
running the headers through g++ is not enough, because the macros are not
exercised.
Follow-up for #5941.
This adds a modified version of dhcp6_option_parse_domainname() that is
able to parse compressed domain names, borrowing the idea from
dns_packet_read_name(). It also adds pieces in networkd-link and
networkd-manager to properly save/load the added option field.
Resolves#2710.
This reverts commit 6355e75610.
The previously mentioned commit inadvertently broke a lot of SELinux related
functionality for both unprivileged users and systemd instances running as
MANAGER_USER. In particular, setting the correct SELinux context after a User=
directive is used would fail to work since we attempt to set the security
context after changing UID. Additionally, it causes activated socket units to
be mislabeled for systemd --user processes since setsockcreatecon() would never
be called.
Reverting this fixes the issues with labeling outlined above, and reinstates
SELinux access checks on unprivileged user services.
libidn2 2.0.0 supports IDNA2008, in contrast to libidn which supports IDNA2003.
https://bugzilla.redhat.com/show_bug.cgi?id=1449145
From that bug report:
Internationalized domain names exist for quite some time (IDNA2003), although
the protocols describing them have evolved in an incompatible way (IDNA2008).
These incompatibilities will prevent applications written for IDNA2003 to
access certain problematic domain names defined with IDNA2008, e.g., faß.de is
translated to domain xn--fa-hia.de with IDNA2008, while in IDNA2003 it is
translated to fass.de domain. That not only causes incompatibility problems,
but may be used as an attack vector to redirect users to different web sites.
v2:
- keep libidn support
- require libidn2 >= 2.0.0
v3:
- keep dns_name_apply_idna caller dumb, and keep the #ifdefs inside of the
function.
- use both ±IDN and ±IDN2 in the version string
We expect that if socket() syscall is available, seccomp works for that
architecture. So instead of explicitly listing all architectures where we know
it is not available, just assume it is broken if the number is not defined.
This should have the same effect, except that other architectures where it is
also broken will pass tests without further changes. (Architectures where the
filter should work, but does not work because of missing entries in
seccomp-util.c, will still fail.)
i386, s390, s390x are the exception — setting the filter fails, even though
socket() is available, so it needs to be special-cased
(https://github.com/systemd/systemd/issues/5215#issuecomment-277241488).
This remove the last define in seccomp-util.h that was only used in test-seccomp.c. Porting
the seccomp filter to new architectures should be simpler because now only two places need
to be modified.
RestrictAddressFamilies seems to work on ppc64[bl]e, so enable it (the tests pass).
The single log level is split into an array of log levels. Which index in the
array is used can be determined for each compilation unit separately by setting
a macro before including log.h. All compilation units use the same index
(LOG_REALM_SYSTEMD), so there should be no functional change.
v2:
- the "realm" is squished into the level (upper bits that are not used by
priority or facility), and unsquished later in functions in log.c.
v3:
- rename REALM_PLUS_LEVEL to LOG_REALM_PLUS_LEVEL and REALM to LOG_REALM_REMOVE_LEVEL.
Since all our python scripts have a proper python3 shebang, there is no benefit
to letting meson autodetect them. On linux, meson will just uses exec(), so the
shebang is used anyway. The only difference should be in how meson reports the
script and that the detection won't fail for (most likely misconfigured)
non-UTF8 locales.
Closes#5855.
While adding the defines for arm, I realized that we have pretty much all
known architectures covered, so SECCOMP_RESTRICT_NAMESPACES_BROKEN is not
necessary anymore. clone(2) is adamant that the order of the first two
arguments is only reversed on s390/s390x. So let's simplify things and remove
the #if.
SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN was conflating two separate things:
1. whether shmat/shmdt/shmget can be filtered (if ipc multiplexer is used, they can not)
2. whether we know this for the current architecture
For i386, shmat is implemented as ipc, so seccomp filter is "broken" for shmat,
but not for mmap, and SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN cannot be used
to cover both cases. The define was only used for tests — not in the implementation
in seccomp-util.c. So let's get rid of SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN
and encode the right condition directly in tests.
This is useful on systems like NixOS, where python3 is not in
/usr/bin/python3 as well as for people using alternative ways to
install python such as virtualenv/pyenv.
This filters out "." and ".." from glob results. Fixes#5655 and #5644.
Any judgements on whether the path is "safe" are removed. We will not remove
"/" under any name (including "/../" and such), but we will remove stuff that
is specified using paths that include "//", "/./" and "/../". Such paths can be
created when joining strings automatically, or for other reasons, and people
generally know what ".." and "." is.
Tests are added to make sure that the helper functions behave as expected.
Shell scripts should be executable so that meson reports their
invocation succinctly (does not print 'sh' '-e').
Python scripts should not be executable so that meson does the
detection of the right python binary itself.
Add -u everywhere to catch potential errors.
The indentation for emacs'es meson-mode is added .dir-locals.
All files are reindented automatically, using the lasest meson-mode from git.
Indentation should now be fairly consistent.
This makes the helper binaries significantly bigger (in some cases, the final
size depends on link options and optimization level), and is only useful for
distributions which want to provide the option to install udev without systemd.
As the linking is improved, the difference between the columns might shrink,
but it's unlikely that linking libshared statically could ever be more
efficient.
E.g. with -O0, no -flto:
(static) (shared)
src/udev/ata_id 999176 85696
src/udev/cdrom_id 1024344 111656
src/udev/collect 990344 81280
src/udev/scsi_id 1023592 115656
src/udev/v4l_id 811736 17744
When linked dynamically, install_rpath must be specified, so add that.
test-dlopen is a very simple binary that is only linked with libc and
libdl. From it we do dlopen() on the nss and pam modules to check that they are
linked to all necessary libs.
(meson-compiled nss modules are linked to less libraries, for whatever reason.
I suspected that some deps are missing, but it turns out that my suspicions
weren't justified, and the modules load just fine. Let's keep the test though,
it is very quick, and might detect missing linkage in the future.)
This simplifies things and leads to a smaller installation footprint.
libsystemd_internal and libsystemd_journal_internal are linked into
libystemd-shared and available to all programs linked to libsystemd-shared.
libsystemd_journal_internal is not needed anymore, and libsystemd-shared
is used everwhere. The few exceptions are: libsystemd.so, test-engine,
test-bus-error, and various loadable modules.
The tests are included under the conditional too, instead of specifying
'ENABLE_NETWORKD' in the test definition array, because libnetworkd_core
dependency is undefined if networkd is disabled.
With mesonbuid/meson#1545, meson does not propagate deps of a library
when linking with that library. That's of course the right thing to do,
but it exposes a bunch of missing deps.
This compiles with both meson-0.39.1 and meson-git + pr/1545.
This is slightly complicated by the fact that files('libudev.h') cannot be used
as an argument in custom_target command (string is required). This restriction
should be lifted in future versions of mesons, so this could be simplified.
This is quite messy. I think libtool might have been using something
like -Wl,--whole-archive, but I don't think meson has support for that.
For now, just recompile all the sources for linking into libsystemd
directly. This should not matter much for efficiency, since it's a
few small files.
This is what autoconf-based build does, and it makes test-bus-error and
test-engine able to access the bus error mapping table. OTOH, this is a heavy
price to pay: it would be excellent to link libcore.a to libsystemd-shared-NNN.so.
Otherwise we duplicate the same code in 'systemd' and 'libsystemd-shared-NNN.so'.
-rwxrwxr-x. 1 4075544 Apr 6 20:30 systemd* <-- libcore linked against libsystemd-shared.so
-rwxrwxr-x. 1 5596504 Apr 9 14:07 systemd* <-- libcore linked against libsystemd-shared.a
v2:
- update for 6b5cf3ea62
Tests can be run with 'ninja-build test' or using 'mesontest'.
'-Dtests=unsafe' can be used to include the "unsafe" tests in the
test suite, same as with autotools.
v2:
- use more conf.get guards are optional components
- declare deps on generated headers for test-{af,arphrd,cap}-list
v3:
- define environment for tests
Most test don't need this, but to be consistent with autotools-based build, and
to avoid questions which tests need it and which don't, set the same environment
for all tests.
v4:
- rework test generation
Use a list of lists to define each test. This way we can reduce the
boilerplate somewhat, although the test listings are still pretty verbose. We
can also move the definitions of the tests to the subdirs. Unfortunately some
subdirs are included earlier than some of the libraries that test binaries
are linked to. So just dump all definitions of all tests that cannot be
defined earlier into src/test. The `executable` definitions are still at the
top level, so the binaries are compiled into the build root.
v5:
- tag test-dnssec-complex as manual
v6:
- fix HAVE_LIBZ typo
- add missing libgobject/libgio defs
- mark test-qcow2 as manual
We defined both $(VERSION) and $(PACKAGE_VERSION) with the same contents.
$(PACKAGE_VERSION) is slightly more descriptive, so settle on that, and
drop the other define.
Package build machines may have module loading disabled, thus AF_ALG
sockets are not available. Skip the tests that cover those (khash and
id128) instead of failing them in this case.
Fixes#5524
Sometimes it's useful to provide a default value during an environment
expansion, if the environment variable isn't already set.
For instance $XDG_DATA_DIRS is suppose to default to:
/usr/local/share/:/usr/share/
if it's not yet set. That means callers wishing to augment
XDG_DATA_DIRS need to manually add those two values.
This commit changes replace_env to support the following shell
compatible default value syntax:
XDG_DATA_DIRS=/foo:${XDG_DATA_DIRS:-/usr/local/share/:/usr/share}
Likewise, it's useful to provide an alternate value during an
environment expansion, if the environment variable isn't already set.
For instance, $LD_LIBRARY_PATH will inadvertently search the current
working directory if it starts or ends with a colon, so the following
is usually wrong:
LD_LIBRARY_PATH=/foo/lib:${LD_LIBRARY_PATH}
To address that, this changes replace_env to support the following
shell compatible alternate value syntax:
LD_LIBRARY_PATH=/foo/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
[zj: gate the new syntax under REPLACE_ENV_ALLOW_EXTENDED switch, so
existing callers are not modified.]
In the future we might want to allow additional syntax (for example
"unset VAR". But let's check that the data we're getting does not contain
anything unexpected.
(Only in environment.d files.)
We have only basic compatibility with shell syntax, but specifying variables
without using braces is probably more common, and I think a lot of people would
be surprised if this didn't work.
merge_env_file is a new function, that's like load_env_file, but takes a
pre-existing environment as an input argument. New environment entries are
merged. Variable expansion is performed.
Falling back to the process environment is supported (when a flag is set).
Alternatively this could be implemented as passing an additional fallback
environment array, but later on we're adding another flag to allow braceless
expansion, and the two flags can be combined in one arg, so there's less
stuff to pass around.
If an environment array has duplicates, strv_env_get_n returns
the results for the first match. This is wrong, because later
entries in the environment are supposed to replace earlier
entries.
strv_env_replace was calling env_match(), which in effect allowed multiple
values for the same key to be inserted into the environment block. That's
pointless, because APIs to access variables only return a single value (the
latest entry), so it's better to keep the block clean, i.e. with just a single
entry for each key.
Add a new helper function that simply tests if the part before '=' is equal in
two strings and use that in strv_env_replace.
In load_env_file_push, use strv_env_replace to immediately replace the previous
assignment with a matching name.
Afaict, none of the callers are materially affected by this change, but it
seems like some pointless work was being done, if the same value was set
multiple times. We'd go through parsing and assigning the value for each
entry. With this change, we handle just the last one.
The output of processes can be gathered, and passed back to the callee.
(This commit just implements the basic functionality and tests.)
After the preparation in previous commits, the change in functionality is
relatively simple. For coding convenience, alarm is prepared *before* any
children are executed, and not before. This shouldn't matter usually, since
just forking of the children should be pretty quick. One could also argue that
this is more correct, because we will also catch the case when (for whatever
reason), forking itself is slow.
Three callback functions and three levels of serialization are used:
- from individual generator processes to the generator forker
- from the forker back to the main process
- deserialization in the main process
v2:
- replace an structure with an indexed array of callbacks
There is a slight change in behaviour: the user manager for root will create a
temporary file in /run/systemd, not /tmp. I don't think this matters, but
simplifies implementation.
Commit 436e916ea introduced the assumption into test-stat-util that /run
is a tmpfs mount point. This is not the case in build chroots such as
Fedora's mock or Debian's sbuild. So only assert that /run is a tmpfs
and not a btrfs if /run is actually a mount point. This will then still
be asserted with installed tests.
This changes the file copy logic of machined to set the UID/GID of all
copied files to 0 if the host and container do not share the same user
namespace.
Fixes: #4078
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
Drop the TEST_DATA_DIR macro as this was using alloca() within a
function call which is allegedly unsafe. So add a "suffix" argument to
get_testdata_dir() instead and call that directly.
Rename get_exe_relative_testdata_dir() to get_testdata_dir() and move
the env var check into that, so that everything interesting happens at
the same place.
That way, if the test directory does not exist we don't leave behind
temporary files (as in that case or on test failure the cleanup actions
don't run).
Only one test case is added, but it is enough to check basic sanity of the
code (single-line and binary fields and trusted fields, allocation and freeing).
It is useful to package test-* binaries and run them as root under
autopkgtest or manually on particular machines. They currently have a
built-in hardcoded absolute path to their test data, which does not work
when running the test programs from any other path than the original
build directory.
By default, make the tests look for their data in
<test_exe_directory>/testdata/ so that they can be called from any
directory (provided that the corresponding test data is installed
correctly). As we don't have a fixed static path in the build tree (as
build and source tree are independent), set $TEST_DIR with "make check"
to point to <srcdir>/test/, as we previously did with an automake
variable.
ReadOnlyPaths=, ProtectHome=, InaccessiblePaths= and ProtectSystem= are
about restricting access and little more, hence they should be disabled
if PermissionsStartOnly= is used or ExecStart= lines are prefixed with a
"+". Do that.
(Note that we will still create namespaces and stuff, since that's about
a lot more than just permissions. We'll simply disable the effect of
the four options mentioned above, but nothing else mount related.)
This also adds a test for this, to ensure this works as intended.
No documentation updates, as the documentation are already vague enough
to support the new behaviour ("If true, the permission-related execution
options…"). We could clarify this further, but I think we might want to
extend the switches' behaviour a bit more in future, hence leave it at
this for now.
Fixes: #5308
5dd11ab5f3 did a similar change for conf_files_list_strv().
Here we do the same for conf_files_list() and conf_files_list_nulstr().
No change for existing users. Tests are added.
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.
This also bypasses the test on archs where we don't know the right
order.
In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.
Fixes: #5241
The compiler warning is a false positive, since n_addresses is always
initialised on the success path from parse_argv(), but the compiler
obviously can’t work that out.
Fixes:
src/test/test-nss.c:426:9: warning: 'n_addresses' may be used uninitialized in this function [-Wmaybe-uninitialized]
On i386 we block the old mmap() call entirely, since we cannot properly
filter it. Thankfully it hasn't been used by glibc since quite some
time.
Fixes: #5240
If a unit foobar@.service stored below /usr is instantiated via a
symlink foobar@quux.service also below /usr, then we should consider the
instance statically enabled, while the template itself should continue
to be considered enabled/disabled/static depending on its [Install]
section.
In order to implement this we'll now look for enablement symlinks in all
unit search paths, not just in the config and runtime dirs.
Fixes: #5136
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.
This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
explicit_bzero was added in glibc 2.25. Make use of it.
explicit_bzero is hardcoded to zero the memory, so string erase now
truncates the string, instead of overwriting it with 'x'. This causes
a visible difference only in the journalctl case.
Gcc7 is smarter about detecting unused functions and detects those two functions
which are unused in tests. But gperf generates them for us, so let's instead of removing
tell gcc that we know they might be unused in the test code.
In file included from ../src/test/test-af-list.c:29:0:
./src/basic/af-from-name.h:140:1: warning: ‘lookup_af’ defined but not used [-Wunused-function]
lookup_af (register const char *str, register size_t len)
^~~~~~~~~
In file included from ../src/test/test-arphrd-list.c:29:0:
./src/basic/arphrd-from-name.h:125:1: warning: ‘lookup_arphrd’ defined but not used [-Wunused-function]
lookup_arphrd (register const char *str, register size_t len)
^~~~~~~~~~~~~
usec_t is always 64bit, which means it can cover quite a number of
years. However, 4 digit year display and glibc limitations around time_t
limit what we can actually parse and format. Let's make this explicit,
so that we never end up formatting dates we can#t parse and vice versa.
Note that this is really just about formatting/parsing. Internal
calculations with times outside of the formattable range are not
affected.
networkd: Allow ':' in label
This reverts a341dfe563 and takes a slightly different approach: anything is
allowed in network interface labels, but network interface names are verified
as before (i.e. amongst other things, no colons are allowed there).
If chase_symlinks() encouters an absolute symlink, it resets the todo
buffer to just the newly discovered symlink and discards any of the
remaining previous symlink path. Regardless of whether or not the
symlink is absolute or relative, we need to preserve the remainder of
the path that has not yet been resolved.
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.
So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.
This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only
seccomp_rule_add_exact(), and fail the installation of a filter if the
architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter,
but instead install a separate filter for each syscall architecture
supported. This way, we can install a strict filter for x86-64, while
permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to
seccomp-util.c, so that we can test them independently of the service
execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in
independent filters and installation logic, as they semantically are
very much independent of each other.
Fixes: #4575
The AF_VSOCK address family facilitates guest<->host communication on
VMware and KVM (virtio-vsock). Adding support to systemd allows guest
agents to be launched through .socket unit files. Today guest agents
are stand-alone daemons running inside guests that do not take advantage
of systemd socket activation.
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.
In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
Check if the parsed seconds value fits in an integer *after*
multiplying by USEC_PER_SEC, otherwise a large value can trigger
modulo by zero during normalization.
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
Let's use chase_symlinks() when looking for /etc/os-release and
/usr/lib/os-release as these files might be symlinks (and actually are IRL on
some distros).
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
Fixes:
```
$ ./libtool --mode=execute valgrind --leak-check=full ./test-fs-util
...
==22871==
==22871== 27 bytes in 1 blocks are definitely lost in loss record 1 of 1
==22871== at 0x4C2FC47: realloc (vg_replace_malloc.c:785)
==22871== by 0x4E86D05: strextend (string-util.c:726)
==22871== by 0x4E8F347: chase_symlinks (fs-util.c:712)
==22871== by 0x109EBF: test_chase_symlinks (test-fs-util.c:75)
==22871== by 0x10C381: main (test-fs-util.c:305)
==22871==
```
Closes#4888
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.
The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).
Fixes: #3439
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as they key cannot be overidden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overriden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overriden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
# systemd-run -t /bin/sh
Running as unit: run-rd917366c04f847b480d486017f7239d6.service
Press ^] three times within 1s to disconnect TTY.
# keyctl show
Session Keyring
680208392 --alswrv 0 0 keyring: _ses
250926536 ----s-rv 0 0 \_ user: invocation_id
# keyctl request user invocation_id
250926536
# keyctl read 250926536
16 bytes of data in key:
9c96317c ac64495a a42b9cd7 4f3ff96b
# echo $INVOCATION_ID
9c96317cac64495aa42b9cd74f3ff96b
# ^D
This creates a new transient service runnint a shell. Then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.
Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.
mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:
```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```
Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
This adds two new APIs to systemd:
- loop-util.h is a simple internal API for allocating, setting up and releasing
loopback block devices.
- dissect-image.h is an internal API for taking apart disk images and figuring
out what the purpose of each partition is.
Both APIs are basically refactored versions of similar code in nspawn. This
rework should permit us to reuse this in other places than just nspawn in the
future. Specifically: to implement RootImage= in the service image, similar to
RootDirectory=, but operating on a disk image; to unify the gpt-auto-discovery
generator code with the discovery logic in nspawn; to add new API to machined
for determining the OS version of a disk image (i.e. not just running
containers). This PR does not make any such changes however, it just provides
the new reworked API.
The reworked code is also slightly more powerful than the nspawn original one.
When pointing it to an image or block device with a naked file system (i.e. no
partition table) it will simply make it the root device.
Let's accept "µs" as alternative time unit for microseconds. We already accept
"us" and "usec" for them, lets extend on this and accept the proper scientific
unit specification too.
We will never output this as time unit, but it's fine to accept it, after all
we are pretty permissive with time units already.
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
This adds an API for retrieving an app-specific machine ID to sd-id128.
Internally it calculates HMAC-SHA256 with an 128bit app-specific ID as payload
and the machine ID as key.
(An alternative would have been to use siphash for this, which is also
cryptographically strong. However, as it only generates 64bit hashes it's not
an obvious choice for generating 128bit IDs.)
Fixes: #4667
Let's take inspiration from bluez's ELL library, and let's move our
cryptographic primitives away from libgcrypt and towards the kernel's AF_ALG
cryptographic userspace API.
In the long run we should try to remove the dependency on libgcrypt, in favour
of using only the kernel's own primitives, however this is unlikely to happen
anytime soon, as the kernel does not provide Elliptic Curve APIs to userspace
at this time, and we need them for the DNSSEC cryptographic.
This commit only covers hashing for now, symmetric encryption/decryption or
even asymetric encryption/decryption is not available for now.
"khash" is little more than a lightweight wrapper around the kernel's AF_ALG
socket API.
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".