The condition is on "word", hence we give word instead of rvalue.
An assert would be triggered if !utf8_is_valid(word) is true and
rvalue == NULL, since log_syntax_invalid_utf8 calls utf8_escape_invalid
which calls assert(str).
A test case has been added to test with valid and invalid utf8.
This test is mostly a compilation test that checks that various defines in
sd-bus-vtable.h are valid C++. The code is executed, but the results are not
checked (apart from sd-bus functions not returning an error). test-bus-objects
contains pretty extensive tests for this functionality.
The C++ version is only added to meson, since it's simpler there.
Because of the .cc extension, meson will compile the executable with c++.
This test is necessary to properly check the macros in sd-bus-vtable.h. Just
running the headers through g++ is not enough, because the macros are not
exercised.
Follow-up for #5941.
This adds a modified version of dhcp6_option_parse_domainname() that is
able to parse compressed domain names, borrowing the idea from
dns_packet_read_name(). It also adds pieces in networkd-link and
networkd-manager to properly save/load the added option field.
Resolves #2710.
This reverts commit 6355e75610.
The previously mentioned commit inadvertently broke a lot of SELinux related
functionality for both unprivileged users and systemd instances running as
MANAGER_USER. In particular, setting the correct SELinux context after a User=
directive is used would fail to work since we attempt to set the security
context after changing UID. Additionally, it causes activated socket units to
be mislabeled for systemd --user processes since setsockcreatecon() would never
be called.
Reverting this fixes the issues with labeling outlined above, and reinstates
SELinux access checks on unprivileged user services.
libidn2 2.0.0 supports IDNA2008, in contrast to libidn which supports IDNA2003.
https://bugzilla.redhat.com/show_bug.cgi?id=1449145
From that bug report:
Internationalized domain names exist for quite some time (IDNA2003), although
the protocols describing them have evolved in an incompatible way (IDNA2008).
These incompatibilities will prevent applications written for IDNA2003 to
access certain problematic domain names defined with IDNA2008, e.g., faß.de is
translated to domain xn--fa-hia.de with IDNA2008, while in IDNA2003 it is
translated to fass.de domain. That not only causes incompatibility problems,
but may be used as an attack vector to redirect users to different web sites.
v2:
- keep libidn support
- require libidn2 >= 2.0.0
v3:
- keep dns_name_apply_idna caller dumb, and keep the #ifdefs inside of the
function.
- use both ±IDN and ±IDN2 in the version string
We expect that if socket() syscall is available, seccomp works for that
architecture. So instead of explicitly listing all architectures where we know
it is not available, just assume it is broken if the number is not defined.
This should have the same effect, except that other architectures where it is
also broken will pass tests without further changes. (Architectures where the
filter should work, but does not work because of missing entries in
seccomp-util.c, will still fail.)
i386, s390, s390x are the exception — setting the filter fails, even though
socket() is available, so it needs to be special-cased
(https://github.com/systemd/systemd/issues/5215#issuecomment-277241488).
This removes the last define in seccomp-util.h that was only used in test-seccomp.c. Porting
the seccomp filter to new architectures should be simpler because now only two places need
to be modified.
RestrictAddressFamilies seems to work on ppc64[bl]e, so enable it (the tests pass).
The single log level is split into an array of log levels. Which index in the
array is used can be determined for each compilation unit separately by setting
a macro before including log.h. All compilation units use the same index
(LOG_REALM_SYSTEMD), so there should be no functional change.
v2:
- the "realm" is squished into the level (upper bits that are not used by
priority or facility), and unsquished later in functions in log.c.
v3:
- rename REALM_PLUS_LEVEL to LOG_REALM_PLUS_LEVEL and REALM to LOG_REALM_REMOVE_LEVEL.
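The "squishing" from v2 can be sketched as below. This is an illustration only, not the code in log.h: the EX_ names, the second realm and the shift of 10 bits are assumptions. The shift is chosen because syslog priority (bits 0-2) and facility (bits 3-9) together occupy the low 10 bits of a level value.

```c
#include <assert.h>
#include <syslog.h>

/* Hypothetical realm indices; the commit itself only names LOG_REALM_SYSTEMD. */
enum { EX_REALM_SYSTEMD = 0, EX_REALM_OTHER = 1, EX_REALM_MAX };

/* Assumption: bits 10 and up are unused by priority and facility. */
#define EX_REALM_SHIFT 10
#define EX_REALM_PLUS_LEVEL(realm, level) (((realm) << EX_REALM_SHIFT) | (level))
#define EX_REALM_REMOVE_LEVEL(realm_level) ((realm_level) >> EX_REALM_SHIFT)
#define EX_LEVEL_ONLY(realm_level) ((realm_level) & ((1 << EX_REALM_SHIFT) - 1))

/* One maximum log level per realm instead of a single global one. */
static int ex_max_level[EX_REALM_MAX] = { LOG_INFO, LOG_DEBUG };

/* Returns 1 if a message with the combined realm+level value would be logged. */
static int ex_should_log(int realm_level) {
        int realm = EX_REALM_REMOVE_LEVEL(realm_level);
        int priority = LOG_PRI(EX_LEVEL_ONLY(realm_level));

        assert(realm >= 0 && realm < EX_REALM_MAX);
        return priority <= ex_max_level[realm];
}
```

With these tables, ex_should_log(EX_REALM_PLUS_LEVEL(EX_REALM_SYSTEMD, LOG_DEBUG)) is 0, while the same level in EX_REALM_OTHER is logged.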
Since all our python scripts have a proper python3 shebang, there is no benefit
to letting meson autodetect them. On Linux, meson will just use exec(), so the
shebang is used anyway. The only difference should be in how meson reports the
script and that the detection won't fail for (most likely misconfigured)
non-UTF8 locales.
Closes #5855.
While adding the defines for arm, I realized that we have pretty much all
known architectures covered, so SECCOMP_RESTRICT_NAMESPACES_BROKEN is not
necessary anymore. clone(2) is adamant that the order of the first two
arguments is only reversed on s390/s390x. So let's simplify things and remove
the #if.
SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN was conflating two separate things:
1. whether shmat/shmdt/shmget can be filtered (if ipc multiplexer is used, they can not)
2. whether we know this for the current architecture
For i386, shmat is implemented as ipc, so seccomp filter is "broken" for shmat,
but not for mmap, and SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN cannot be used
to cover both cases. The define was only used for tests — not in the implementation
in seccomp-util.c. So let's get rid of SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN
and encode the right condition directly in tests.
This is useful on systems like NixOS, where python3 is not in
/usr/bin/python3 as well as for people using alternative ways to
install python such as virtualenv/pyenv.
This filters out "." and ".." from glob results. Fixes #5655 and #5644.
Any judgements on whether the path is "safe" are removed. We will not remove
"/" under any name (including "/../" and such), but we will remove stuff that
is specified using paths that include "//", "/./" and "/../". Such paths can be
created when joining strings automatically, or for other reasons, and people
generally know what ".." and "." are.
Tests are added to make sure that the helper functions behave as expected.
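A minimal sketch of the filtering step, assuming the glob results are available as a NULL-terminated array of strings. The function names are hypothetical and are not the helpers added by this commit. Paths with a trailing slash are not treated specially here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the last component of path is "." or "..". */
static bool ex_last_is_dot_or_dotdot(const char *path) {
        const char *last;

        assert(path);

        last = strrchr(path, '/');
        last = last ? last + 1 : path;

        return strcmp(last, ".") == 0 || strcmp(last, "..") == 0;
}

/* Compacts a NULL-terminated array of glob results in place, dropping entries
 * whose last component is "." or "..". Returns the number of entries kept. */
static size_t ex_filter_glob_results(char **paths) {
        size_t n = 0;

        assert(paths);

        for (char **p = paths; *p; p++)
                if (!ex_last_is_dot_or_dotdot(*p))
                        paths[n++] = *p;

        paths[n] = NULL;
        return n;
}
```

Note that "/tmp/.cache" is kept: only the exact components "." and ".." are dropped.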
Shell scripts should be executable so that meson reports their
invocation succinctly (does not print 'sh' '-e').
Python scripts should not be executable so that meson does the
detection of the right python binary itself.
Add -u everywhere to catch potential errors.
The indentation for Emacs' meson-mode is added to .dir-locals.
All files are reindented automatically, using the latest meson-mode from git.
Indentation should now be fairly consistent.
This makes the helper binaries significantly bigger (in some cases, the final
size depends on link options and optimization level), and is only useful for
distributions which want to provide the option to install udev without systemd.
As the linking is improved, the difference between the columns might shrink,
but it's unlikely that linking libshared statically could ever be more
efficient.
E.g. with -O0, no -flto:
                      (static)    (shared)
src/udev/ata_id         999176       85696
src/udev/cdrom_id      1024344      111656
src/udev/collect        990344       81280
src/udev/scsi_id       1023592      115656
src/udev/v4l_id         811736       17744
When linked dynamically, install_rpath must be specified, so add that.
test-dlopen is a very simple binary that is only linked with libc and
libdl. From it we do dlopen() on the nss and pam modules to check that they are
linked to all necessary libs.
(meson-compiled nss modules are linked to fewer libraries, for whatever reason.
I suspected that some deps are missing, but it turns out that my suspicions
weren't justified, and the modules load just fine. Let's keep the test though,
it is very quick, and might detect missing linkage in the future.)
This simplifies things and leads to a smaller installation footprint.
libsystemd_internal and libsystemd_journal_internal are linked into
libsystemd-shared and available to all programs linked to libsystemd-shared.
libsystemd_journal_internal is not needed anymore, and libsystemd-shared
is used everywhere. The few exceptions are: libsystemd.so, test-engine,
test-bus-error, and various loadable modules.
The tests are included under the conditional too, instead of specifying
'ENABLE_NETWORKD' in the test definition array, because libnetworkd_core
dependency is undefined if networkd is disabled.
With mesonbuild/meson#1545, meson does not propagate deps of a library
when linking with that library. That's of course the right thing to do,
but it exposes a bunch of missing deps.
This compiles with both meson-0.39.1 and meson-git + pr/1545.
This is slightly complicated by the fact that files('libudev.h') cannot be used
as an argument in custom_target command (string is required). This restriction
should be lifted in future versions of meson, so this could be simplified.
This is quite messy. I think libtool might have been using something
like -Wl,--whole-archive, but I don't think meson has support for that.
For now, just recompile all the sources for linking into libsystemd
directly. This should not matter much for efficiency, since it's a
few small files.
This is what autoconf-based build does, and it makes test-bus-error and
test-engine able to access the bus error mapping table. OTOH, this is a heavy
price to pay: it would be excellent to link libcore.a to libsystemd-shared-NNN.so.
Otherwise we duplicate the same code in 'systemd' and 'libsystemd-shared-NNN.so'.
-rwxrwxr-x. 1 4075544 Apr 6 20:30 systemd* <-- libcore linked against libsystemd-shared.so
-rwxrwxr-x. 1 5596504 Apr 9 14:07 systemd* <-- libcore linked against libsystemd-shared.a
v2:
- update for 6b5cf3ea62
Tests can be run with 'ninja-build test' or using 'mesontest'.
'-Dtests=unsafe' can be used to include the "unsafe" tests in the
test suite, same as with autotools.
v2:
- use more conf.get guards around optional components
- declare deps on generated headers for test-{af,arphrd,cap}-list
v3:
- define environment for tests
Most tests don't need this, but to be consistent with autotools-based build, and
to avoid questions which tests need it and which don't, set the same environment
for all tests.
v4:
- rework test generation
Use a list of lists to define each test. This way we can reduce the
boilerplate somewhat, although the test listings are still pretty verbose. We
can also move the definitions of the tests to the subdirs. Unfortunately some
subdirs are included earlier than some of the libraries that test binaries
are linked to. So just dump all definitions of all tests that cannot be
defined earlier into src/test. The `executable` definitions are still at the
top level, so the binaries are compiled into the build root.
v5:
- tag test-dnssec-complex as manual
v6:
- fix HAVE_LIBZ typo
- add missing libgobject/libgio defs
- mark test-qcow2 as manual
We defined both $(VERSION) and $(PACKAGE_VERSION) with the same contents.
$(PACKAGE_VERSION) is slightly more descriptive, so settle on that, and
drop the other define.
Package build machines may have module loading disabled, thus AF_ALG
sockets are not available. Skip the tests that cover those (khash and
id128) instead of failing them in this case.
Fixes #5524
Sometimes it's useful to provide a default value during an environment
expansion, if the environment variable isn't already set.
For instance $XDG_DATA_DIRS is supposed to default to:
/usr/local/share/:/usr/share/
if it's not yet set. That means callers wishing to augment
XDG_DATA_DIRS need to manually add those two values.
This commit changes replace_env to support the following shell
compatible default value syntax:
XDG_DATA_DIRS=/foo:${XDG_DATA_DIRS:-/usr/local/share/:/usr/share}
Likewise, it's useful to provide an alternate value during an
environment expansion, if the environment variable is already set.
For instance, $LD_LIBRARY_PATH will inadvertently search the current
working directory if it starts or ends with a colon, so the following
is usually wrong:
LD_LIBRARY_PATH=/foo/lib:${LD_LIBRARY_PATH}
To address that, this changes replace_env to support the following
shell compatible alternate value syntax:
LD_LIBRARY_PATH=/foo/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
[zj: gate the new syntax under REPLACE_ENV_ALLOW_EXTENDED switch, so
existing callers are not modified.]
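The two forms can be sketched as a small recursive expander. This is not replace_env itself: the ex_ names, the fixed-size output buffer and the "KEY=value" array lookup are assumptions made to keep the example self-contained. As in the shell, the ":" forms treat an empty variable like an unset one.

```c
#include <assert.h>
#include <string.h>

/* Returns the value of the last "NAME=value" entry in env whose name matches, or NULL. */
static const char *ex_env_lookup(const char *const *env, const char *name, size_t name_len) {
        const char *found = NULL;

        for (; env && *env; env++)
                if (strncmp(*env, name, name_len) == 0 && (*env)[name_len] == '=')
                        found = *env + name_len + 1;
        return found;
}

/* Appends n bytes of src to the NUL-terminated buffer out; -1 if they do not fit. */
static int ex_append(char *out, size_t out_size, const char *src, size_t n) {
        size_t used = strlen(out);

        if (n >= out_size - used)
                return -1;
        memcpy(out + used, src, n);
        out[used + n] = '\0';
        return 0;
}

/* Expands ${VAR}, ${VAR:-default} and ${VAR:+alternate} found in s[0..len) and
 * appends the result to out. Returns 0 on success, -1 on syntax error or overflow. */
static int ex_expand(const char *const *env, const char *s, size_t len,
                     char *out, size_t out_size) {
        size_t i = 0;

        assert(s);
        assert(out);

        while (i < len) {
                if (!(s[i] == '$' && i + 1 < len && s[i + 1] == '{')) {
                        if (ex_append(out, out_size, s + i, 1) < 0)
                                return -1;
                        i++;
                        continue;
                }

                /* Find the matching '}', counting nested braces. */
                size_t start = i + 2, end = start, depth = 1;
                for (; end < len; end++) {
                        if (s[end] == '{')
                                depth++;
                        else if (s[end] == '}' && --depth == 0)
                                break;
                }
                if (end == len)
                        return -1; /* unterminated "${" */

                /* The name runs up to ':' or up to the closing brace. */
                size_t name_len = 0;
                while (start + name_len < end && s[start + name_len] != ':')
                        name_len++;
                if (name_len == 0)
                        return -1;

                const char *value = ex_env_lookup(env, s + start, name_len);
                int is_set = value && *value;
                const char *word = s + start + name_len; /* at ':' or at '}' */
                size_t word_len = end - (start + name_len);
                int r;

                if (word_len == 0)
                        r = is_set ? ex_append(out, out_size, value, strlen(value)) : 0;
                else if (word_len >= 2 && word[1] == '-')
                        r = is_set ? ex_append(out, out_size, value, strlen(value))
                                   : ex_expand(env, word + 2, word_len - 2, out, out_size);
                else if (word_len >= 2 && word[1] == '+')
                        r = is_set ? ex_expand(env, word + 2, word_len - 2, out, out_size) : 0;
                else
                        r = -1;
                if (r < 0)
                        return -1;

                i = end + 1;
        }
        return 0;
}
```

With LD_LIBRARY_PATH=/usr/lib in env, the alternate-value example above expands to "/foo/lib:/usr/lib"; with it unset, the result is just "/foo/lib", with no stray colon.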
In the future we might want to allow additional syntax (for example
"unset VAR"). But let's check that the data we're getting does not contain
anything unexpected.
(Only in environment.d files.)
We have only basic compatibility with shell syntax, but specifying variables
without using braces is probably more common, and I think a lot of people would
be surprised if this didn't work.
merge_env_file is a new function, that's like load_env_file, but takes a
pre-existing environment as an input argument. New environment entries are
merged. Variable expansion is performed.
Falling back to the process environment is supported (when a flag is set).
Alternatively this could be implemented as passing an additional fallback
environment array, but later on we're adding another flag to allow braceless
expansion, and the two flags can be combined in one arg, so there's less
stuff to pass around.
If an environment array has duplicates, strv_env_get_n returns
the results for the first match. This is wrong, because later
entries in the environment are supposed to replace earlier
entries.
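The intended lookup order can be sketched like this. ex_env_get_n() is a hypothetical stand-in, not the strv_env_get_n implementation; it assumes a NULL-terminated array of "KEY=value" strings.

```c
#include <assert.h>
#include <string.h>

/* Looks up name (of length n) in env and returns the value of the last
 * matching entry, or NULL if there is none. */
static const char *ex_env_get_n(char *const *env, const char *name, size_t n) {
        size_t count = 0;

        assert(name);

        if (!env || n == 0)
                return NULL;

        while (env[count])
                count++;

        /* Walk backwards so that later assignments take precedence. */
        for (size_t i = count; i > 0; i--) {
                const char *entry = env[i - 1];

                if (strncmp(entry, name, n) == 0 && entry[n] == '=')
                        return entry + n + 1;
        }

        return NULL;
}
```

For the array { "A=1", "B=2", "A=3", NULL }, a lookup of "A" returns "3".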
strv_env_replace was calling env_match(), which in effect allowed multiple
values for the same key to be inserted into the environment block. That's
pointless, because APIs to access variables only return a single value (the
latest entry), so it's better to keep the block clean, i.e. with just a single
entry for each key.
Add a new helper function that simply tests if the part before '=' is equal in
two strings and use that in strv_env_replace.
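A minimal sketch of such a helper and of a replace operation built on it. The names are hypothetical and memory management is left out: the array holds borrowed pointers, and appending a new entry is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the part before '=' is identical in both "KEY=value" strings. */
static bool ex_env_same_name(const char *a, const char *b) {
        size_t la, lb;

        assert(a);
        assert(b);

        la = strcspn(a, "=");
        lb = strcspn(b, "=");

        return la == lb && strncmp(a, b, la) == 0;
}

/* Replaces the first entry that has the same name as assignment.
 * Returns the index replaced, or -1 if no entry matched. */
static int ex_env_replace(const char **env, const char *assignment) {
        assert(env);
        assert(assignment);

        for (int i = 0; env[i]; i++)
                if (ex_env_same_name(env[i], assignment)) {
                        env[i] = assignment;
                        return i;
                }

        return -1;
}
```

Comparing only the name means "FOO=1" and "FOO=2" match, so the block never ends up with two entries for FOO.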
In load_env_file_push, use strv_env_replace to immediately replace the previous
assignment with a matching name.
Afaict, none of the callers are materially affected by this change, but it
seems like some pointless work was being done, if the same value was set
multiple times. We'd go through parsing and assigning the value for each
entry. With this change, we handle just the last one.
The output of processes can be gathered, and passed back to the caller.
(This commit just implements the basic functionality and tests.)
After the preparation in previous commits, the change in functionality is
relatively simple. For coding convenience, alarm is prepared *before* any
children are executed, and not after. This shouldn't matter usually, since
just forking of the children should be pretty quick. One could also argue that
this is more correct, because we will also catch the case when (for whatever
reason), forking itself is slow.
Three callback functions and three levels of serialization are used:
- from individual generator processes to the generator forker
- from the forker back to the main process
- deserialization in the main process
v2:
- replace a structure with an indexed array of callbacks
There is a slight change in behaviour: the user manager for root will create a
temporary file in /run/systemd, not /tmp. I don't think this matters, but
simplifies implementation.
Commit 436e916ea introduced the assumption into test-stat-util that /run
is a tmpfs mount point. This is not the case in build chroots such as
Fedora's mock or Debian's sbuild. So only assert that /run is a tmpfs
and not a btrfs if /run is actually a mount point. This will then still
be asserted with installed tests.
This changes the file copy logic of machined to set the UID/GID of all
copied files to 0 if the host and container do not share the same user
namespace.
Fixes: #4078
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
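The readability argument can be illustrated with a small flags enum. The flag names and the helper below are hypothetical, since this message does not list the actual flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the former boolean parameters. */
typedef enum ExCopyFlags {
        EX_COPY_REFLINK = 1 << 0,   /* try to share extents instead of copying data */
        EX_COPY_MERGE   = 1 << 1,   /* merge into an existing directory tree */
        EX_COPY_REPLACE = 1 << 2,   /* replace an existing destination file */
        EX_COPY_ALL     = EX_COPY_REFLINK | EX_COPY_MERGE | EX_COPY_REPLACE,
} ExCopyFlags;

/* Old style, for comparison: ex_copy(from, to, true, false, true).
 * New style: ex_copy(from, to, EX_COPY_REFLINK | EX_COPY_REPLACE). */

/* Returns true if the caller asked for existing files to be replaced. */
static bool ex_copy_wants_replace(ExCopyFlags flags) {
        assert((flags & ~EX_COPY_ALL) == 0); /* reject unknown bits */
        return (flags & EX_COPY_REPLACE) != 0;
}
```

Adding another mode later only needs a new bit, not a new parameter at every call site.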
Drop the TEST_DATA_DIR macro as this was using alloca() within a
function call which is allegedly unsafe. So add a "suffix" argument to
get_testdata_dir() instead and call that directly.
Rename get_exe_relative_testdata_dir() to get_testdata_dir() and move
the env var check into that, so that everything interesting happens at
the same place.
That way, if the test directory does not exist we don't leave behind
temporary files (as in that case or on test failure the cleanup actions
don't run).
Only one test case is added, but it is enough to check basic sanity of the
code (single-line and binary fields and trusted fields, allocation and freeing).
It is useful to package test-* binaries and run them as root under
autopkgtest or manually on particular machines. They currently have a
built-in hardcoded absolute path to their test data, which does not work
when running the test programs from any other path than the original
build directory.
By default, make the tests look for their data in
<test_exe_directory>/testdata/ so that they can be called from any
directory (provided that the corresponding test data is installed
correctly). As we don't have a fixed static path in the build tree (as
build and source tree are independent), set $TEST_DIR with "make check"
to point to <srcdir>/test/, as we previously did with an automake
variable.
ReadOnlyPaths=, ProtectHome=, InaccessiblePaths= and ProtectSystem= are
about restricting access and little more, hence they should be disabled
if PermissionsStartOnly= is used or ExecStart= lines are prefixed with a
"+". Do that.
(Note that we will still create namespaces and stuff, since that's about
a lot more than just permissions. We'll simply disable the effect of
the four options mentioned above, but nothing else mount related.)
This also adds a test for this, to ensure this works as intended.
No documentation updates, as the documentation is already vague enough
to support the new behaviour ("If true, the permission-related execution
options…"). We could clarify this further, but I think we might want to
extend the switches' behaviour a bit more in future, hence leave it at
this for now.
Fixes: #5308
5dd11ab5f3 did a similar change for conf_files_list_strv().
Here we do the same for conf_files_list() and conf_files_list_nulstr().
No change for existing users. Tests are added.
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.
This also bypasses the test on archs where we don't know the right
order.
In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.
Fixes: #5241
The compiler warning is a false positive, since n_addresses is always
initialised on the success path from parse_argv(), but the compiler
obviously can’t work that out.
Fixes:
src/test/test-nss.c:426:9: warning: 'n_addresses' may be used uninitialized in this function [-Wmaybe-uninitialized]
On i386 we block the old mmap() call entirely, since we cannot properly
filter it. Thankfully it hasn't been used by glibc for quite some
time.
Fixes: #5240
If a unit foobar@.service stored below /usr is instantiated via a
symlink foobar@quux.service also below /usr, then we should consider the
instance statically enabled, while the template itself should continue
to be considered enabled/disabled/static depending on its [Install]
section.
In order to implement this we'll now look for enablement symlinks in all
unit search paths, not just in the config and runtime dirs.
Fixes: #5136
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.
This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
explicit_bzero was added in glibc 2.25. Make use of it.
explicit_bzero is hardcoded to zero the memory, so string erase now
truncates the string, instead of overwriting it with 'x'. This causes
a visible difference only in the journalctl case.
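The behaviour change can be sketched as follows, assuming a libc that provides explicit_bzero() (glibc 2.25 or newer). ex_string_erase() is a hypothetical name used only for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>

/* Zeroes the contents of s in a way the compiler may not optimise away.
 * Afterwards s is an empty string rather than a run of 'x' characters. */
static char *ex_string_erase(char *s) {
        if (!s)
                return NULL;

        explicit_bzero(s, strlen(s));
        return s;
}
```

After ex_string_erase(buf) on "secret", strlen(buf) is 0, which is the truncation mentioned above.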
GCC 7 is smarter about detecting unused functions and flags these two functions,
which are unused in tests. But gperf generates them for us, so instead of removing them,
let's tell gcc that we know they might be unused in the test code.
In file included from ../src/test/test-af-list.c:29:0:
./src/basic/af-from-name.h:140:1: warning: ‘lookup_af’ defined but not used [-Wunused-function]
lookup_af (register const char *str, register size_t len)
^~~~~~~~~
In file included from ../src/test/test-arphrd-list.c:29:0:
./src/basic/arphrd-from-name.h:125:1: warning: ‘lookup_arphrd’ defined but not used [-Wunused-function]
lookup_arphrd (register const char *str, register size_t len)
^~~~~~~~~~~~~
usec_t is always 64bit, which means it can cover quite a number of
years. However, 4 digit year display and glibc limitations around time_t
limit what we can actually parse and format. Let's make this explicit,
so that we never end up formatting dates we can't parse and vice versa.
Note that this is really just about formatting/parsing. Internal
calculations with times outside of the formattable range are not
affected.
networkd: Allow ':' in label
This reverts a341dfe563 and takes a slightly different approach: anything is
allowed in network interface labels, but network interface names are verified
as before (i.e. amongst other things, no colons are allowed there).
If chase_symlinks() encounters an absolute symlink, it resets the todo
buffer to just the newly discovered symlink and discards any of the
remaining previous symlink path. Regardless of whether or not the
symlink is absolute or relative, we need to preserve the remainder of
the path that has not yet been resolved.
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.
So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.
This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only
seccomp_rule_add_exact(), and fail the installation of a filter if the
architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter,
but instead install a separate filter for each syscall architecture
supported. This way, we can install a strict filter for x86-64, while
permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to
seccomp-util.c, so that we can test them independently of the service
execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in
independent filters and installation logic, as they semantically are
very much independent of each other.
Fixes: #4575
The AF_VSOCK address family facilitates guest<->host communication on
VMware and KVM (virtio-vsock). Adding support to systemd allows guest
agents to be launched through .socket unit files. Today guest agents
are stand-alone daemons running inside guests that do not take advantage
of systemd socket activation.
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.
In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
This improves kernel command line parsing in a number of ways:
a) A kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar=xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_", which
was a source of confusion for users (well, at least for one user: myself, I
just couldn't remember that it's systemd.debug-shell, not
systemd.debug_shell). Considering both as equivalent is inspired by how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will now result in a complaint that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
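Points a) and c) can be sketched as below. The ex_ functions are hypothetical stand-ins and not the proc_cmdline_ API: the sketch works on an already split word list, returns the first match, and only recognises a few boolean spellings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Compares the first n characters of two option names, treating '-' and '_'
 * as the same character. Both strings must be at least n characters long. */
static bool ex_cmdline_key_streq_n(const char *a, const char *b, size_t n) {
        assert(a);
        assert(b);

        for (size_t i = 0; i < n; i++) {
                char ca = a[i] == '-' ? '_' : a[i];
                char cb = b[i] == '-' ? '_' : b[i];

                if (ca != cb)
                        return false;
        }
        return true;
}

/* Looks for a boolean option in a NULL-terminated list of command line words.
 * Returns 1 or 0 if found (a bare "key" counts as "key=1"), or -1 if the
 * option is absent or its value is not recognised. */
static int ex_cmdline_get_bool(const char *const *words, const char *key) {
        size_t key_len;

        assert(key);
        key_len = strlen(key);

        for (; words && *words; words++) {
                const char *w = *words, *value;
                size_t name_len = strcspn(w, "=");

                if (name_len != key_len || !ex_cmdline_key_streq_n(w, key, key_len))
                        continue;

                if (w[name_len] == '\0')
                        return 1; /* no argument: same as "=1" */

                value = w + name_len + 1;
                if (strcmp(value, "1") == 0 || strcmp(value, "yes") == 0)
                        return 1;
                if (strcmp(value, "0") == 0 || strcmp(value, "no") == 0)
                        return 0;
                return -1;
        }
        return -1;
}
```

With the words "quiet", "systemd.debug-shell" and "foo_bar=0", a lookup of "systemd.debug_shell" returns 1 and a lookup of "foo-bar" returns 0. Only the names are normalised; values are compared as given.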
Check if the parsed seconds value fits in an integer *after*
multiplying by USEC_PER_SEC, otherwise a large value can trigger
modulo by zero during normalization.
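The ordering of the check can be sketched as follows. The function name, the EX_USEC_PER_SEC constant and the choice of int as the target type are assumptions for illustration; the point is only that the product is range-checked, not the input.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define EX_USEC_PER_SEC 1000000UL

/* Converts a parsed seconds value to microseconds stored in an int.
 * Returns 0 on success, -ERANGE if the product would not fit. */
static int ex_seconds_to_usec_int(unsigned long seconds, int *ret) {
        assert(ret);

        /* 3000 fits in an int, but 3000 * 1000000 does not (with 32-bit int). */
        if (seconds > INT_MAX / EX_USEC_PER_SEC)
                return -ERANGE;

        *ret = (int) (seconds * EX_USEC_PER_SEC);
        return 0;
}
```

Dividing the limit instead of multiplying the input keeps the check itself free of overflow.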
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
Let's use chase_symlinks() when looking for /etc/os-release and
/usr/lib/os-release as these files might be symlinks (and actually are IRL on
some distros).
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't, fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
Fixes:
```
$ ./libtool --mode=execute valgrind --leak-check=full ./test-fs-util
...
==22871==
==22871== 27 bytes in 1 blocks are definitely lost in loss record 1 of 1
==22871== at 0x4C2FC47: realloc (vg_replace_malloc.c:785)
==22871== by 0x4E86D05: strextend (string-util.c:726)
==22871== by 0x4E8F347: chase_symlinks (fs-util.c:712)
==22871== by 0x109EBF: test_chase_symlinks (test-fs-util.c:75)
==22871== by 0x10C381: main (test-fs-util.c:305)
==22871==
```
Closes #4888
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.
The two new settings follow the concepts nspawn already possesses in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).
Fixes: #3439
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as the key cannot be overridden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overridden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overridden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
    # systemd-run -t /bin/sh
    Running as unit: run-rd917366c04f847b480d486017f7239d6.service
    Press ^] three times within 1s to disconnect TTY.
    # keyctl show
    Session Keyring
    680208392 --alswrv 0 0 keyring: _ses
    250926536 ----s-rv 0 0 \_ user: invocation_id
    # keyctl request user invocation_id
    250926536
    # keyctl read 250926536
    16 bytes of data in key:
    9c96317c ac64495a a42b9cd7 4f3ff96b
    # echo $INVOCATION_ID
    9c96317cac64495aa42b9cd74f3ff96b
    # ^D
This creates a new transient service running a shell. It then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.
Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose).
mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simple lines suffice to boot up an integrity-protected container image:
```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```
Note that mkosi writes the image file to "image.raw" next to a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
This adds two new APIs to systemd:
- loop-util.h is a simple internal API for allocating, setting up and releasing
loopback block devices.
- dissect-image.h is an internal API for taking apart disk images and figuring
out what the purpose of each partition is.
Both APIs are basically refactored versions of similar code in nspawn. This
rework should permit us to reuse this in other places than just nspawn in the
future. Specifically: to implement RootImage= in the service image, similar to
RootDirectory=, but operating on a disk image; to unify the gpt-auto-discovery
generator code with the discovery logic in nspawn; to add new API to machined
for determining the OS version of a disk image (i.e. not just running
containers). This PR does not make any such changes however, it just provides
the new reworked API.
The reworked code is also slightly more powerful than the nspawn original one.
When pointing it to an image or block device with a naked file system (i.e. no
partition table) it will simply make it the root device.
Let's accept "µs" as alternative time unit for microseconds. We already accept
"us" and "usec" for them, let's extend this and accept the proper scientific
unit specification too.
We will never output this as time unit, but it's fine to accept it, after all
we are pretty permissive with time units already.
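Accepting an extra spelling on input only can be pictured as one more row in a suffix table. The following is a minimal sketch, not the actual systemd parser: the table, its contents and the helper name time_unit_to_usec() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical suffix table; multipliers are expressed in microseconds. */
static const struct {
        const char *suffix;
        uint64_t usec;
} time_units[] = {
        { "usec", 1ULL       },
        { "us",   1ULL       },
        { "µs",   1ULL       }, /* UTF-8 micro sign, accepted on input only */
        { "msec", 1000ULL    },
        { "ms",   1000ULL    },
        { "sec",  1000000ULL },
        { "s",    1000000ULL },
};

/* Returns the multiplier for a unit suffix, or 0 if the suffix is unknown. */
static uint64_t time_unit_to_usec(const char *suffix) {
        assert(suffix);

        for (size_t i = 0; i < sizeof(time_units) / sizeof(time_units[0]); i++)
                if (strcmp(suffix, time_units[i].suffix) == 0)
                        return time_units[i].usec;

        return 0;
}
```

A formatter built on the same table would keep emitting "us", so output stays unchanged while input becomes more permissive.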
This new flag controls whether it is considered a problem if the referenced path
doesn't actually exist. If specified, it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
"./" and a couple of others either, i.e. only what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error on any attempt to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
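The "/.. is equivalent to /" rule can be shown with a purely lexical resolver. This is a sketch under simplifying assumptions: it does not follow symlinks at all, and resolve_dots() is a hypothetical helper, not chase_symlinks() itself.

```c
#include <assert.h>
#include <string.h>

/* Lexically resolves "." and ".." in an absolute path into 'out'. A ".." at
 * the root is a no-op rather than an error. Returns 0 on success, -1 if the
 * input is not absolute or 'out' is too small. Symlinks are not considered. */
static int resolve_dots(const char *path, char *out, size_t size) {
        size_t len = 0;

        assert(path);
        assert(out);

        if (path[0] != '/' || size < 2)
                return -1;

        while (*path) {
                size_t n;

                path += strspn(path, "/");  /* skip any run of slashes */
                n = strcspn(path, "/");     /* length of the next component */
                if (n == 0)
                        break;

                if (n == 1 && path[0] == '.') {
                        /* "." keeps us where we are */
                } else if (n == 2 && path[0] == '.' && path[1] == '.') {
                        /* "..": drop the last component; at the root nothing happens */
                        while (len > 0 && out[--len] != '/')
                                ;
                } else {
                        if (len + 1 + n + 1 > size)
                                return -1;
                        out[len++] = '/';
                        memcpy(out + len, path, n);
                        len += n;
                }

                path += n;
        }

        if (len == 0)
                out[len++] = '/';
        out[len] = '\0';
        return 0;
}
```

With this behaviour "/../etc" resolves to "/etc" instead of failing, which mirrors what the kernel does for the real root directory.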
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, but it increases the exposure of our function so that it gets better
tested. Most importantly, in a few cases (most notably nspawn) it can take the
correct root directory into account when chasing symlinks.
This adds an API for retrieving an app-specific machine ID to sd-id128.
Internally it calculates HMAC-SHA256 with a 128bit app-specific ID as payload
and the machine ID as key.
(An alternative would have been to use siphash for this, which is also
cryptographically strong. However, as it only generates 64bit hashes it's not
an obvious choice for generating 128bit IDs.)
Fixes: #4667
Let's take inspiration from bluez's ELL library, and let's move our
cryptographic primitives away from libgcrypt and towards the kernel's AF_ALG
cryptographic userspace API.
In the long run we should try to remove the dependency on libgcrypt, in favour
of using only the kernel's own primitives, however this is unlikely to happen
anytime soon, as the kernel does not provide Elliptic Curve APIs to userspace
at this time, and we need them for the DNSSEC cryptography.
This commit only covers hashing for now, symmetric encryption/decryption or
even asymmetric encryption/decryption is not available for now.
"khash" is little more than a lightweight wrapper around the kernel's AF_ALG
socket API.
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".
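The extra check in front of strtoul() can be sketched as follows. parse_number_strict() is a hypothetical helper name; only strtoul() and isdigit() are real library calls here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parses an unsigned decimal number at *p and advances *p past it. Unlike
 * plain strtoul(), leading whitespace and a sign are rejected. Returns 0 on
 * success or a negative errno-style value on failure. */
static int parse_number_strict(const char **p, unsigned long *ret) {
        char *end = NULL;
        unsigned long v;

        assert(p);
        assert(*p);
        assert(ret);

        /* strtoul() would happily skip " " and accept "+" or "-" here */
        if (!isdigit((unsigned char) **p))
                return -EINVAL;

        errno = 0;
        v = strtoul(*p, &end, 10);
        if (errno != 0)
                return -errno;
        if (end == *p)
                return -EINVAL;

        *p = end;
        *ret = v;
        return 0;
}
```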
"*-*~1" => The last day of every month
"*-02~3..5" => The third, fourth, and fifth last days in February
"Mon 05~07/1" => The last Monday in May
Resolves #3861
core: add new RestrictNamespaces= unit file setting
Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().
RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namespace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.
This setting should improve security quite a bit, as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
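How the three forms of the value map to a set of permitted namespace types might be sketched like this. The NS_* bit values, the table and parse_restrict_namespaces() are illustrative assumptions; real code would work with the kernel's CLONE_NEW* constants and feed the result into a seccomp filter.

```c
#include <assert.h>
#include <string.h>

/* Illustrative bit values, not the kernel's CLONE_NEW* flags. */
enum {
        NS_CGROUP = 1 << 0,
        NS_IPC    = 1 << 1,
        NS_NET    = 1 << 2,
        NS_MNT    = 1 << 3,
        NS_PID    = 1 << 4,
        NS_USER   = 1 << 5,
        NS_UTS    = 1 << 6,
        NS_ALL    = (1 << 7) - 1,
};

static const struct {
        const char *name;
        int flag;
} ns_table[] = {
        { "cgroup", NS_CGROUP }, { "ipc", NS_IPC }, { "net",  NS_NET  },
        { "mnt",    NS_MNT    }, { "pid", NS_PID }, { "user", NS_USER },
        { "uts",    NS_UTS    },
};

/* Computes the set of namespace types that stay permitted: "no" permits all,
 * "yes" permits none, anything else is a space-separated list of permitted
 * types. Returns 0 on success, -1 on an unknown type name. */
static int parse_restrict_namespaces(const char *value, int *ret_allowed) {
        int allowed = 0;

        assert(value);
        assert(ret_allowed);

        if (strcmp(value, "no") == 0) {
                *ret_allowed = NS_ALL;
                return 0;
        }
        if (strcmp(value, "yes") == 0) {
                *ret_allowed = 0;
                return 0;
        }

        for (const char *p = value;;) {
                size_t n, i, count = sizeof(ns_table) / sizeof(ns_table[0]);

                p += strspn(p, " ");
                n = strcspn(p, " ");
                if (n == 0)
                        break;

                for (i = 0; i < count; i++)
                        if (strlen(ns_table[i].name) == n &&
                            strncmp(p, ns_table[i].name, n) == 0)
                                break;
                if (i == count)
                        return -1;

                allowed |= ns_table[i].flag;
                p += n;
        }

        *ret_allowed = allowed;
        return 0;
}
```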
In case of running test-execute on systems with systemd < v232, several
tests like privatedevices or protectkernelmodules fail because
/run/systemd/inaccessible/ doesn't exist. In these cases, we should skip
tests to avoid unnecessary errors.
See also https://github.com/systemd/systemd/pull/4243#issuecomment-253665566
This stripping is controlled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
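The per-parameter decision described above can be condensed into a small helper. This is a sketch only: cmdline_key() and its parameter names are hypothetical, and the caller is assumed to already know whether it runs in the initramfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the key the caller should act on, or NULL if the parameter is to be
 * ignored. With strip_rd_prefix=false the key is passed through unchanged,
 * which corresponds to the old behaviour. */
static const char *cmdline_key(const char *key, bool in_initrd, bool strip_rd_prefix) {
        assert(key);

        if (strip_rd_prefix && strncmp(key, "rd.", 3) == 0)
                /* Prefixed parameters only apply while in the initramfs */
                return in_initrd ? key + 3 : NULL;

        return key;
}
```

Callers with custom logic, such as pid1 or fstab-generator in the list above, would keep passing false and continue to see the raw key.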
Fixes:
```
==10750==
==10750== HEAP SUMMARY:
==10750== in use at exit: 96 bytes in 3 blocks
==10750== total heap usage: 1,711 allocs, 1,708 frees, 854,545 bytes
allocated
==10750==
==10750== 96 (64 direct, 32 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==10750== at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==10750== by 0x4EB3BDA: calendar_spec_from_string
(calendarspec.c:771)
==10750== by 0x109675: test_hourly_bug_4031 (test-calendarspec.c:118)
==10750== by 0x10A00E: main (test-calendarspec.c:202)
==10750==
==10750== LEAK SUMMARY:
==10750== definitely lost: 64 bytes in 1 blocks
==10750== indirectly lost: 32 bytes in 2 blocks
==10750== possibly lost: 0 bytes in 0 blocks
==10750== still reachable: 0 bytes in 0 blocks
==10750== suppressed: 0 bytes in 0 blocks
==10750==
==10750== For counts of detected and suppressed errors, rerun with: -v
==10750== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
When a unit file is invalid, we'd return an error without any details:
$ systemctl --root=/ enable testing@instance.service
Failed to enable: Invalid argument.
Fix things to at least print the offending file name:
$ systemctl enable testing@instance.service
Failed to enable unit: File testing@instance.service: Invalid argument
$ systemctl --root=/ enable testing@instance.service
Failed to enable unit, file testing@instance.service: Invalid argument.
A real fix would be to pass back a proper error message from conf-parser.
But this would require major surgery, since conf-parser functions now
simply print log errors, but we would need to return them over the bus.
So let's just print the file name, to indicate where the error is.
(Incomplete) fix for #4210.
Let's go further and make /lib/modules/ inaccessible for services that do
not have business with modules. This is a minor improvement, but it may
help on setups with custom modules and they are limited... in regard to the
kernel auto-load feature.
This change introduces a NameSpaceInfo struct which we may embed later
inside ExecContext, but for now let's just reduce the number of arguments to
setup_namespace() and merge the ProtectKernelModules feature.
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement
of an if statement don't fall into a trap, and find the tail for the list via
strv_length().
If the new item is inserted before the first item in the list, then the
head must be updated as well.
Add a test to the list unit test to check for this.
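The head update is easy to miss when the list is manipulated through a separate head pointer. A minimal sketch with a hypothetical Item type and list_insert_before() function (not the actual LIST_* macros) looks like this:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Item {
        int value;
        struct Item *prev, *next;
} Item;

/* Inserts 'item' before 'pos' in the doubly linked list starting at *head.
 * If 'pos' is NULL the item is appended at the end. */
static void list_insert_before(Item **head, Item *pos, Item *item) {
        assert(head);
        assert(item);

        if (!pos) {
                Item *tail = *head;

                while (tail && tail->next)
                        tail = tail->next;

                item->prev = tail;
                item->next = NULL;
                if (tail)
                        tail->next = item;
                else
                        *head = item;
                return;
        }

        item->next = pos;
        item->prev = pos->prev;
        if (pos->prev)
                pos->prev->next = item;
        else
                *head = item; /* inserted before the first item: head must move */
        pos->prev = item;
}
```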
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.
(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)
Fixes: #3867
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). If turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, yet we need to know the UID/GID used in
order to clean up IPC objects owned by it when the unit shuts down.
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
No functional changes.
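A range-checked parser of this kind might look roughly as follows. This is a sketch, not the code from the commit: the NICE_MIN/NICE_MAX constants assume the usual Linux range of -20 to 19, and the function bodies are written from scratch for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed valid range for nice levels on Linux. */
#define NICE_MIN (-20)
#define NICE_MAX 19

static bool nice_is_valid(long n) {
        return n >= NICE_MIN && n <= NICE_MAX;
}

/* Parses a nice level and makes sure it is in range. Returns 0 on success,
 * -EINVAL on syntax errors and -ERANGE if the value is out of range. */
static int parse_nice(const char *s, int *ret) {
        char *end = NULL;
        long v;

        assert(s);
        assert(ret);

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno != 0)
                return -errno;
        if (end == s || *end != '\0')
                return -EINVAL;
        if (!nice_is_valid(v))
                return -ERANGE;

        *ret = (int) v;
        return 0;
}
```

Centralising the check in one helper is what lets the various callers be ported over without any functional change.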
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limit
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
This patch improves parsing and generation of timestamps and calendar
specifications in two ways:
- The week day is now always printed in the abbreviated English form, instead
of the locale's setting. This makes sure we can always parse the week day
again, even if the locale is changed. Given that we don't follow locale
settings for printing timestamps in any other way either (for example, we
always use 24h syntax in order to make uniform parsing possible), it only
makes sense to also stick to a generic, non-localized form for the timestamp,
too.
- When parsing a timestamp, the local timezone (in its DST or non-DST name)
may be specified, in addition to "UTC". Other timezones are still not
supported however (not because we wouldn't want to, but mostly because libc
offers no nice API for that). In itself this brings no new features, however
it ensures that any locally formatted timestamp's timezone is also parsable
again.
These two changes ensure that the output of format_timestamp() may always be
passed to parse_timestamp() and results in the original input. The related
flavours for usec/UTC also work accordingly. Calendar specifications are
extended in a similar way.
The man page is updated accordingly, in particular this removes the claim that
timestamps systemd prints wouldn't be parsable by systemd. They are now.
The man page previously showed invalid timestamps as examples. This has been
removed, as the man page shouldn't be a unit test, where such negative examples
would be useful. The man page also no longer mentions the names of internal
functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
Depending on how binutils was configured and the --enable-fast-install
configure option, the test binary might be called either name.
Fixes: https://github.com/systemd/systemd/issues/3838
The condition tests for hostname will fail if hostname looks like an id128.
The test function attempts to convert hostname to an id128, and if that
succeeds compare it to the machine ID (presumably because the 'hostname'
condition test is overloaded to also test machine ID). That will typically
fail, and unfortunately the 'mock' utility generates a random hostname that
happens to have the same format as an id128, thus causing a test failure.
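The ambiguity is easy to reproduce with a check for the plain 32-hex-digit form. looks_like_id128() below is a hypothetical helper for illustration and deliberately ignores any other textual ID formats a real parser might accept.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if 's' consists of exactly 32 hexadecimal digits, i.e. has the
 * same shape as a plainly formatted 128bit ID. */
static bool looks_like_id128(const char *s) {
        assert(s);

        if (strlen(s) != 32)
                return false;

        for (const char *p = s; *p; p++)
                if (!isxdigit((unsigned char) *p))
                        return false;

        return true;
}
```

A randomly generated hostname made of 32 hex digits passes this check, so a condition that tries the ID interpretation first will compare it against the machine ID and typically not match.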
Accept both files with and without trailing newlines. Apparently some rkt
releases generated them incorrectly, missing the trailing newlines, and we
shouldn't break that.
User expectations are broken when "systemctl enable /some/path/service.service"
behaves differently to "systemctl link ..." followed by "systemctl enable".
From user's POV, "enable" with the full path just combines the two steps into
one.
Fixes #3010.
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
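Resolving a relative value against the system maximum could be sketched like this. Both helpers are hypothetical; in particular, the 0..100 limit and the rounding down are assumptions made for the example, not statements about the actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses "N%" with N in 0..100. Returns the percentage, or a negative
 * errno-style value on failure. */
static int parse_percent(const char *s) {
        char *end = NULL;
        long v;

        assert(s);

        if (*s < '0' || *s > '9')
                return -EINVAL;

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno != 0)
                return -ERANGE;
        if (strcmp(end, "%") != 0)
                return -EINVAL;
        if (v > 100)
                return -ERANGE;

        return (int) v;
}

/* Scales the system-wide maximum by a percentage, rounding down. The split
 * into quotient and remainder avoids overflow for large maxima. */
static uint64_t tasks_max_from_percent(int percent, uint64_t system_max) {
        assert(percent >= 0 && percent <= 100);

        return system_max / 100 * (uint64_t) percent +
               system_max % 100 * (uint64_t) percent / 100;
}
```

Because the absolute limit is derived from whatever maximum the container sees, the same "40%" setting adapts automatically when the container's own limit changes.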
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.
The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().
In follow-up patches we can use this to reduce the code in nspawn and
machine-id-setup by adopting the common implementation.
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
strv_make_nulstr was creating a nulstr which was not a valid nulstr,
because it was missing the terminating NUL. This didn't cause any issues,
because strv_parse_nulstr correctly parsed the result, using the
separately specified length.
But it's confusing to have something called nulstr which really isn't.
It is likely that somebody will try to use strv_make_nulstr() in
some other place, incorrectly.
This patch changes strv_make_nulstr() to produce a valid nulstr, and
changes the output length parameter to be the minimum number of bytes
which can be later on parsed by strv_parse_nulstr(). This allows the
only user in ask-password-api to be slightly simplified.
Based-on-patch-by: Jean-Sébastien Bour <jean-sebastien@bour.name>
Fixes #3689.
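What a valid nulstr looks like in memory can be shown with a small builder. This is a sketch rather than the patched function: make_nulstr() is a hypothetical name, and reporting the size without the extra final NUL is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds a nulstr from a NULL-terminated string array: every string is
 * followed by a NUL, and the whole buffer ends with one extra NUL. *ret_size
 * receives the number of bytes excluding that extra final NUL. Returns 0 on
 * success, -1 if memory allocation fails. */
static int make_nulstr(const char *const *l, char **ret, size_t *ret_size) {
        size_t total = 0;
        char *buf, *p;

        assert(ret);
        assert(ret_size);

        for (const char *const *i = l; i && *i; i++)
                total += strlen(*i) + 1;

        buf = malloc(total + 1);
        if (!buf)
                return -1;

        p = buf;
        for (const char *const *i = l; i && *i; i++) {
                size_t n = strlen(*i) + 1;

                memcpy(p, *i, n); /* copies the string including its NUL */
                p += n;
        }
        *p = '\0'; /* the terminating NUL that makes this a valid nulstr */

        *ret = buf;
        *ret_size = total;
        return 0;
}
```

With the terminator in place the buffer can be walked by code that only knows the nulstr convention, without needing the separately passed length.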
==1447== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1447== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==1447== by 0x5350F19: strdup (in /usr/lib64/libc-2.23.so)
==1447== by 0x4E9D435: strv_new_ap (strv.c:166)
==1447== by 0x4E9D5FA: strv_new (strv.c:199)
==1447== by 0x10E665: test_strv_fnmatch (test-strv.c:693)
==1447== by 0x10EAD5: main (test-strv.c:763)
==1447==
* networkd: condition_test() can return a negative error, handle that
If a condition check fails with an error we should not consider the check
successful. Fix that.
We should probably also improve logging in this case, but for now, let's just
unbreak this breakage.
Fixes: #3236
* condition: handle unrecognized architectures nicer
When we encounter a check for an architecture we don't know we should not
let the condition check fail with an error code, but instead simply return
false. After all the architecture might just be newer than the ones we know, in
which case it's certainly not our local one.
Fixes: #3236
Link as many binaries as possible with it, to save storage space.
Preserve the static libshared and libbasic for use in libraries, nss
modules and udev.
Libraries need to be static in order to avoid polluting the symbol
namespace.
Udev needs to be static so downstream can avoid strict version dependencies
with the systemd package, as these can complicate upgrade scenarios.