IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This also moves the check for writable paths from test-execute to TEST-34.
Closes#10337.
(cherry picked from commit f01f70a9a3)
(cherry picked from commit 40053e60f5)
test-execute checks that only /var/lib/private/waldo is writable, but there are
some filesystems that are always writable and excluded. Add /sys/devices/system/cpu
which is created by lxcfs.
Fixes https://github.com/systemd/systemd/issues/23263
(cherry picked from commit 646cba5c42)
Linux 5.15 broke kernel API:
e70344c059
Previously setting IOPRIO_CLASS_NONE for a process would then report
IOPRIO_CLASS_NONE back. But since 5.15 it reports IOPRIO_CLASS_BE
instead. Since IOPRIO_CLASS_NONE is an alias for a special setting of
IOPRIO_CLASS_BE this makes some sense, but it's also a kernel API
breakage that our testsuite trips up on.
(I made some minimal effort to inform the kernel people about this API
breakage during the 5.15 rc phase, but noone was interested.)
Either way let's hadle this gracefully in our test suite and accept
"best-effort" too when "none" was set.
(This is only triggable if the tests are run on 5.15 with full privs)
Those are all consumed by our parser, so they all support comments.
I was considering whether they should have a license header at all,
but in the end I decided to add it because those files are often created
by copying parts of real unit files. And if the real ones have a license,
then those might as well. It's easier to add it than to make an exception.
This adds a high level test verifying that syscall filtering in
combination with a simple architecture filter for the "native"
architecture works fine.
Currently there does not exist a way to specify a path relative to which
all binaries executed by Exec should be found. The only way is to
specify the absolute path.
This change implements the functionality to specify a path relative to which
binaries executed by Exec*= can be found.
Closes#6308
Implement directives `NoExecPaths=` and `ExecPaths=` to control `MS_NOEXEC`
mount flag for the file system tree. This can be used to implement file system
W^X policies, and for example with allow-listing mode (NoExecPaths=/) a
compromised service would not be able to execute a shell, if that was not
explicitly allowed.
Example:
[Service]
NoExecPaths=/
ExecPaths=/usr/bin/daemon /usr/lib64 /usr/lib
Closes: #17942.
The cmp in ExecStartPost= was actually failing – ExecStartPost= has the
same StandardOutput as the rest of the service, so the output file is
truncated before cmp can compare it with the expected output – but the
test still passed because test_exec_standardoutput_truncate() calls
test(), which only checks the main result, rather than test_service(),
which checks the result of the whole service. Fix the test by merging
the ExecStartPost= into the ExecStart= – the cmp has to be part of the
same command line as the cat so that the file is not truncated between
the two processes.
This adds the ability to specify truncate:PATH for StandardOutput= and
StandardError=, similar to the existing append:PATH. The code is mostly
copied from the related append: code. Fixes#8983.
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
All backslashes that should be single in shell syntax need to be written as "\\" because
our parser will remove one level of quoting. Also, single quotes were doubly nested, which
cannot work.
Should fix the following message:
test-execute/exec-dynamicuser-statedir.service:16: Ignoring unknown escape sequences: "test $$(find / \( -path /var/tmp -o -path /tmp -o -path /proc -o -path /dev/mqueue -o -path /dev/shm -o -path /sys/fs/bpf -o -path /dev/.lxc \) -prune -o -type d -writable -print 2>/dev/null | sort -u | tr -d \\n) = /var/lib/private/quux/pief/var/lib/private/waldo"
https://tools.ietf.org/html/draft-knodel-terminology-02https://lwn.net/Articles/823224/
This gets rid of most but not occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
The man pages state that the '+' prefix in Exec* directives should
ignore filesystem namespacing options such as PrivateTmp. Now it does.
This is very similar to #8842, just with PrivateTmp instead of
PrivateDevices.
Since libcap v2.29 the format of cap_to_text() has been changed which
makes certain `test-execute` subtest fail. Let's remove the offending
part of the output (dropped capabilities) to make it compatible with
both the old and the new libcap.
In some containers unshare() is made unavailable entirely. Let's deal
with this that more gracefully and disable our sandboxing of services
then, so that we work in a container, under the assumption the container
manager is then responsible for sandboxing if we can't do it ourselves.
Previously, we'd insist on sandboxing as soon as any form of BindPath=
is used. With this change we only insist on it if we have a setting like
that where source and destination differ, i.e. there's a mapping
established that actually rearranges things, and thus would result in
systematically different behaviour if skipped (as opposed to mappings
that just make stuff read-only/writable that otherwise arent').
(Let's also update a test that intended to test for this behaviour with
a more specific configuration that still triggers the behaviour with
this change in place)
Fixes: #13955
(For testing purposes unshare() can easily be blocked with
systemd-nspawn --system-call-filter=~unshare.)
Before only one comparison was allowed. Let's make this more flexible:
ConditionKernelVersion = ">=4.0" "<=4.5"
Fixes#12881.
This also fixes expressions like "ConditionKernelVersion=>" which would
evaluate as true.
These services are likely to coredump, and we expect that but aren't
interested in the coredump. Hence let's turn off processing by setting
RLIMIT_CORE to 0/0.
Those interfaces are created automatically when ip6_tunnel and ip6_gre loaded.
They break the test with exec-privatenetwork-yes.service.
C.f. 6b08180ca6.
... when no mount options are passed.
Change the code, to avoid the following failure in the newly added tests:
exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)
(And I spotted an opportunity to use TAKE_PTR() at the end).
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17. We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).
This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.
Fixes#9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").
> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.
This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`. I've added a test for the
existing behaviour, which I think is now the correct behaviour.