IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Implement directives `NoExecPaths=` and `ExecPaths=` to control `MS_NOEXEC`
mount flag for the file system tree. This can be used to implement file system
W^X policies, and for example with allow-listing mode (NoExecPaths=/) a
compromised service would not be able to execute a shell, if that was not
explicitly allowed.
Example:
[Service]
NoExecPaths=/
ExecPaths=/usr/bin/daemon /usr/lib64 /usr/lib
Closes: #17942.
The cmp in ExecStartPost= was actually failing – ExecStartPost= has the
same StandardOutput as the rest of the service, so the output file is
truncated before cmp can compare it with the expected output – but the
test still passed because test_exec_standardoutput_truncate() calls
test(), which only checks the main result, rather than test_service(),
which checks the result of the whole service. Fix the test by merging
the ExecStartPost= into the ExecStart= – the cmp has to be part of the
same command line as the cat so that the file is not truncated between
the two processes.
This adds the ability to specify truncate:PATH for StandardOutput= and
StandardError=, similar to the existing append:PATH. The code is mostly
copied from the related append: code. Fixes#8983.
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
All backslashes that should be single in shell syntax need to be written as "\\" because
our parser will remove one level of quoting. Also, single quotes were doubly nested, which
cannot work.
Should fix the following message:
test-execute/exec-dynamicuser-statedir.service:16: Ignoring unknown escape sequences: "test $$(find / \( -path /var/tmp -o -path /tmp -o -path /proc -o -path /dev/mqueue -o -path /dev/shm -o -path /sys/fs/bpf -o -path /dev/.lxc \) -prune -o -type d -writable -print 2>/dev/null | sort -u | tr -d \\n) = /var/lib/private/quux/pief/var/lib/private/waldo"
https://tools.ietf.org/html/draft-knodel-terminology-02https://lwn.net/Articles/823224/
This gets rid of most but not occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
The man pages state that the '+' prefix in Exec* directives should
ignore filesystem namespacing options such as PrivateTmp. Now it does.
This is very similar to #8842, just with PrivateTmp instead of
PrivateDevices.
Since libcap v2.29 the format of cap_to_text() has been changed which
makes certain `test-execute` subtest fail. Let's remove the offending
part of the output (dropped capabilities) to make it compatible with
both the old and the new libcap.
In some containers unshare() is made unavailable entirely. Let's deal
with this that more gracefully and disable our sandboxing of services
then, so that we work in a container, under the assumption the container
manager is then responsible for sandboxing if we can't do it ourselves.
Previously, we'd insist on sandboxing as soon as any form of BindPath=
is used. With this change we only insist on it if we have a setting like
that where source and destination differ, i.e. there's a mapping
established that actually rearranges things, and thus would result in
systematically different behaviour if skipped (as opposed to mappings
that just make stuff read-only/writable that otherwise arent').
(Let's also update a test that intended to test for this behaviour with
a more specific configuration that still triggers the behaviour with
this change in place)
Fixes: #13955
(For testing purposes unshare() can easily be blocked with
systemd-nspawn --system-call-filter=~unshare.)
Before only one comparison was allowed. Let's make this more flexible:
ConditionKernelVersion = ">=4.0" "<=4.5"
Fixes#12881.
This also fixes expressions like "ConditionKernelVersion=>" which would
evaluate as true.
These services are likely to coredump, and we expect that but aren't
interested in the coredump. Hence let's turn off processing by setting
RLIMIT_CORE to 0/0.
Those interfaces are created automatically when ip6_tunnel and ip6_gre loaded.
They break the test with exec-privatenetwork-yes.service.
C.f. 6b08180ca6.
... when no mount options are passed.
Change the code, to avoid the following failure in the newly added tests:
exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)
(And I spotted an opportunity to use TAKE_PTR() at the end).
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17. We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).
This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.
Fixes#9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").
> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.
This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`. I've added a test for the
existing behaviour, which I think is now the correct behaviour.
This corresponds nicely with the specifiers we already pass for
/var/lib, /var/cache, /run and so on.
This is particular useful to update the test-path service files to
operate without guessable files, thus allowing multiple parallel
test-path invocations to pass without issues (the idea is to set $TMPDIR
early on in the test to some private directory, and then only use the
new %T or %V specifier to refer to it).
We go through the whole file system, so this test can take arbitrary time. But
this test is still quite useful, so let's at least try to make it more efficent
by not descending at all into the directories we would filter out later on
anyway.
Also increase the timeout, in case the previous step doesn't help enough.