IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This patch add supports to configure IFLA_VXLAN_LOCAL
and IFLA_VXLAN_GROUP.
The "Group" is renamed to "Remote" which is a multicast address.`
```
Description=vxlan-test
Name=vxlan1
Kind=vxlan
[VXLAN]
Id=33
Local=2001:db8:2f4:4bff:fa71:1a56
Remote=FF02:0:0:0:0:0:1:9
```
output
```
ip -d link show vxlan1
16: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether fe:b4:97:03:f8:e5 brd ff:ff:ff:ff:ff:ff promiscuity 0
vxlan id 33 group ff02::1:9 local 2001:db8:02f4:4bff:fa71:1a56 dev enp0s3 srcport 0 0 dstport 8472 ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```
This updates the man page for the changes introduced in 1d84ad9445.
"=" is kep if the option is predominantly used with an argument, and dropped
otherwise.
v2:
- update also description of log_color
- drop '=' in all cases where it is optional
(previous rule of dropping it only in some cases was just too arbitrary.)
Sometimes it's useful to provide a default value during an environment
expansion, if the environment variable isn't already set.
For instance $XDG_DATA_DIRS is suppose to default to:
/usr/local/share/:/usr/share/
if it's not yet set. That means callers wishing to augment
XDG_DATA_DIRS need to manually add those two values.
This commit changes replace_env to support the following shell
compatible default value syntax:
XDG_DATA_DIRS=/foo:${XDG_DATA_DIRS:-/usr/local/share/:/usr/share}
Likewise, it's useful to provide an alternate value during an
environment expansion, if the environment variable isn't already set.
For instance, $LD_LIBRARY_PATH will inadvertently search the current
working directory if it starts or ends with a colon, so the following
is usually wrong:
LD_LIBRARY_PATH=/foo/lib:${LD_LIBRARY_PATH}
To address that, this changes replace_env to support the following
shell compatible alternate value syntax:
LD_LIBRARY_PATH=/foo/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
[zj: gate the new syntax under REPLACE_ENV_ALLOW_EXTENDED switch, so
existing callers are not modified.]
In the future we might want to allow additional syntax (for example
"unset VAR". But let's check that the data we're getting does not contain
anything unexpected.
(Only in environment.d files.)
We have only basic compatibility with shell syntax, but specifying variables
without using braces is probably more common, and I think a lot of people would
be surprised if this didn't work.
Add support for /etc/environment and document the changes to the user manager
to automatically import environment *.conf files from:
~/.config/environment.d/
/etc/environment.d/
/run/environment.d/
/usr/local/lib/environment.d/
/usr/lib/environment.d/
/etc/environment
Why the strange name: the prefix is necessary to follow our own advice that
environment generators should have numerical prefixes. I also put -d- in the
name because otherwise the name was very easy to mistake with
systemd.environment-generator. This additional letter clarifies that this
on special generator that supports environment.d files.
v2:
- add example files to EXTRA_DIST
v3:
- rework for the new scheme where nothing is written to disk
v4:
- use separate dirs for system and user env generators
As the kernel won't map the UIDs this is simply not safe, and hence we
should generate a clean error and refuse it.
We can restore this feature later should a "shiftfs" become available in
the kernel.
This changes the file copy logic of machined to set the UID/GID of all
copied files to 0 if the host and container do not share the same user
namespace.
Fixes: #4078
This breaks again, this time for setups where Qemu is not reported via DMI for whatever
reason. So swap order of cpuid and dmi again, but properly detect oracle.
See issue #5318.
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
In commit 050e65a we swapped order of detect_vm_{cpuid,dmi}(). That
fixed Virtualbox but broke qemu with kvm, which is expected to return
'kvm'. So check for qemu/kvm first, then DMI, CPUID last.
This fixes#5318.
Signed-off-by: Christian Hesse <mail@eworm.de>
Currently fstab entries with 'nofail' option are mounted
asynchronously and there is no way how to specify dependencies
between such fstab entry and another units. It means that
users are forced to write additional dependency units manually.
The patch introduces new systemd fstab options:
x-systemd.before=<PATH>
x-systemd.after=<PATH>
- to specify another mount dependency (PATH is translated to unit name)
x-systemd.before=<UNIT>
x-systemd.after=<UNIT>
- to specify arbitrary UNIT dependency
For example mount where A should be mounted before local-fs.target unit:
/dev/sdb1 /mnt/test/A none nofail,x-systemd.before=local-fs.target
IPv6 Neighbor discovery proxy is the IPv6 equivalent to proxy ARP for IPv4.
It is required when ISPs do not unconditional route IPv6 subnets
to their designated target, but expect neighbor solicitation messages
for every address on a link.
A variable IPv6ProxyNDPAddress= is introduced to the [Network] section,
each representing a IPv6 neighbour proxy entry in the neighbour table.
Let's clarify that RestrictAddressFamilies= and MemoryDenyWriteExecute=
are only fully effective if non-native system call architectures are
disabled, since they otherwise may be used to circumvent the filters, as
the filters aren't equally effective on all ABIs.
Fixes: #5277
We should try to keep the unbreakable lines below 80 columns.
It's not always possible of course.
Also, use the dl.fp.o alias instead of a specific mirror.
There was a missing dependency and one with the wrong type. Additionally, refer
to DefaultDependencies= once instead of twice, without a vague reference in the
first one that doesn't mention that the value matters.
Fixes#5226.
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.
This also bypasses the test on archs where we don't know the right
order.
In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.
Fixes: #5241
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
On i386 we block the old mmap() call entirely, since we cannot properly
filter it. Thankfully it hasn't been used by glibc since quite some
time.
Fixes: #5240
The --help text currently uses the "--umount" spelling, hence to the
same in the man page too.
And let's settle on "umount" instead of "unmount" here, since most folks
probably expect that when typing in a command, as util-linux' tool is
called "umount" after all, and so is the symlink "systemd-umount" we
install.
The third paragraph of the Description already linked to
systemd.resource-control(5), but it was missing from the list of
additional options for the [Service] section.
This slightly extends the roothash loading logic to first check for a
user.verity.roothash extended attribute on the image file. If it exists,
it is used as Verity root hash and the ".roothash" file is not used.
This should improve the chance that the roothash is retained when the
file is moved around, as the data snippet is attached directly to the
image file. The field is still detached from the file payload however,
in order to make sure it may be trusted independently.
This does not replace the ".roothash" file loading, it simply adds a
second way to retrieve the data.
Extended attributes are often a poor choice for storing metadata like
this as it is usually difficult to discover for admins and users, and
hard to fix if it ever gets out of sync. However, in this case I think
it's safe as verity implies read-only access, and thus there's little
chance of it to get out of sync.
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.
This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.
(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
This changes the environment for services running as root from:
LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
INVOCATION_ID=ffbdec203c69499a9b83199333e31555
JOURNAL_STREAM=8:1614518
to
LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
USER=root
SHELL=/bin/sh
INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c
JOURNAL_STREAM=8:1616718
Making the environment special for the root user complicates things
unnecessarily. This change simplifies both our logic (by making the setting
of the variables unconditional), and should also simplify the logic in
services (particularly scripts).
Fixes#5124.
$ systemd-cgls -u systemd-journald.service machine.slice
I opted for a "global" switch, instead of modifying the behaviour of just one
argument. It seem to be a more useful setting, since usually one will want to
query one or more units, and not mix unit names with paths.
Closes#5156.
'systemctl --failed' is an extremely common operation and it's nice to have
a shortcut for it.
Revert "man: don't document systemctl --failed" and add the option back to
systemctl's help and shell completion scripts.
This reverts commit 036359ba8d.
- Show example of all `systemctl status` output and documents what possible
"Loaded:", "Active" and "Enabled" values mean.
- Documents what different colors of the dot mean.
- Documents "gotcha" with load-on-demand behavior which will report units as
"loaded" even if they are only loaded to show their status.
(From @poettering: https://github.com/systemd/systemd/issues/5063#issuecomment-272115024 )
When a user is trying to understand what is going on with a restart action, it is useful to explicitly describe how the action is run. It may seem obvious, but it is helpful to be explicit so one knows there isn't a special ExecRestart= or similar option that they could be looking at.
Accept AF_VSOCK listen addresses in socket unit files. Both guest and
host can now take advantage of socket activation.
The QEMU guest agent has recently been modified to support socket
activation and can run over AF_VSOCK with this patch.
This adds a brief explanation, suggesting the use of "systemd-run -M" to
acquire exit status/code information for the invoked process.
My original plan was to propagate the exit code/status in "machinectl
shell" too, but this would mean we'd have to actively watch the shell's
runtime status, and thus would need full, highly privileged and
continious access to the container's system manager, the way
"systemd-run" does it. This would be quite a departure from the
simplistic, low-priviliged OpenShell() bus call implementation of the
current code, that really just acquires a PTY device with a shell
connected.
Moreover it would blur the lines between the two commands even further,
which I think is not desirable. Hence, from now on:
"machinectl shell" is the full-session, interactive shell for human
users
"systemd-run -M …" is the low-level tool, that supports
on-interactive mode, and is more configurable and suitable for
streaming.
Fixes: #4215
The manpage claimed that ExecStop would be followed immediately by
SIGKILL, whereas the actual behavior is to go through KillMode= and
KillSignal= first.
Fixes#4490
Fix wrong condition test in manager_etc_hosts_lookup(), which caused it to
return an IPv4 answer when an IPv6 question was asked, and vice versa.
Also only return success if we actually found any A or AAAA record.
In systemd-resolved.service(8), point out that /etc/hosts mappings only
affect address-type lookups, not other types.
The test case currently disables DNSSEC in resolved, as there is a bug
where "-t MX" fails due to "DNSSEC validation failed" even after
"downgrading to non-DNSSEC mode". This should be dropped once that bug
gets fixed.
Fixes#4801
active_slave:
Specifies the new active slave for modes that support it
(active-backup, balance-alb and balance-tlb).
primary slave:
systemd-networks currently lacks the capability to set the primary slave
in an
active-backup bonding. This is necessary if you prefer one interface
over the
other. A common example is a eth0-wlan0 bonding on a laptop where you'd
want to
switch to the wired connection whenever it's available.
Fixes: #2837
This adds a generator and a small service that will look for "roothash="
on the kernel command line and use it for setting up a very partition
for the root device.
This provides similar functionality to nspawn's existing --roothash=
switch.
This adds support for a new kernel command line option "systemd.volatile=" that
provides the same functionality that systemd-nspawn's --volatile= switch
provides, but for host systems (i.e. systems booting with a kernel).
It takes the same parameter and has the same effect.
In order to implement systemd.volatile=yes a new service
systemd-volatile-root.service is introduced that only runs in the initrd and
rearranges the root directory as needed to become a tmpfs instance. Note that
systemd.volatile=state is implemented different: it simply generates a
var.mount unit file that is part of the normal boot and has no effect on the
initrd execution.
The way this is implemented ensures that other explicit configuration for /var
can always override the effect of these options. Specifically, the var.mount
unit is generated in the "late" generator directory, so that it only is in
effect if nothing else overrides it.
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
Since commit 9d06297, mount units from mountinfo are not bound to their devices
anymore (they use the "Requires" dependency instead).
This has the following drawback: if a media is mounted and the eject button is
pressed then the media is unconditionally ejected leaving some inconsistent
states.
Since udev is the component that is reacting (no matter if the device is used
or not) to the eject button, users expect that udev at least try to unmount the
media properly.
This patch introduces a new property "SYSTEMD_MOUNT_DEVICE_BOUND". When set on
a block device, all units that requires this device will see their "Requires"
dependency upgraded to a "BindTo" one. This is currently only used by cdrom
devices.
This patch also gives the possibility to the user to restore the previous
behavior that is bind a mount unit to a device. This is achieved by passing the
"x-systemd.device-bound" option to mount(8). Please note that currently this is
not working because libmount treats the x-* options has comments therefore
they're not available in utab for later application retrievals.
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.
The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).
Fixes: #3439
As requested in
https://github.com/systemd/systemd/pull/4864#pullrequestreview-12372557.
docbook will substitute triple dots for the ellipsis in man output, so this has
no effect on the troff output, only on HTML, making it infinitesimally nicer.
In some places we show output from programs, which use dots, and those places
should not be changed. In some tables, the alignment would change if dots were
changed to the ellipsis which is only one character. Since docbook replaces the
ellipsis automatically, we should leave those be. This patch changes all other
places.
Our warning message was misleading, because we wouldn't "correct" anything,
we'd just ignore unkown escapes. Update the message.
Also, print just the extracted word (which contains the offending sequences) in
the message, instead of the whole line.
Fixes#4697.
We shouldn't just have snippets of configuration, but instead
examples which show all the parts necessary to build a certain kind
of setup, with short explanations.
%c and %r rely on settings made in the unit files themselves and hence resolve
to different values depending on whether they are used before or after Slice=.
Let's simply deprecate them and drop them from the documentation, as that's not
really possible to fix. Moreover they are actually redundant, as the same
information may always be queried from /proc/self/cgroup and /proc/1/cgroup.
(Accurately speaking, %R is actually not broken like this as it is constant.
However, let's remove all cgroup-related specifiers at once, as it is also
redundant, and doesn't really make much sense alone.)
This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.
This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
This adds an API for retrieving an app-specific machine ID to sd-id128.
Internally it calculates HMAC-SHA256 with an 128bit app-specific ID as payload
and the machine ID as key.
(An alternative would have been to use siphash for this, which is also
cryptographically strong. However, as it only generates 64bit hashes it's not
an obvious choice for generating 128bit IDs.)
Fixes: #4667
Note: the name is "system-update-cleanup.service" rather than
"system-update-done.service", because it should not run normally, and also
because there's already "systemd-update-done.service", and having them named
so similarly would be confusing.
In https://bugzilla.redhat.com/show_bug.cgi?id=1395686 the system repeatedly
entered system-update.target on boot. Because of a packaging issue, the tool
that created the /system-update symlink could be installed without the service
unit that was supposed to perform the upgrade (and remove the symlink). In
fact, if there are no units in system-update.target, and /system-update symlink
is created, systemd always "hangs" in system-update.target. This is confusing
for users, because there's no feedback what is happening, and fixing this
requires starting an emergency shell somehow, and also knowing that the symlink
must be removed. We should be more resilient in this case, and remove the
symlink automatically ourselves, if there are no upgrade service to handle it.
This adds a service which is started after system-update.target is reached and
the symlink still exists. It nukes the symlink and reboots the machine. It
should subsequently boot into the default default.target.
This is a more general fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1395686 (the packaging issue was
already fixed).
- use "service" instead of "script", because various offline updaters that we have
aren't really scripts, e.g. dnf-plugin-system-upgrade, packagekit-offline-update,
fwupd-offline-update.
- strongly recommend After=sysinit.target, Wants=sysinit.target
- clarify a bit what should happen when multiple update services are started
- replace links to the wiki with refs to the man page that replaced it.
"*-*~1" => The last day of every month
"*-02~3..5" => The third, fourth, and fifth last days in February
"Mon 05~07/1" => The last Monday in May
Resolves#3861
When trying to read keyfiles from an encrypted partition to unlock the swap,
a cyclic dependency is generated because systemd can not mount the
filesystem before it has checked if there is a swap to resume from.
Closes#3940
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.
This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.
Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
Fixes: #4664
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).
This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
This changes a couple of things in the namespace handling:
It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.
This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.
While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.
This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.
This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.
Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows to mount huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
fixes#4055
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings .
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
commit 0527f1c
3f1ac7a700ommit
35afb33
core: add new RestrictNamespaces= unit file setting
Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().
RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.
This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
If execve() or socket() is filtered the service manager might get into trouble
executing the service binary, or handling any failures when this fails. Mention
this in the documentation.
The other option would be to implicitly whitelist all system calls that are
required for these codepaths. However, that appears less than desirable as this
would mean socket() and many related calls have to be whitelisted
unconditionally. As writing system call filters requires a certain level of
expertise anyway it sounds like the better option to simply document these
issues and suggest that the user disables system call filters in the service
temporarily in order to debug any such failures.
See: #3993.
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.
@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes
The system call is already part in @default hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.