IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The mount option has special meaning when SELinux is enabled. To make
NoNewPrivileges=yes not break SELinux enabled systems, let's not set the
mount flag on such systems.
This reverts commit 1753d30215.
Let's re-enable that feature now. As reported when the original commit
was merged, this causes some trouble on SELinux enabled systems. So,
in the subsequent commit, the feature will be disabled when SELinux is enabled.
But, anyway, this commit just re-enable that feature unconditionally.
If ActivationPolicy= is set to down, always-down, or manual, then any
matching link will delay boot (due to delaying network-online.target).
If RequiredForOnline= wasn't explicitly set, then default it to false
if ActivationPolicy= is down or manual. If ActivationPolicy=always-down,
then force RequiredForOnline=no.
This is useful for provisioning initially empty secondary A/B root file
systems. We don't want those to ever be considered for automatic
mounting, for example in "systemd-nspawn --image=", hence we should
create them with the No-Auto flag turned on. Once a file system image is
dropped into the partition the flag may be turned off by the updater
tool, so that it is considered from then on.
Thew new option for this is called NoAuto. I dislike negated options
like this, but this is taken from the naming in the spec, which in turn
inherited the name from the same flag for Microsoft Data Partitions. To
minimize confusion, let's stick to the name hence.
Text currently refers to `/etc/nsswitch.conf` where it should refer to `/etc/resolv.conf`.
This is in the context of defining a nameserver IP and search domains.
So in theory UUID Variant 2 (i.e. microsoft GUIDs) are supposed to be
displayed in native endian. That is of course a bad idea, and Linux
userspace generally didn't implement that, i.e. uuidd and similar.
Hence, let's not bother either, but let's document that we treat
everything the same as Variant 1, even if it declares something else.
This reverts commit d8e3c31bd8.
A poorly documented fact is that SELinux unfortunately uses nosuid mount flag
to specify that also a fundamental feature of SELinux, domain transitions, must
not be allowed either. While this could be mitigated case by case by changing
the SELinux policy to use `nosuid_transition`, such mitigations would probably
have to be added everywhere if systemd used automatic nosuid mount flags when
`NoNewPrivileges=yes` would be implied. This isn't very desirable from SELinux
policy point of view since also untrusted mounts in service's mount namespaces
could start triggering domain transitions.
Alternatively there could be directives to override this behavior globally or
for each service (for example, new directives `SUIDPaths=`/`NoSUIDPaths=` or
more generic mount flag applicators), but since there's little value of the
commit by itself (setting NNP already disables most setuid functionality), it's
simpler to revert the commit. Such new directives could be used to implement
the original goal.
Previously, IPv6LinkLocalAddressGenerationMode= is not set, then we
define the address generation mode based on the result of reading
stable_secret sysctl value. This makes the mode is determined by whether
a secret address is specified in the new setting.
Closes#19622.
For "systemd-tmpfiles --cleanup", when the "Age" parameter
is specified, the criteria for deletion is determined from
the path's last modification timestamp ("mtime"), its last
access timestamp ("atime") and its last status change
timestamp ("ctime").
For instance, if one of those paths to be cleaned up are
opened, it results in the modification of "atime", which
results file system entry to not be removed because the
default aging algorithm would skip the entry.
Add an optional "age-by" argument by extending the "Age"
parameter to restrict the clean-up for a particular type
of file timestamp, which can be specified in "tmpfiles.d"
as follows:
[age-by:]cleanup-age, where age-by is "[abcmACBM]+"
For example:
d /foo/bar - - - abM:1m -
Would clean-up any files that were not accessed and created,
or directories that were not modified less than a minute ago
in "/foo/bar".
Fixes: #17002
Add the '=' action modifier that instructs tmpfiles.d to check the file
type of a path and remove objects that do not match before trying to
open or create the path.
BUG=chromium:1186405
TEST=./test/test-systemd-tmpfiles.py "$(which systemd-tmpfiles)"
Change-Id: If807dc0db427393e9e0047aba640d0d114897c26
When using top level drop-ins it isn't immediately obvious that one can
make use of symlinking to disable a top-level drop in for a specific
unit.
Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
When /var/lib/systemd/coredump/ is backed by a tmpfs, all disk usage
will be accounted under the systemd-coredump process cgroup memory
limit.
If MemoryMax is set, this might cause systemd-coredump to be terminated
by the kernel oom handler when writing large uncompressed core files,
even if the compressed core would fit within the limits.
Detect if a tmpfs is used, and if so check MemoryMax from the process
and slice cgroups, and do not write uncompressed core files that are
greater than half the available memory. If the limit is breached,
stop writing and compress the written chunk immediately, then delete
the uncompressed chunk to free more memory, and resume compressing
directly from STDIN.
Example debug log when this situation happens:
systemd-coredump[737455]: Setting max_size to limit writes to 51344896 bytes.
systemd-coredump[737455]: ZSTD compression finished (51344896 -> 3260 bytes, 0.0%)
systemd-coredump[737455]: ZSTD compression finished (1022786048 -> 47245 bytes, 0.0%)
systemd-coredump[737455]: Process 737445 (a.out) of user 1000 dumped core.
Try to infer the unused memory that a unit can claim before the
memory.max limit is reached, including any limit set on any parent
slice above the unit itself.
We were effectively doing all post-upgrade scripts twice in Fedora. We got this
wrong, so it's likely other people will get it wrong too. So let's explain
what is actually needed to make this work, but also when it's not useful.
Use the option name 'password-echo' instead of the generic term
'silent'.
Make the option take an argument for better control over echoing
behavior.
Related discussion in https://github.com/systemd/systemd/pull/19619
This adds --visible=yes|no|asterisk which allow controlling the echo of
the password prompt in detail. The existing --echo switch is then made
an alias for --visible=yes (and a shortcut -e added for it too).
This catches up homed's FIDO2 support with cryptsetup's: we'll now store
the uv/up/clientPin configuration at enrollment in the user record JSON
data, and use it when authenticating with it.
This also adds explicit "uv" support: we'll only allow it to happen when
the client explicity said it's OK. This is then used by clients to print
a nice message suggesting "uv" has to take place before retrying
allowing it this time. This is modelled after the existing handling for
"up".
Giving --echo to systemd-ask-password allows to echo the user input.
There's nothing secret, so do not show a lock and key emoji by default.
The behavior can be controlled with --emoji=yes|no|auto. The default is
auto, which defaults to yes, unless --echo is given.
0cf8469387 added --console.
6af621248f added an optional argument, but didn't
update the help texts.
Note that there is no ambiguity with the optional argument because no positional
arguments are allowed.
This adds two things:
- A new switch --uuid is added to "udevadm trigger". If specified a
random UUID is associated with the synthettic uevent and it is printed
to stdout. It may then be used manually to match up uevents as they
propagate through the system.
- The UUID logic is now implicitly enabled if "udevadm trigger --settle"
is used, in order to wait for precisely the uevents we actually
trigger. Fallback support is kept for pre-4.13 kernels (where the
requests for trigger uevents with uuids results in EINVAL).
This is the case because the ID128 we generate are all marked as v4 UUID
which requires that some bits are zero and others are one. Let's
document this so that people can rely on SD_ID128_NULL being a special
value for "uninitialized" that is always distinguishable from generated
UUIDs.
When `NoNewPrivileges=yes`, the service shouldn't have a need for any
setuid/setgid programs, so in case there will be a new mount namespace anyway,
mount the file systems with MS_NOSUID.
The code works differently than the docs, and the code is right here.
Fix the doc hence.
See VALID_CHARS in unit-name.c for details about allowed chars in unit
names, but keep in mind that "-" and "\" are special, since generated by
the escaping logic: they are OK to show up in unit names, but need to be
escaped when converting foreign strings to unit names to make sure
things remain reversible.
Fixes: #19623
Strictly speaking adding this is a compatibility break, given that
previously % weren't special. But I'd argue that was simply a bug, as
for the much more prominent Environment= service setting we always
resolved specifiers, and DEfaultEnvironment= is explicitly listed as
being the default for that. Hence, let's fix that.
Replaces: #16787
This might be useful for CopyFiles=, to reference some subdir of $TMP in
a generic way. This allows us to use the new common
system_and_tmp_specifier_table[].
Previously, we supported only "," as separator. This adds support for
"+" and makes it the documented choice.
This is to make specifying PCRs in crypttab easier, since commas are
already used there for separating volume options, and needless escaping
sucks.
"," continues to be supported, but in order to keep things minimal not
documented.
Fixe: #19205
This is like a really strong version of Wants=, that keeps starting the
specified unit if it is ever found inactive.
This is an alternative to Restart= inside a unit, acknowledging the fact
that whether to keep restarting the unit is sometimes not a property of
the unit itself but the state of the system.
This implements a part of what #4263 requests. i.e. there's no
distinction between "always" and "opportunistic". We just dumbly
implement "always" and become active whenever we see no job queued for
an inactive unit that is supposed to be upheld.
This is similar to OnFailure= but is activated whenever a unit returns
into inactive state successfully.
I was always afraid of adding this, since it effectively allows building
loops and makes our engine Turing complete, but it pretty much already
was it was just hidden.
Given that we have per-unit ratelimits as well as an event loop global
ratelimit I feel safe to add this finally, given it actually is useful.
Fixes: #13386
This takes inspiration from PropagatesReloadTo=, but propagates
stop jobs instead of restart jobs.
This is defined based on exactly two atoms: UNIT_ATOM_PROPAGATE_STOP +
UNIT_ATOM_RETROACTIVE_STOP_ON_STOP. The former ensures that when the
unit the dependency is originating from is stopped based on user
request, we'll propagate the stop job to the target unit, too. In
addition, when the originating unit suddenly stops from external causes
the stopping is propagated too. Note that this does *not* include the
UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT atom (which is used by BoundBy=),
i.e. this dependency is purely about propagating "edges" and not
"levels", i.e. it's about propagating specific events, instead of
continious states.
This is supposed to be useful for dependencies between .mount units and
their backing .device units. So far we either placed a BindsTo= or
Requires= dependency between them. The former gave a very clear binding
of the to units together, however was problematic if users establish
mounnts manually with different block device sources than our
configuration defines, as we there might come to the conclusion that the
backing device was absent and thus we need to umount again what the user
mounted. By combining Requires= with the new StopPropagatedFrom= (i.e.
the inverse PropagateStopTo=) we can get behaviour that matches BindsTo=
in every single atom but one: UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT is
absent, and hence the level-triggered logic doesn't apply.
Replaces: #11340
Let's add an implicit reverse dep OnFailureOf=. This is exposed via the
bus to make things more debuggable: you can now ask systemd for which
units a specific unit is the failure handler.
OnFailure= was the only dependency type that had no inverse, this fixes
that.
Now that deps are a bit cheaper, it should be OK to add deps that only
serve debug purposes.
The slice a unit is assigned to is currently a UnitRef reference. Let's
turn it into a proper dependency, to simplify and clean up code a bit.
Now that new dep types are cheaper, deps should generally be preferable
over everything else, if the concept applies.
This brings one major benefit: we often have to iterate through all unit
a slice contains. So far we iterated through all Before= dependencies of
the slice unit to achieve that, filtering out unrelated units, and
taking benefit of the fact that slice units are implicitly ordered
Before= the units they contain. By making Slice= a proper dependency,
and having an accompanying SliceOf= dependency type, this is much
simpler and nicer as we can directly enumerate the units a slice
contains.
The forward dependency is actually called InSlice internally, since we
already used the UNIT_SLICE name as UnitType field. However, since we
don't intend to expose the dependency to users as dep anyway (we already
have the regular Slice D-Bus property for this) this shouldn't matter.
The SliceOf= implicit dependency type (the erverse of Slice=/InSlice=)
is exported over the bus, to make things a bit nicer to debug and
discoverable.
Previously, when a link has already in a numbered group, we cannot
remove the link from the group.
This also fixes the range mentioned in the man page.
This is also not entirely obvious. I think the code I came
up with is pretty elegant ;] The final part of of the code that makes
use of the parsed data is kept very similar to the shell code on purpose,
even though it could be written a bit more idiomatically.
Let's order the fields from the most general to least: os name, os variant, os
version, machine-parseable version details, metadata, special settings. I added
section headers to roughly group the settings. The division is not strict,
because for example CPE_NAME also includes the version, and PRETTY_NAME may
too, but it still makes it easier to find the right name.
Also split out Examples to separate paragraphs:
almost all descriptions had "Example:" at the end, where multiple
examples were listed. Splitting this out to separate paragraphs
makes the whole thing much easier to read.
Add missing markup and punctuation while at it.
About
- If not set, defaults to <literal>NAME=Linux</literal>.
+ If not set, a default of <literal>NAME=Linux</literal> may be used.
and similar changes: in many circumstances, if this is not set, no value should
be used. The fallback mostly make sense when we need to present something to the
user. So let's reword this to not imply that the default is necessary.
With new "online state" semantics in networkd, make the description of
RequiredFamilyForOnline= a little more broad. Some rewording has been
done to make the passage easier to understand.
This doesn't matter too much, but makes things a bit more consistent.
A minor advantage is that the file is not a configuration file for meson
anymore, so:
a) It is not built unless pulled in by another target. Since
we don't usually build man pages by default, this saves a tiny
amount of work.
b) When the .in file is updated, meson does not reconfigure everything,
but just rebuilds the dependent targets.
Now that the conversion is finished, time for benchmarking:
a full build with default settings (and -Dstandalonebinaries=true), yields
before this pull request: 1687 targets, 148.13s user 35.17s system 317% cpu 57.697 total
with the full pull request: 1714 targets, 143.07s user 27.87s system 314% cpu 54.369 total
The difference doesn't seem significant. Partial rebuilds might be faster as
mentioned before.
I want to stop using 'substs'. But in this case, configure_file() is nicer
than custom_target(), because it causes meson to immediately generate the
helpers after configuration, so it's possible to do
'meson build && build/man/man ...', without building anything first.
We only substitute one variable here, so let's use a custom configuration_data()
object.
User managers always pass their environment on to their children.
Make that clear in the description of ManagerEnvironment= which
states that none of those args will get passed to child processes of
service managers.
Adds a crypttab option 'silent' that enables the AskPasswordFlag
ASK_PASSWORD_SILENT. This allows usage of systemd-cryptsetup to default
to silent mode, rather than requiring the user to press tab every time.
Meson 0.58 has gotten quite bad with emitting a message every time
a quoted command is used:
Program /home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh found: YES (/home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program xsltproc found: YES (/usr/bin/xsltproc)
Configuring custom-entities.ent using configuration
Message: Skipping bootctl.1 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping journal-remote.conf.5 because HAVE_MICROHTTPD is false
Message: Skipping journal-upload.conf.5 because HAVE_MICROHTTPD is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping loader.conf.5 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
...
Let's suffer one message only for each command. Hopefully we can silence
even this when https://github.com/mesonbuild/meson/issues/8642 is
resolved.
The settings were listen in a completely random order, also different
between the v4 and v6 sections. Order by "options sent", "options received",
"communication settings" in both sections.
Also minor formatting changes are done, e.g. "=" is added in various places.
When `--json` option is specified, "status" and "list" commands gives
the same information, as originally "list" just gives partial
information of "status" in different format.
This was a copy/paste mistae apparently, there's not "try_authtok" and
this was supposed to copy what Fedora uses, which uses "use_authtok"
correctly. Hence adjust this.
Fixes: #19369
In most of our codebase when we referenced "ipv4" and "ipv6" on the
right-hand-side of an assignment, we lowercases it (on the
left-hand-side we used CamelCase, and thus "IPv4" and "IPv6"). In
particular all across the networkd codebase the various "per-protocol
booleans" use the lower-case spelling. Hence, let's use lower-case for
SocketBindAllow=/SocketBindDeny= too, just make sure things feel like
they belong together better.
(This work is not included in any released version, hence let's fix this
now, before any fixes in this area would be API breakage)
Follow-up for #17655
The directory might not be created in the ESP but in the extended boot
loader partition, hence don#t claim otherwise.
Also, give a brief reason why the concept exists at all.
Link up machine-id man page.
Follow-up for: 6a3fff75ba
To determine the network interface type for use in the `Type=` directive, it is more concise to use the `list` command. Whereas, the `status` command requires an interface parameter.
For example, on a RaspberryPi 4 the following shows that the `wlan0` interface type `wlan` is more coveniently listed by the `list` command.
```
root@raspberrypi4-64:~# networkctl list
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 eth0 ether routable configured
3 wlan0 wlan off unmanaged
3 links listed.
```
Whereas the `networkctl status` command doesn't include this information.
```
root@raspberrypi4-64:~# networkctl status
● State: routable
Address: 192.168.1.141 on eth0
fd8b:8779:b7a4::f43 on eth0
fd8b:8779:b7a4:0:dea6:32ff:febe:d1ce on eth0
fe80::dea6:32ff:febe:d1ce on eth0
Gateway: 192.168.1.1 (CZ.NIC, z.s.p.o.) on eth0
DNS: 192.168.1.1
May 07 14:17:18 raspberrypi4-64 systemd-networkd[212]: eth0: Gained carrier
May 07 14:17:19 raspberrypi4-64 systemd-networkd[212]: eth0: Gained IPv6LL
May 07 14:17:19 raspberrypi4-64 systemd-networkd[212]: eth0: DHCPv6 address fd8b:8779:b7a4::f43/128 timeout preferred -1 valid -1
May 07 14:17:21 raspberrypi4-64 systemd-networkd[212]: eth0: DHCPv4 address 192.168.1.141/24 via 192.168.1.1
```
To get the interface type using the `status` command you need to specify an additional argument.
```
root@raspberrypi4-64:~# networkctl status wlan0
● 3: wlan0
Link File: /lib/systemd/network/99-default.link
Network File: n/a
Type: wlan
State: off (unmanaged)
Path: platform-fe300000.mmcnr
Driver: brcmfmac
HW Address: dc:a6:32:be:d1:cf (Raspberry Pi Trading Ltd)
MTU: 1500 (min: 68, max: 1500)
QDisc: noop
IPv6 Address Generation Mode: eui64
Queue Length (Tx/Rx): 1/1
```
This ensures we not only synthesize regular paswd/group records of
userdb records, but shadow records as well. This should make sure that
userdb can be used as comprehensive superset of the classic
passwd/group/shadow/gshadow functionality.
Some tokens support authorization via fingerprint or other biometric
ID. Add support for "user verification" to cryptenroll and cryptsetup.
Disable by default, as it is still quite uncommon.
In some cases user presence might not be required to get _a_
secret out of a FIDO2 device, but it might be required to
the get actual secret that was used to lock the volume.
Record whether we used it in the LUKS header JSON metadata.
Let the cryptenroll user ask for the feature, but bail out if it is
required by the token and the user disabled it.
Enabled by default.
Closes: https://github.com/systemd/systemd/issues/19246
Some FIDO2 devices allow the user to choose whether to use a PIN or not
and will HMAC with a different secret depending on the choice.
Some other devices (or some device-specific configuration) can instead
make it mandatory.
Allow the cryptenroll user to choose whether to use a PIN or not, but
fail immediately if it is a hard requirement.
Record the choice in the JSON-encoded LUKS header metadata so that the
right set of options can be used on unlock.
On headless setups, in case other methods fail, asking for a password/pin
is not useful as there are no users on the terminal, and generates
unwanted noise. Add a parameter to /etc/crypttab to skip it.
$ coredumpctl info |grep Command
Command Line: bash -c kill -SEGV $$ (before)
Command Line: bash -c "kill -SEGV \$\$" (road not taken, C quotes)
Command Line: bash -c $'kill -SEGV $$' (now, POSIX quotes)
Before we wouldn't use any quoting, making it impossible to figure how the
command line was split into arguments. We could use "normal" quotes, but this
has the disadvantage that the commandline *looks* like it could be pasted into
the terminal and executed, but this is not true: various non-printable
characters cannot be expressed in this quoting style. (This is not visible in
this example). Thus, "POSIX quotes" are used, which should allow any command
line to be expressed acurrately and pasted directly into a shell prompt to
reexecute.
I wonder if we should another field in the coredump entry that simply shows the
original cmdline with embedded NULs, in the original /proc/*/cmdline
format. This would allow clients to format the data as they see fit. But I
think we'd want to keep the serialized form anyway, for backwards compatibility.
This commit applies the filtering imposed by LogLevelMax on a unit's
processes to messages logged by PID1 about the unit as well.
The target use case for this feature is a service that runs on a timer
many times an hour, where the system administrator decides that writing
a generic success message to the journal every few minutes or seconds
adds no diagnostic value and isn't worth the clutter or disk I/O.
The maximum allowed value of the sysfs device index entry was limited to
16383 (2^14-1) to avoid the generation of unreasonable onboard interface
names.
For s390 the index can assume a value of up to 65535 (2^16-1) which is
now allowed depending on the new naming flag NAMING_16BIT_INDEX.
Larger index values are considered unreasonable and remain to be
ignored.
This allows to limit units to machines that run on a certain firmware
type. For device tree defined machines checking against the machine's
compatible is also possible.
Similar to sd_bus_error_has_names() that was added in
2b07ec316a.
It is made inline in the hope that the compiler will be able to optimize
all the va_args boilerplate away, and do an efficient comparison when
the arguments are all constants.
In man pages, horizontal space it at premium, and everything should
generally be indented with 2 spaces to make it more likely that the
examples fit on a user's screen.
C.f. 798d3a524e.
Let's make the GPT partition flags configurable when creating new
partitions. This is primarily useful for the read-only flag (which we
want to set for verity enabled partitions).
This adds two settings for this: Flags= and ReadOnly=, which strictly
speaking are redundant. The main reason to have both is that usually the
ReadOnly= setting is the one wants to control, and it' more generic.
Moreover we might later on introduce inherting of flags from CopyBlocks=
partitions, where one might want to control most flags as is except for
the RO flag and similar, hence let's keep them separate.
When using systemd-repart as an installer that replicates the install
medium on another medium it is useful to reference the root
partition/usr partition or verity data that is currently booted, in
particular in A/B scenarios where we have two copies and want to
reference the one we currently use. Let's add a CopyBlocks=auto for this
case: for a partition that uses that we'll copy a suitable partition
from the host.
CopyBlocks=auto finds the partition to copy like this: based on the
configured partition type uuid we determine the usual mount point (i.e.
for the /usr partition type we determine /usr/, and so on). We then
figure out the block device behind that path, through dm-verity and
dm-crypt if necessary. Finally, we compare the partition type uuid of
the partition found that way with the one we are supposed to fill and
only use it if it matches (the latter is primarily important on
dm-verity setups where a volume is likely backed by two partitions and
we need to find the right one).
This is particularly fun to use in conjunction with --image= (where
we'll restrict the device search onto the specify device, for security
reasons), as this allows "duplicating" an image like this:
# systemd-repart --image=source.raw --empty=create --size=auto target.raw
If the right repart data is embedded into "source.raw" this will be able
to create and initialize a partition table on target.raw that carrries
all needed partitions, and will stream the source's file systems onto it
as configured.
So far we already had the CopyFiles= option in systemd-repart drop-in
files, as a mechanism for populating freshly formatted file systems with
files and directories. This adds MakeDirectories= in similar style, and
creates simple directories as listed. The option is of course entirely
redundant, since the same can be done with CopyFiles= simply by copying
in a directory. It's kinda nice to encode the dirs to create directly in
the drop-in files however, instead of providing a directory subtree to
copy in somehere, to make the files more self-contained — since often
just creating dirs is entirely sufficient.
The main usecase for this are GPT OS images that carry only a /usr/
tree, and for which a root file system is only formatted on first boot
via repart. Without any additional CopyFiles=/MakeDirectories=
configuration these root file systems are entirely empty of course
initially. To mount in the /usr/ tree, a directory inode for /usr/ to
mount over needs to be created. systemd-nspawn will do so automatically
when booting up the image, as will the initrd during boot. However, this
requires the image to be writable – which is OK for npawn and
initrd-based boots, but there are plenty tools where read-only operation
is desirable after repart ran, before the image was booted for the first
time. Specifically, "systemd-dissect" opens the image in read-only to
inspect its contents, and this will only work of /usr/ can be properly
mounted. Moreover systemd-dissect --mount --read-only won't succeed
either if the fs is read-only.
Via MakeDirectories= we now provide a way that ensures that the image
can be mounted/inspected in a fully read-only way immediately after
systemd-repart completed. Specifically, let's consider a GPT disk image
shipping with a file usr/lib/repart.d/50-root.conf:
[Partition]
Type=root
Format=btrfs
MakeDirectories=/usr
MakeDirectories=/efi
With this in place systemd-repart will create a root partition when run,
and add /usr and /efi into it as directory inods. This ensures that the
whole image can then be mounted truly read-only anf /usr and /efi can be
overmounted by the /usr partition and the ESP.
This is similar to the --image= switch in the other tools, like
systemd-sysusers or systemd-tmpfiles, i.e. it apply the configuration
from the image to the image.
This is particularly useful for downloading minimized GPT image, and
then extending it to the desired size via:
# systemd-repart --image=foo.image --size=5G
The commit 0b81225e57 makes that networkd
remove all foreign rules except those with "proto kernel".
But, in some situation, people may want to manage routing policy rules
with other tools, e.g. 'ip' command. To support such the situation,
this introduce ManageForeignRoutingPolicyRules= boolean setting.
Closes#19106.
There are two ambiguity in the original description:
1. It will delay all RUN instructions, include builtin.
2. It will delay before running RUN, not each of RUN{program} instructions.
- Handle BPFProgram= property in string format
"<bpf_attach_type>:<bpffs_path>", e.g. egress:/sys/fs/bpf/egress-hook.
- Add dbus getter to list foreign bpf programs attached to a cgroup.
create_fifo() was added in a2fc2f8dd3, and
would always ignore failure. The test was trying to fail in this case, but
we actually don't fail, which seems to be correct. We didn't notice before
because the test was ineffective.
To make things consistent, generally log at warning level, but don't propagate
the error. For symlinks, log at debug level, as before.
For 'e', failure is not propagated now. The test is adjusted to match.
I think warning is appropriate in most cases: we do not expect a device node to
be replaced by a different device node or even a non-device file. This would
most likely be an error somewhere. An exception is made for symlinks, which are
mismatched on purpose, for example /etc/resolv.conf. With this patch, we don't
get any warnings with the any of the 74 tmpfiles.d files, which suggests that
increasing the warning levels will not cause too many unexpected warnings. If
it turns out that there are valid cases where people have expected mismatches
for non-symlink types, we can always decrease the log levels again.
We didn't document this behaviour one way or another, so I think it's
OK to change. All callers do the NULL check before callling this to avoid
the assert warning, so it seems reasonable to do it internally.
sd_bus_can_send() is similar, but there we expressly say that an
error is returned on NULL, so I didn't change it.
Append 'package' and 'packageVersion' to the journal as discrete fields
COREDUMP_PKGMETA_PACKAGE and COREDUMP_PKGMETA_PACKAGEVERSION respectively,
and the full json blurb as COREDUMP_PKGMETA_JSON.
In some instances, particularly with swap on zram, swap used will be high
while there is still a lot of memory available. FB OOMD handles this by
thresholding kills to X% of total swap usage. Let's do the same thing here.
Anecdotally with these thresholds and my laptop which is exclusively swap
on zram I can sit at 0K / 4G swap free with most of memory free and
systemd-oomd doesn't kill anything.
Partially addresses aggressive kill behavior from
https://bugzilla.redhat.com/show_bug.cgi?id=1941170
The s390 PCI driver assigns the hotplug slot name from the
function_id attribute of the PCI device using a 8 char hexadecimal
format to match the underlying firmware/hypervisor notation.
Further, there's always a one-to-one mapping between a PCI
function and a hotplug slot, as individual functions can
hot plugged even for multi-function devices.
As the generic matching code will always try to parse the slot
name in /sys/bus/pci/slots as a positive decimal number, either
a wrong value might be produced for ID_NET_NAME_SLOT if
the slot name consists of decimal numbers only, or none at all
if a character in the range from 'a' to 'f' is encountered.
Additionally, the generic code assumes that two interfaces
share a hotplug slot, if they differ only in the function part
of the PCI address. E.g., for an interface with the PCI address
dddd:bb:aa.f, it will match the device to the first slot with
an address dddd:bb:aa. As more than one slot may have this address
for the s390 PCI driver, the wrong slot may be selected.
To resolve this we're adding a new naming schema version with the
flag NAMING_SLOT_FUNCTION_ID, which enables the correct matching
of hotplug slots if the device has an attribute named function_id.
The ID_NET_NAME_SLOT property will only be produced if there's
a file /sys/bus/pci/slots/<slotname> where <slotname> matches
the value of /sys/bus/pci/devices/.../function_id in 8 char
hex notation.
Fixes#19016
See also #19078
It was one giant all of text in pseudo-random order. Let's split it into
paragraphs talk about one subject each.
And unfortunately, the description of what happens when the error is not
set was not correct. In general, various functions treat 0/NULL as
not-an-error, and return 0.
Add an --extension parameter to portablectl, and new DBUS methods
to attach/detach/reattach/inspect.
Allows to append separate images on top of the root directory (os-release
will be searched in there) and mount the images using an overlay-like
setup (unit files will be searched in there) using the new ExtensionImages
service option.
This specifes two new optional fields for /etc/os-release:
IMAGE_VERSION= and IMAGE_ID= that are supposed to identify the image of
the current booted system by name and version.
This is inspired by the versioning stuff in
https://github.com/systemd/mkosi/pull/683.
In environments where pre-built images are installed and updated as a
whole the existing os-release version/distro identifier are not
sufficient to describe the system's version, as they describe only the
distro an image is built from, but not the image itself, even if that
image is deployed many times on many systems, and even if that image
contains more resources than just the RPMs/DEBs.
In particular, "mkosi" is a tool for building disk images based on
distro RPMs with additional resources dropped in. The combination of all
of these together with their versions should also carry an identifier
and version, and that's what IMAGE_VERSION= and IMAGE_ID= is supposed to
be.
This adds generic support for the SetCredential=/LoadCredential= logic
to our password querying infrastructure: if a password is requested by a
program that has a credential store configured via
$CREDENTIALS_DIRECTORY we'll look in it for a password.
The "systemd-ask-password" tool is updated with an option to specify the
credential to look for.
Let's make use of our own credentials infrastructure in our tools: let's
hook up systemd-sysusers with the credentials logic, so that the root
password can be provisioned this way. This is really useful when working
with stateless systems, in particular nspawn's "--volatile=yes" switch,
as this works now:
# systemd-nspawn -i foo.raw --volatile=yes --set-credential=passwd.plaintext-password:foo
For the first time we have a nice, non-interactive way to provision the
root password for a fully stateless system from the container manager.
Yay!
This allows "LoadCredentials=foo" to be used as shortcut for
"LoadCredentials=foo:foo", i.e. it's a very short way to inherit a
credential under its original name from the service manager into a
service.
When using hidepid=invisible on procfs, the kernel will check if the
gid of the process trying to access /proc is the same as the gid of
the process that mounted the /proc instance, or if it has the ptrace
capability:
https://github.com/torvalds/linux/blob/v5.10/fs/proc/base.c#L723https://github.com/torvalds/linux/blob/v5.10/fs/proc/root.c#L155
Given we set up the /proc instance as root for system services,
The same restriction applies to CAP_SYS_PTRACE, if a process runs with
it then hidepid=invisible has no effect.
ProtectProc effectively can only be used with User= or DynamicUser=yes,
without CAP_SYS_PTRACE.
Update the documentation to explicitly state these limitations.
Fixes#18997
Tables with only one column aren't really tables, they are lists. And if
each cell only consists of a single word, they are probably better
written in a single line. Hence, shorten the man page a bit, and list
boot loader spec partition types in a simple sentence.
Also, drop "root-secondary" from the list. When dissecting images we'll
upgrade "root-secondary" to "root" if we mount it, and do so only if
"root" doesn't exist. Hence never mention "root-secondary" as we never
will mount a partition under that id.