mirror of https://github.com/systemd/systemd.git synced 2025-02-09 13:57:42 +03:00

64305 Commits

Author SHA1 Message Date
Gustavo Noronha Silva
6b8e90545e Apply known iocost solutions to block devices
Meta's resource control demo project[0] includes a benchmark tool that can
be used to calculate the best iocost solutions for a given SSD.

  [0]: https://github.com/facebookexperimental/resctl-demo

A project[1] has now been started to create a publicly available database
of results that can be used to apply them automatically.

  [1]: https://github.com/iocost-benchmark/iocost-benchmarks

This change adds a new tool that gets triggered by a udev rule for any
block device and queries the hwdb for known solutions. The format for
the hwdb file that is currently generated by the github action looks like
this:

  # This file was auto-generated on Tue, 23 Aug 2022 13:03:57 +0000.
  # From the following commit:
  # ca82acfe93
  #
  # Match key format:
  # block:<devpath>:name:<model name>:

  # 12 points, MOF=[1.346,1.346], aMOF=[1.249,1.249]
  block:*:name:HFS256GD9TNG-62A0A:fwver:*:
    IOCOST_SOLUTIONS=isolation isolated-bandwidth bandwidth naive
    IOCOST_MODEL_ISOLATION=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_ISOLATION=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_ISOLATED_BANDWIDTH=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_ISOLATED_BANDWIDTH=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_BANDWIDTH=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_BANDWIDTH=rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00
    IOCOST_MODEL_NAIVE=rbps=1091439492 rseqiops=52286 rrandiops=63784 wbps=192329466 wseqiops=12309 wrandiops=16119
    IOCOST_QOS_NAIVE=rpct=99.00 rlat=8807 wpct=99.00 wlat=59023 min=75.00 max=100.00

The IOCOST_SOLUTIONS key lists the solutions available for that device
in the preferred order for higher isolation, which is a reasonable
default for most client systems. This can be overridden to choose better
defaults for custom use cases, like the various data center workloads.
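
The hwdb entry above suggests that each name listed in IOCOST_SOLUTIONS maps to a pair of properties whose suffix is the upper-cased solution name with "-" turned into "_" (isolated-bandwidth becomes IOCOST_MODEL_ISOLATED_BANDWIDTH). The C sketch below picks the default (first) solution and derives the property names under that assumption. The helper names are hypothetical and not taken from the tool itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy the first space-separated entry of an IOCOST_SOLUTIONS value
 * (the preferred solution) into buf. Returns 0, or -1 if there is none
 * or it does not fit. */
static int iocost_default_solution(const char *solutions, char *buf, size_t size) {
        assert(solutions);
        assert(buf);

        solutions += strspn(solutions, " ");    /* skip leading blanks */
        size_t n = strcspn(solutions, " ");     /* length of the first word */
        if (n == 0 || n >= size)
                return -1;

        memcpy(buf, solutions, n);
        buf[n] = '\0';
        return 0;
}

/* Build the property name for a solution (hypothetical mapping), e.g.
 * ("MODEL", "isolated-bandwidth") -> "IOCOST_MODEL_ISOLATED_BANDWIDTH". */
static int iocost_property_name(const char *kind, const char *solution, char *buf, size_t size) {
        assert(kind);
        assert(solution);
        assert(buf);

        int r = snprintf(buf, size, "IOCOST_%s_%s", kind, solution);
        if (r < 0 || (size_t) r >= size)
                return -1;

        /* Only touch the solution part: upper-case it and map '-' to '_'. */
        for (char *p = buf + strlen("IOCOST_") + strlen(kind) + 1; *p; p++)
                *p = *p == '-' ? '_' : (char) toupper((unsigned char) *p);
        return 0;
}
```

Overriding the default would then amount to passing a different solution name instead of the first entry.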

The tool can also be used to query the known solutions for a specific
device or to apply a non-default solution (say, isolation or bandwidth).
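
Once a solution is chosen, its values have to reach the kernel. With cgroup v2, iocost parameters are written to the root cgroup's io.cost.model and io.cost.qos files as a "MAJ:MIN key=value ..." line. The sketch below assumes the hwdb strings can be passed through verbatim after a "ctrl=user model=linear" or "enable=1 ctrl=user" prefix. The function names are hypothetical, and the actual tool may build these lines differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a line for /sys/fs/cgroup/io.cost.model from an IOCOST_MODEL_*
 * value such as "rbps=1091439492 rseqiops=52286 ...". Returns 0, or -1 on
 * multi-line input or if the line does not fit into buf. */
static int format_iocost_model(unsigned major, unsigned minor, const char *params,
                               char *buf, size_t size) {
        assert(params);
        assert(buf);

        if (strchr(params, '\n'))
                return -1;      /* each device entry is a single line */

        int r = snprintf(buf, size, "%u:%u ctrl=user model=linear %s", major, minor, params);
        return r < 0 || (size_t) r >= size ? -1 : 0;
}

/* Same for /sys/fs/cgroup/io.cost.qos from an IOCOST_QOS_* value such as
 * "rpct=0.00 rlat=8807 wpct=0.00 wlat=59023 min=100.00 max=100.00". */
static int format_iocost_qos(unsigned major, unsigned minor, const char *params,
                             char *buf, size_t size) {
        assert(params);
        assert(buf);

        if (strchr(params, '\n'))
                return -1;

        int r = snprintf(buf, size, "%u:%u enable=1 ctrl=user %s", major, minor, params);
        return r < 0 || (size_t) r >= size ? -1 : 0;
}
```

The resulting line would then be written to the corresponding file by a privileged process.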

Co-authored-by: Santosh Mahto <santosh.mahto@collabora.com>
2023-04-20 16:45:57 +02:00
Lennart Poettering
18010d394b
Merge pull request #27327 from DaanDeMeyer/hotplug
kmod-setup: Add early loading for virtio_console
2023-04-20 16:34:12 +02:00
Daan De Meyer
a93aaede29 kmod-setup: Add early loading for virtio_console
getty-generator enables serial-getty@.service for virtualizer consoles
that it can find in /sys/class/tty. To make sure this works for
virtio consoles, let's make sure the module is loaded early
so that /sys/class/tty/hvc0 exists before we run getty-generator.
2023-04-20 13:43:37 +02:00
Daan De Meyer
d2f57745d5 core: Parse logging environment earlier
Let's make sure we parse the logging environment ASAP so that the
options apply to more code, for example to allow debugging
kmod-setup.c.
2023-04-20 13:43:37 +02:00
Daan De Meyer
e1d8f702a2 kmod-setup: Introduce match_modalias_recurse_dir_cb()
Let's make the logic around matching a modalias a bit more generic.
2023-04-20 13:43:37 +02:00
Daan De Meyer
70cc7ed97e string-util: Add startswith_strv()
This is the function version of STARTSWITH_SET(). We also move
STARTSWITH_SET() to string-util.h as it fits more there than in
strv.h and reimplement it using startswith_strv().
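
The relationship between the two can be illustrated with a standalone sketch: a function that walks a NULL-terminated string array and returns the remainder of s after the first matching prefix, plus a variadic macro that builds that array inline with a compound literal. This is a hypothetical re-implementation for illustration, not systemd's code, and the exact signatures in string-util.h may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer into s just past the first prefix in the NULL-terminated
 * list l that s starts with, or NULL if none matches. */
static const char *startswith_strv(const char *s, char * const *l) {
        assert(s);

        for (; l && *l; l++) {
                size_t n = strlen(*l);
                if (strncmp(s, *l, n) == 0)
                        return s + n;
        }
        return NULL;
}

/* Macro form: STARTSWITH_SET(s, "a", "b") builds the NULL-terminated list
 * as a compound literal and forwards it to the function. */
#define STARTSWITH_SET(s, ...) startswith_strv((s), (char *[]) { __VA_ARGS__, NULL })
```

Having the function as the primitive means callers that already hold a string array can use it directly, while the macro stays a thin convenience wrapper.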
2023-04-20 13:43:37 +02:00
Daan De Meyer
85003d1296 mkosi: Disable kmsg ratelimiting 2023-04-20 13:43:37 +02:00
Daan De Meyer
3fe07e9525 log: Log when kmsg is being ratelimited
Let's avoid confusing developers and users when log messages suddenly
stop getting logged to kmsg because of ratelimiting by logging an
additional message if we start ratelimiting log messages to kmsg.
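
A minimal sketch of the idea, assuming a simple fixed-window limiter (at most burst messages per interval, with a monotonic clock): the first message dropped in a window triggers one extra notice, so the silence that follows is explained. The KmsgLimit type, the function name and the notice text are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-window limiter: at most `burst` messages per `interval`. */
typedef struct {
        uint64_t interval;
        unsigned burst;
        uint64_t begin;     /* start of the current window */
        unsigned num;       /* messages let through in the current window */
        bool announced;     /* notice already emitted in this window? */
} KmsgLimit;

/* Returns true if a message at time `now` may be written. The first message
 * dropped in a window produces a one-off notice on `out` instead. */
static bool kmsg_limit_check(KmsgLimit *l, uint64_t now, FILE *out) {
        assert(l);
        assert(out);

        if (now - l->begin >= l->interval) {
                /* New window: reset the counter and re-arm the notice. */
                l->begin = now;
                l->num = 0;
                l->announced = false;
        }

        if (l->num < l->burst) {
                l->num++;
                return true;
        }

        if (!l->announced) {
                fprintf(out, "Too many messages, suppressing further kmsg output for now\n");
                l->announced = true;
        }
        return false;
}
```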
2023-04-20 13:43:36 +02:00
Daan De Meyer
8750a06b6c log: Add knob to disable kmsg ratelimiting
This allows us to disable kmsg ratelimiting in the integration tests
and mkosi for easier debugging.
2023-04-20 13:43:34 +02:00
Lennart Poettering
14ce246771 dissect: let's check for crypto_LUKS before fstype allowlist check
When trying to mount a partition that is encrypted without the
encryption first having been set up we want to return a
recognizable error (EUNATCH). This was broken by
80ce8580f5aa6b03fa13a0b3b30207bc9b5c5fe0 which added an allowlist check
for permissible file systems first. Let's reverse the check order, so
that we get EUNATCH again, as before. (And leave EIDRM as error for the
failed allowlist check).
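
The intended check order can be sketched like this, with a hypothetical allowlist helper: the crypto_LUKS case is recognized first and reported as -EUNATCH, and only afterwards is the allowlist consulted, failing with -EIDRM. The function names and the way the file system type is passed in are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical allowlist lookup over a NULL-terminated list. */
static bool fstype_allowed(const char *fstype, const char * const *allow) {
        for (; *allow; allow++)
                if (strcmp(fstype, *allow) == 0)
                        return true;
        return false;
}

/* Check encryption first so callers get the recognizable -EUNATCH,
 * then enforce the allowlist with -EIDRM. */
static int check_mountable(const char *fstype, const char * const *allow) {
        assert(fstype);
        assert(allow);

        if (strcmp(fstype, "crypto_LUKS") == 0)
                return -EUNATCH;    /* encrypted, but not set up yet */
        if (!fstype_allowed(fstype, allow))
                return -EIDRM;      /* file system type not permitted */
        return 0;
}
```

Because crypto_LUKS is never on the allowlist, doing the allowlist check first would always mask the more specific error, which is why the order matters.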
2023-04-20 13:39:28 +02:00
Lennart Poettering
ed6a6bac45 ratelimit: handle counter overflows somewhat sanely
An overflow here (i.e. the counter reaching 2^32 within a ratelimit time
window) is not so unlikely. Let's handle this somewhat sanely
and simply stop counting, while remaining in the "limit is hit" state until
the time window has passed.
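
The saturating behaviour can be sketched as follows, assuming a fixed-window limiter with an unsigned counter (32 bits on common platforms) and a monotonic clock: instead of wrapping to 0, which would make the limiter think it is below the burst again in the middle of a flood, the counter stops at UINT_MAX until the window expires. Names are hypothetical; this is not systemd's ratelimit.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
        uint64_t interval;
        unsigned burst;
        uint64_t begin;
        unsigned num;
} RateLimitSketch;

/* Returns true while no more than `burst` events happened in the current window. */
static bool ratelimit_below_sketch(RateLimitSketch *rl, uint64_t now) {
        assert(rl);

        if (now - rl->begin >= rl->interval) {
                rl->begin = now;    /* window expired: start a new one */
                rl->num = 0;
        }

        /* Saturate instead of wrapping around to 0, so a flood keeps the
         * limiter in the "limit is hit" state until the window ends. */
        if (rl->num < UINT_MAX)
                rl->num++;

        return rl->num <= rl->burst;
}
```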
2023-04-20 13:39:06 +02:00
Lennart Poettering
e002b8a28a man: try to make clearer that /var/ is generally not available in /usr/lib/systemd/system-shutdown/ callouts
I made the mistake of looking into what is installed into
/usr/lib/systemd/system-shutdown/ on Fedora. fwupd, among other things,
assumes /var/ is available from these callouts, though it is not in the
general case.

Hence, let's emphasize this in the documentation a bit more.
2023-04-20 13:38:49 +02:00
Lennart Poettering
4d49f44f0f dissect-image: issue BLKFLSBUF before probing an fs at block device offset != 0
See added code comment for a longer explanation. TLDR: Linux maintains
distinct block device caches for partition and "whole" block devices,
and a simple BLKFLSBUF should make the worst confusions this causes go
away.
2023-04-20 13:38:32 +02:00
Robert Meijers
4646cdaa37 networkd: fallback to chaddr for static lease lookup when not found
DHCP static leases are looked up by the client identifier as sent by
the client, while they are configured based on the MAC address. As RFC 2131
states that the client identifier is an opaque key that must not be
interpreted by the server, DHCP clients can (and will) also use a client
identifier which is not a MAC address. One such client is in fact
systemd-networkd, which by default uses an RFC 4361 (IAID+DUID) client
identifier. For these kinds of DHCP clients, static leases thus don't work
because of the mismatch between configuring a MAC address and the server
matching based on the client identifier. This adds a fallback that tries
to look up a configured static lease based on the "chaddr" field of the
DHCP message, as it always contains the MAC address of the client.
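
The lookup order described above can be sketched as two passes over the configured leases. For illustration this assumes that a MAC-configured lease corresponds to a client identifier of hardware type 1 (Ethernet) followed by the 6-byte MAC, as RFC 2132 describes for the client-identifier option; the StaticLease type and function name are hypothetical, not networkd's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical static lease, configured by MAC address. */
typedef struct {
        uint8_t mac[6];
        uint32_t address;   /* IPv4 address, host byte order */
} StaticLease;

/* First try the client identifier (assumed: type 1 + MAC); if nothing
 * matches, fall back to chaddr. */
static const StaticLease *find_static_lease(const StaticLease *leases, size_t n,
                                            const uint8_t *client_id, size_t client_id_len,
                                            const uint8_t chaddr[6]) {
        assert(leases || n == 0);
        assert(chaddr);

        if (client_id && client_id_len == 7 && client_id[0] == 1)
                for (size_t i = 0; i < n; i++)
                        if (memcmp(client_id + 1, leases[i].mac, 6) == 0)
                                return &leases[i];

        /* Fallback: chaddr carries the hardware address regardless of what
         * kind of client identifier (e.g. RFC 4361 IAID+DUID) was sent. */
        for (size_t i = 0; i < n; i++)
                if (memcmp(chaddr, leases[i].mac, 6) == 0)
                        return &leases[i];

        return NULL;
}
```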

Fixes #21368
2023-04-20 19:18:50 +09:00
Yu Watanabe
114e85d28e core/device: rewrite how device unit is removed from Manager.devices_by_sysfs
If the device unit is not the head of the list saved in
Manager.devices_by_sysfs, then it is not necessary to replace the
existing hashmap entry. This should not change any behavior, just
refactoring.
2023-04-20 09:22:25 +02:00
Yu Watanabe
24a5370bbc list: fix double evaluation 2023-04-20 09:20:08 +02:00
Daan De Meyer
59e4eeed78
Merge pull request #27299 from yuwata/chase-absolute
chase: return absolute path when dir_fd points to the root directory
2023-04-20 09:19:22 +02:00
Yu Watanabe
47041a2b91 hwdb: disable entry for Logitech USB receiver used by G502 X
Fixes a bug introduced by dede07d3d04007c70c78653a73e2bcd8616564a5.

Fixes #27118.
2023-04-19 21:14:03 +01:00
Yu Watanabe
cb3c6aec3a core: add one missing assertion for release_resource_queue
Follow-up for 6ac62d61db737b01ad3776a7688d8a4c57b3f7d9.
2023-04-19 21:12:08 +01:00
Quintin Hill
0214ead6ee dissect-image: fix log level in dissect_log_error
Actually use the log_level argument in this function!

Fixes 4953e39
2023-04-20 02:04:15 +08:00
Daan De Meyer
6b7e774b5d mkosi: Update to latest 2023-04-19 10:13:06 +02:00
Yu Watanabe
c19f1cc9a5 test: add regression tests for find_esp() and friends 2023-04-19 04:04:57 +09:00
Yu Watanabe
60e761d8f3 chase: replace path_prefix_root_cwd() with chaseat_prefix_root()
The function path_prefix_root_cwd() was introduced for prefixing the
result from chaseat() with root, but
- its name is somewhat too generic,
- its logic differs from what chase() does.

This gives it a more descriptive name that is specific to the result of
chaseat(), and makes the logic consistent with chase().
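
Prefixing a chaseat()-style relative result with the root directory, while leaving "/" alone so that the result does not start with "//", can be sketched like this. The helper name and its exact normalization (dropping trailing slashes of root and leading slashes of the path) are assumptions, not the behavior of chaseat_prefix_root().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Turn a path relative to root into an absolute path. A NULL, empty or "/"
 * root only makes the path absolute; otherwise root and path are joined
 * with exactly one '/'. Returns a malloc()ed string, or NULL on OOM. */
static char *prefix_root_sketch(const char *root, const char *path) {
        assert(path);

        while (*path == '/')
                path++;

        size_t rl = root ? strlen(root) : 0;
        while (rl > 0 && root[rl - 1] == '/')
                rl--;               /* "/" and "/srv/" lose their trailing slash */

        size_t size = rl + 1 + strlen(path) + 1;
        char *ret = malloc(size);
        if (!ret)
                return NULL;

        snprintf(ret, size, "%.*s/%s", (int) rl, rl > 0 ? root : "", path);
        return ret;
}
```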

Fixes https://github.com/systemd/systemd/pull/27199#issuecomment-1511387731.

Follow-up for #27199.
2023-04-19 03:38:59 +09:00
Yu Watanabe
8d3c49b168 fd-util: skip checking the mount ID if kernel is too old and /proc is not mounted
Now, dir_fd_is_root() is heavily used in chaseat(), which is used at
various places. If the kernel is too old and /proc is not mounted, then
there is no way to get the mount ID of a directory. In that case, let's
silently skip the mount ID check.

Fixes https://github.com/systemd/systemd/pull/27299#issuecomment-1511403680.
2023-04-19 03:38:47 +09:00
Yu Watanabe
4b1e461c49 mountpoint-util: check /proc is mounted on failure 2023-04-19 03:28:34 +09:00
Yu Watanabe
9a0dcf03fa chase: prefix with the root directory only when it is not "/" 2023-04-19 03:28:34 +09:00
Yu Watanabe
237bf933de chase: drop repeated call of empty_to_root() 2023-04-19 03:28:34 +09:00
Yu Watanabe
b3ef56bc8e chase: update outdated comment about result path 2023-04-19 03:28:34 +09:00
Yu Watanabe
24be89ebd8 chase: make the result absolute when a symlink is absolute
As the path may be outside of the specified dir_fd.
2023-04-19 03:28:34 +09:00
Yu Watanabe
c0552b359c chase: make chaseat() provide an absolute path also when dir_fd points to the root directory
Usually, we pass the file descriptor of the root directory to chaseat()
when `--root=` is not specified. Previously, even in such a case, the
result was relative, and we needed to prefix the path with "/" when we
wanted to pass the path to other functions that do not support dir_fd, or
to log or show the path. That's inconvenient.
2023-04-19 03:28:34 +09:00
Mike Yuan
d81fc15254
Merge pull request #27323 from keszybz/gpt-auto-generator-warning-cleanup
gpt-auto-generator: do not error out when no partitions are found
2023-04-19 02:06:06 +08:00
Frantisek Sumsal
574d09bad0 test: prefix the transient unit with test- to make coverage runs happy
See 9fd8226312 for more details.

Follow-up to c9210b7470.
2023-04-18 14:55:08 +01:00
Mike Yuan
901ba45cfe
Merge pull request #27320 from poettering/kmod-setup-tweaks
minor tweaks to kmod-setup.c
2023-04-18 19:25:08 +08:00
Zbigniew Jędrzejewski-Szmek
4953e39c70 gpt-auto-generator: "translate" errno codes into proper messages
E.g. in logs on jammy-ppc64el in https://github.com/systemd/systemd/pull/27294:
Apr 16 17:42:50 H systemd-gpt-auto-generator[300]: Failed to dissect partition table of block device /dev/sda: No message of desired type
Apr 16 17:42:50 H (sd-execu[295]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.

ee0e6e476e61d4baa2a18e241d212753e75003bf made this particular condition not an
error. But for other errnos we want to print a better message too.
dissect_loop_device_and_warn() already does this, but it always prints the
error at error level. We want to suppress some of the errors, so let's make the
print helper public and do the error suppression in the caller.
2023-04-18 11:58:33 +02:00
Zbigniew Jędrzejewski-Szmek
de47cd0610 fstab-generator: add missing phrase in comment 2023-04-18 11:55:03 +02:00
Paolo Velati
d5fbaa965e hwdb: Fix rotation for BMAX Y13 2023-04-18 18:43:21 +09:00
Lennart Poettering
0a5d3c0b5b kmod-setup: bypass heavy virtio-rng check if we are not running in a VM anyway
detect_vm() is cheap because its result is cached, so let's call it early,
before we get out the big guns and sweep through sysfs.
2023-04-18 10:52:04 +02:00
Lennart Poettering
fa505db314 kmod-setup: use STARTSWITH_SET() where appropriate 2023-04-18 10:51:00 +02:00
Lennart Poettering
ff707dd1b1 Revert "getty-generator: Use device hotplug to instantiate virtualizer consoles"
This reverts commit e7e6ce5f8d467304731a98e8a140e69713f1bf07.
2023-04-18 10:38:38 +02:00
Lennart Poettering
766c30a3b5
Merge pull request #27256 from medhefgo/boot-rdtsc
boot: Improve timer frequency detection
2023-04-18 10:38:15 +02:00
Yu Watanabe
ee0e6e476e gpt-auto: do not fail when no suitable partitions found
Follow-up for 598fd4da1cf9665834110583fd9133073cc12481.
2023-04-18 17:37:56 +09:00
Daan De Meyer
e7e6ce5f8d getty-generator: Use device hotplug to instantiate virtualizer consoles
If getty-generator runs in the initrd, the corresponding tty might not
have been instantiated yet in /dev, which means a serial getty is not
spawned on it. Instead, let's instantiate the serial-getty when the
device appears so that it always gets instantiated.
2023-04-18 09:35:14 +02:00
Lennart Poettering
b3a062cb80 lsm-util: move detection of support of LSMs into a new lsm-util.[ch] helper
This makes the bpf LSM check generic, so that we can use it elsewhere.
It also drops the caching inside it, given that bpf-lsm code in PID1
will cache it a second time a stack frame further up when it checks for
various other bpf functionality.
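
One way such a generic helper can work is by consulting securityfs, where /sys/kernel/security/lsm lists the active LSMs as a single comma-separated line. The sketch below assumes that interface and matches whole entries only. The function names are hypothetical, and lsm-util.c may use additional or different checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `name` is a whole entry of a comma-separated LSM list such as
 * "lockdown,capability,yama,bpf\n". */
static bool lsm_list_contains(const char *list, const char *name) {
        assert(list);
        assert(name);

        size_t n = strlen(name);
        for (const char *p = list; *p; ) {
                size_t len = strcspn(p, ",\n");
                if (len == n && strncmp(p, name, n) == 0)
                        return true;
                p += len;
                if (*p)
                        p++;            /* skip the separator */
        }
        return false;
}

/* Returns 1 or 0, or -1 if securityfs is not available. No caching here. */
static int lsm_supported_sketch(const char *name) {
        char buf[512];
        FILE *f = fopen("/sys/kernel/security/lsm", "r");
        if (!f)
                return -1;

        bool ok = fgets(buf, sizeof buf, f) != NULL;
        fclose(f);
        if (!ok)
                return -1;

        return lsm_list_contains(buf, name);
}
```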
2023-04-18 08:22:21 +02:00
Dominique Martinet
25d9c6cdaf bpf-firewall: give a name to maps used
Running systemd with IP accounting enabled generates many bpf maps (two
per unit for accounting, another two if IPAddressAllow/Deny are used).

Systemd itself knows which maps belong to what unit and commands like
`systemctl status <unit>` can be used to query what service has which
map, but monitoring these values all the time costs 4 dbus requests
(calling the .IP{E,I}gress{Bytes,Packets} method for each unit) and
makes services like the prometheus systemd_exporter[1] somewhat slow
when doing that for every unit, while less precise information could
quickly be obtained by looking directly at the maps.

Unfortunately, bpf map names are rather limited:
- only 15 characters in length (16, but last byte must be 0)
- only allows isalnum(), _ and . characters

If it weren't for the length limit we could use the normal unit escape
functions, but I've opted to simply turn any forbidden character into an
underscore for maximum brevity; the map prefix is also rather short:
This isn't meant as a precise mapping, but as a hint for admins who want
to look at these.

(Note there is no problem if multiple maps have the same name)
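
The name mangling can be sketched as: copy a short prefix, then the unit name with every character outside [A-Za-z0-9_.] replaced by "_", stopping at 15 characters so the 16-byte buffer (the kernel's BPF_OBJ_NAME_LEN) keeps its terminating NUL. The function name is hypothetical, and the prefixes used in the test are placeholders, not the ones systemd uses.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAP_NAME_SIZE 16    /* BPF_OBJ_NAME_LEN: 15 characters plus NUL */

/* Write prefix + sanitized unit name into out, truncated to 15 characters. */
static void bpf_map_name_sketch(char out[MAP_NAME_SIZE], const char *prefix, const char *unit) {
        assert(out);
        assert(prefix);
        assert(unit);

        size_t i = strnlen(prefix, MAP_NAME_SIZE - 1);
        memcpy(out, prefix, i);     /* prefix is assumed to be valid already */

        for (; *unit && i < MAP_NAME_SIZE - 1; unit++)
                out[i++] = (isalnum((unsigned char) *unit) || *unit == '_' || *unit == '.') ? *unit : '_';

        out[i] = '\0';
}
```

Collisions after truncation are acceptable here since, as noted above, duplicate map names are harmless.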

Link: https://github.com/povilasv/systemd_exporter [1]
2023-04-18 08:23:55 +09:00
Lennart Poettering
38cdd08b22 process-util: be more careful with pidfd_get_pid() special cases
Let's be more careful with generating error codes for (expected) error
causes.

This does not introduce new error conditions, it just changes what we
return under specific cases, to make things nicely recognizable in each
case. Most importantly this detects if fdinfo reports a pid of "-1" for
pidfds with processes that are already reaped (and thus have no PID
anymore).

None of our current users care about these error codes, but let's get
this right for the future.
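
The reaped-process case can be illustrated by parsing the text of a pidfd's fdinfo. A sketch, assuming the kernel's "Pid:" line format; the -ESRCH and -ENOENT choices and the function name are illustrative only, not necessarily the error codes process-util.c returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the PID from the text of /proc/self/fdinfo/<pidfd>. Returns the PID,
 * -ESRCH if the kernel reports -1 (process already reaped), or -ENOENT if
 * there is no "Pid:" line at all. */
static pid_t pid_from_pidfd_fdinfo(const char *fdinfo) {
        assert(fdinfo);

        for (const char *line = fdinfo; line && *line; ) {
                long pid;

                if (sscanf(line, "Pid: %ld", &pid) == 1)
                        return pid == -1 ? -ESRCH : (pid_t) pid;

                line = strchr(line, '\n');
                if (line)
                        line++;         /* continue with the next line */
        }
        return -ENOENT;
}
```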
2023-04-17 21:38:41 +01:00
Florian Klink
360c9cdc65 fsck: use execv_p_ and execl_p_
Instead of invoking find_executable on our own, use the variants of exec
provided by glibc, which do this for us.
2023-04-17 19:56:06 +01:00
Luca Boccassi
c9210b7470 creds: make available to all ExecStartPre= and ExecStart= processes
Fixes https://github.com/systemd/systemd/issues/27275
2023-04-17 17:47:28 +01:00
jcg
1034dfd0d8 user-util: remove duplicate includes 2023-04-17 23:58:04 +08:00
Benjamin Herrenschmidt
aab896e213 virt: Further improve detection of EC2 metal instances
Commit f90eea7d18d9ebe88e6a66cd7a86b618def8945d
virt: Improve detection of EC2 metal instances

Added support for detecting EC2 metal instances via the product
name in DMI by testing for the ".metal" suffix.

Unfortunately this doesn't cover all cases, as there are going to be
instance types where ".metal" is not a suffix (e.g. .metal-16xl,
.metal-32xl, ...).

This modifies the logic to also allow those new forms.
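
The relaxed match can be sketched as: accept ".metal" when it either ends the DMI product name (ignoring a trailing newline as read from sysfs) or is followed by "-". The function name is hypothetical, and this is not the literal code from virt.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for product names ending in ".metal" or containing ".metal-<size>". */
static bool product_name_is_ec2_metal(const char *product) {
        assert(product);

        for (const char *p = strstr(product, ".metal"); p; p = strstr(p + 1, ".metal")) {
                const char *after = p + strlen(".metal");
                if (*after == '\0' || *after == '\n' || *after == '-')
                        return true;
        }
        return false;
}
```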

Signed-off-by: Benjamin Herrenschmidt <benh@amazon.com>
2023-04-17 13:21:11 +01:00
Daan De Meyer
c8ae0a81bf mkosi: Use kernel-core for Fedora and CentOS images
Let's reduce image size by using a smaller kernel package.
2023-04-17 10:50:14 +02:00