Bugfixes:

* Many manager configuration settings that are only applicable to the user manager or the system manager can always be set. It would be better to reject them when parsing config.

* Jun 01 09:43:02 krowka systemd[1]: Unit user@1000.service has alias user@.service.
  Jun 01 09:43:02 krowka systemd[1]: Unit user@6.service has alias user@.service.
  Jun 01 09:43:02 krowka systemd[1]: Unit user-runtime-dir@6.service has alias user-runtime-dir@.service.

External:

* Fedora: add an rpmlint check that verifies that all unit files in the RPM are listed in %systemd_post macros.

* dbus:
  - natively watch for dbus-*.service symlinks (PENDING)
  - teach dbus to activate all services it finds in /etc/systemd/services/org-*.service

* kernel: add device_type = "fb", "fbcon" to class "graphics"

* /usr/bin/service should actually show the new command line

* fedora: suggest auto-restart on failure, but not on success and not on coredump. Also, ask people to think about changing the start limit logic. Also point people to RestartPreventExitStatus=, SuccessExitStatus=

* neither pkexec nor sudo initialize environ[] from the PAM environment?

* fedora: update policy to declare access mode and ownership of unit files to root:root 0644, and add an rpmlint check for it

* register catalog database signature as file magic

* zsh shell completion:
  - - should complete options, but currently does not
  - systemctl add-wants,add-requires

* systemctl status should know about 'systemd-analyze calendar ... --iterations='

* If timer has just OnInactiveSec=..., it should fire after a specified time after being started.
* write blog stories about:
  - hwdb: what belongs into it, lsusb
  - enabling dbus services
  - how to make changes to sysctl and sysfs attributes
  - remote access
  - how to pass throw-away units to systemd, or dynamically change properties of existing units
  - testing with Harald's awesome test kit
  - auto-restart
  - how to develop against journal browsing APIs
  - the journal HTTP iface
  - non-cgroup resource management
  - dynamic resource management with cgroups
  - refreshed, longer mission statement
  - calendar time events
  - init=/bin/sh vs. "emergency" mode, vs. "rescue" mode, vs. "multi-user" mode, vs. "graphical" mode, and the debug shell
  - how to create your own target
  - instantiated apache, dovecot and so on
  - hooking a script into various stages of shutdown/early boot

Regularly:

* look for close() vs. close_nointr() vs. close_nointr_nofail()

* check for strerror(r) instead of strerror(-r)

* pahole

* set_put(), hashmap_put() return values check. i.e. == 0 does not free()!

* use secure_getenv() instead of getenv() where appropriate

* link up selected blog stories from man pages and unit files Documentation= fields

Janitorial Clean-ups:

* Rearrange tests so that the various test-xyz.c match a specific src/basic/xyz.c again

* rework mount.c and swap.c to follow proper state enumeration/deserialization semantics, like we do for device.c now

Features:

* systemd-fstab-generator: support additional mount specifications via kernel cmdline. Use case: invoke a VM, and mount a host homedir into it via virtio-fs.

* for vendor-built signed initrds:
  - make sysext run in the initrd
  - sysext should pick up sysext images from /.extra/ in the initrd, and insist on verification if in secureboot mode
  - kernel-install should be able to install pre-built unified kernel images in type #2 drop-in dir in the ESP.
  - kernel-install should be able to install encrypted creds automatically for machine id, root pw, rootfs uuid, resume partition uuid, and place them next to the EFI kernel, for sd-stub to pick them up. These creds should be locked to the TPM, and bound to the right PCR the kernel is measured to.
  - kernel-install should be able to pick up initrd sysexts automatically and place them next to the EFI kernel, for sd-stub to pick them up.
  - systemd-fstab-generator should look for rootfs device to mount in creds
  - pid 1 should look for machine ID in creds
  - systemd-resume-generator should look for resume partition uuid in creds
  - sd-stub: automatically pick up microcode from ESP (/loader/microcode/*) and synthesize initrd from it, and measure it. Signing is not necessary, as microcode does that on its own. Pass as first initrd to kernel.
  - systemd-creds should have a fallback logic that uses neither TPM nor the system key in /var for encryption and instead some fixed key. This should be opt-in (since it provides no security properties) but be used by kernel-install when encrypting the creds it generates on systems that lack a TPM, so that we can have very similar codepaths on TPM and TPM-less systems. i.e. --with-key=tpm-graceful or so.

* Add a new service type very similar to Type=notify, that goes one step further and extends the protocol to cover reloads. Specifically, SIGHUP will become the official way to reload, and the daemon has to respond with sd_notify() to report when it starts reloading, and when it has completed reloading. Care must be taken to remove races from this model. I.e. PID 1 needs to take CLOCK_MONOTONIC, then send SIGHUP, then wait for at least one RELOADING=1 message that comes with a newer timestamp, then wait for a READY=1 message. While we are at it, also maybe extend the logic to require handling of some specific SIGRT signal for setting debug log level, that carries the level via the sigqueue() data parameter.
  With that, minimal additional logic would extend the service runtime logic quite substantially.

* get_color_mode() should probably check the $COLORTERM environment variable which most terminal environments appear to set.

* firstboot: maybe just default to C.UTF-8 locale if nothing is set, so that we don't query this unnecessarily in entirely uninitialized containers. (i.e. containers with empty /etc).

* systemd creds hookup with qemu fw_cfg. (Quite possibly might not need any code at all, given the fw_cfg stuff are just files, but we should then document how to use it). Goal: provide symmetric ways to pass creds to nspawn containers and qemu VMs. (maybe also pick up env vars from fw_cfg?)

* beef up sd_notify() to support AF_VSOCK in $NOTIFY_SOCKET, so that VM managers can get ready notifications from VMs, just like container managers from their payload. Also pick up address from qemu/fw_cfg if set there. (which has benefits, given SecureBoot and kernel cmdline are not necessarily friends.)

* mirroring this: maybe support binding to AF_VSOCK in Type=notify services, then passing $NOTIFY_SOCKET and $NOTIFY_GUESTCID with PID1's cid (typically fixed to "2", i.e. the official host cid) and the expected guest cid, for the two sides of the channel. The latter env var could then be used in an appropriate qemu cmdline. That way qemu payloads could talk sd_notify() directly to the host service manager.

* maybe write a tool that binds an AF_VSOCK socket, then invokes qemu, extending the command line to enable vsock on the VM, and using fw_cfg to configure the socket address.

* sd-boot: rework random seed handling following recent kernel changes: always pass seed to kernel, but credit only if secure boot is used

* sd-boot: hash data from GetNextHighMonotonicCount() into updated random seed, so that we might even open up the random seed logic to non-SecureBoot systems?

* sd-boot: add menu item for shutdown? or hotkey?
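The race-free reload handshake proposed above for the Type=notify extension (take CLOCK_MONOTONIC, send SIGHUP, wait for a newer RELOADING=1, then wait for READY=1) can be sketched as a small check over received notifications. This is only an illustration in Python, not PID 1 code: the function name, the dict representation of messages, and the MONOTONIC_USEC field name for the sender's timestamp are all assumptions made for this sketch.

```python
def reload_complete(sighup_usec, messages):
    """Return True once a reload that started after sighup_usec has finished.

    sighup_usec: CLOCK_MONOTONIC reading (microseconds) taken just before
                 SIGHUP was sent.
    messages:    dicts parsed from sd_notify() datagrams, in arrival order.
                 MONOTONIC_USEC is a hypothetical field carrying the
                 sender's CLOCK_MONOTONIC timestamp.
    """
    reloading_seen = False
    for msg in messages:
        if not reloading_seen:
            if msg.get("RELOADING") != "1":
                # A READY=1 arriving before a fresh RELOADING=1 is ignored:
                # it may belong to an earlier reload still in flight.
                continue
            try:
                stamp = int(msg.get("MONOTONIC_USEC", ""))
            except ValueError:
                continue  # no usable timestamp, cannot rule out a race
            if stamp > sighup_usec:
                reloading_seen = True
        elif msg.get("READY") == "1":
            return True
    return False
```

A stale RELOADING=1 (stamped before the SIGHUP) followed by READY=1 is deliberately not accepted, which is the race the item asks to remove.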
* sd-device has an API to create an sd_device object from a device id, but has no API to query the device id

* sd-device should return the devnum type (i.e. 'b' or 'c') via some API for an sd_device object, so that data passed into sd_device_new_from_devnum() can also be queried.

* udevadm: a new "tree" verb that shows tree of devices as syspath hierarchy, along with their properties. Uninitialized devices should be greyed out.

* bootctl: show whether UEFI audit mode is available

* sd-event: optionally, if per-event source rate limit is hit, downgrade priority, but leave enabled, and once ratelimit window is over, upgrade priority again. That way we can combat event source starvation without stopping processing events from one source entirely.

* sd-event: similar to existing inotify support add fanotify support (given that apparently new features in this area are only going to be added to the latter).

* sd-event: add 1st class event source for clock changes

* sd-event: add 1st class event source for timezone changes

* support uefi/http boots with sd-boot: instead of looking for dropin files in /loader/entries/ dir, look for a file /loader/entries/SHA256SUMS and use that as directory manifest. The file would be a standard directory listing as generated by GNU sha256sum.

* initialize machine ID from systemd credential picked up from the ESP via sd-stub, so that machine ID is stable even on systems where unified kernels are used, and hence kernel cmdline cannot be modified locally

* in gpt-auto-generator: check partition uuids against such uuids supplied via sd-stub credentials. That way, we can support parallel OS installations with pre-built kernels.

* sysext: measure all activated sysext into a TPM PCR

* maybe add a "syscfg" concept, that is almost entirely identical to "sysext", but operates on /etc/ instead of /usr/ and /opt/.
  Use case would be: trusted, authenticated, atomic, additive configuration management primitive: drop in a configuration bundle, and activate it, so that it is instantly visible, comprehensively.

* systemd-dissect: show available versions inside of a disk image, i.e. if multiple versions are around of the same resource, show which ones. (in other words: show partition labels).

* systemd-nspawn: make boot assessment do something sensible in a container. i.e. send an sd_notify() from payload to container manager once boot-up is completed successfully, and use that in nspawn for dealing with boot counting, implemented in the partition table labels and directory names.

* maybe add a generator that reads /proc/cmdline, looks for systemd.pull-raw-portable=, systemd.pull-raw-sysext= and similar switches that take a URL as parameter. It then generates service units for systemd-pull calls that download these URLs if not installed yet. Use case: invoke a VM or nspawn container in a way it automatically deploys/runs these images as OS payloads. i.e. have a generic OS image you can point to any payload you like, which is then downloaded, securely verified and run.

* improve scope units to support creation by pidfd instead of by PID

* deprecate cgroupsv1 further (print log message at boot)

* systemd-dissect: add --cat switch for dumping files such as /etc/os-release

* per-service sandboxing option: ProtectIds=. If used, will overmount /etc/machine-id and /proc/sys/kernel/random/boot_id with synthetic files, to make it harder for the service to identify the host. Depending on the user setting it should be fully randomized at invocation time, or a hash of the real thing, keyed by the unit name or so. Of course, there are other ways to get these IDs (e.g. journal) or similar ids (e.g. MAC addresses, DMI ids, CPU ids), so this knob would only be useful in combination with other lockdown options.
  Particularly useful for portable services, and anything else that uses RootDirectory= or RootImage=. (Might also over-mount /sys/class/dmi/id/*{uuid,serial} with /dev/null).

* journalctl/timesyncd: whenever timesyncd acquires a synchronization from NTP, create a structured log entry that contains boot ID, monotonic clock and realtime clock (I mean, this requires no special work, as these three fields are implicit). Then in journalctl when attempting to display the realtime timestamp of a log entry, first search for the closest later log entry of this kind that has a matching boot id, and convert the monotonic clock timestamp of the entry to the realtime clock using this info. This way we can retroactively correct the wallclock timestamps, in particular for systems without RTC, i.e. where initially wallclock timestamps carry rubbish, until an NTP sync is acquired.

* kernel-install:
  - add --all switch for rerunning kernel-install for all installed kernels
  - maybe add env var that shortcuts kernel-install for installers that want to call it at the end only

* doc: prep a document explaining resolved's internal objects, i.e. Query vs. Question vs. Transaction vs. Stream and so on.

* doc: prep a document explaining PID 1's internal logic, i.e. transactions, jobs, units

* bootspec: remove tries counter from boot entry ids

* bootspec: bring UEFI and userspace enumeration of bootspec entries back into sync, i.e. parse out tries in both

* automatically ignore threaded cgroups in cg_xyz().

* add linker script that implicitly adds symbol for build ID and new coredump json package metadata, and use that when logging

* systemd-dissect: show GPT disk UUID in output

* Enable RestrictFileSystems= for all our long-running services (similar: RestrictNetworkInterfaces=)

* Add systemd-analyze security checks for RestrictFileSystems= and RestrictNetworkInterfaces=

* cryptsetup/homed: implement TOTP authentication backed by TPM2 and its internal clock.
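The retroactive timestamp correction described in the journalctl/timesyncd item above boils down to a lookup plus one subtraction. A minimal sketch in Python, assuming entries are represented as plain tuples (the function name and the tuple layout are illustrative assumptions, not journal API):

```python
def corrected_realtime(entry, sync_entries):
    """Estimate the wallclock time of a log entry from a later NTP sync.

    entry:        (boot_id, monotonic_usec) of the entry to display.
    sync_entries: list of (boot_id, monotonic_usec, realtime_usec) tuples,
                  one per log entry written when a sync was acquired.
    Returns the corrected realtime in microseconds, or None if no later
    sync entry exists for the same boot.
    """
    boot_id, mono = entry
    # Only sync entries from the same boot share a monotonic clock base.
    later = [s for s in sync_entries if s[0] == boot_id and s[1] >= mono]
    if not later:
        return None
    # "Closest later" entry: the smallest monotonic timestamp at or
    # after the entry we want to correct.
    _, sync_mono, sync_real = min(later, key=lambda s: s[1])
    # Walk back from the synchronized wallclock by the monotonic distance.
    return sync_real - (sync_mono - mono)
```

For example, an entry logged 3 s (monotonic) before the sync entry is displayed 3 s before that sync entry's realtime stamp, regardless of what the unsynchronized wallclock claimed at the time.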
* nspawn: optionally set up nftables/iptables routes that forward UDP/TCP traffic on port 53 to resolved stub 127.0.0.54

* man: rework os-release(5), and clearly separate our extension-release.d/ and initrd-release parts, i.e. list explicitly which fields are about what.

* sysext: before applying a sysext, do a superficial validation run so that things are not rearranged too wildly. I.e. protect against accidental fuckups, such as masking out /usr/lib/ or so. We should probably refuse if existing inodes are replaced by other types of inodes or so.

* sysext: ensure one can build a sysext that can safely apply to *any* system (because it contains only static go binaries in /opt/ or so)

* userdb: when synthesizing NSS records, pick "best" password from defined passwords, not just the first. i.e. if there are multiple defined, prefer unlocked over locked and prefer non-empty over empty.

* maybe add a tool inspired by the GPT auto discovery spec that runs in the initrd and rearranges the rootfs hierarchy via bind mounts, if enabled. Specifically in some top-level dir /@auto/ it will look for dirs/symlinks/subvolumes that are named after their purpose, and optionally encode a version as well as assessment counters, and then mount them into the file system tree to boot into, similar to how we do that for the gpt auto logic. Maybe then bind mount the original root into /.superior or something like that (so that update tools can look there). Further discussion in this thread: https://lists.freedesktop.org/archives/systemd-devel/2021-November/047059.html The GPT dissection logic should automatically enable this tool whenever we detect a specially marked root fs (i.e. introduce a new generic root gpt type for this, that is arch independent). Then also implement this in the image dissection logic, so that nspawn/RootImage= and so on grok it. Maybe make generic enough so that it can also work for ostree arrangements.
* if a path ending in ".auto.d/" is set for RootDirectory=/RootImage= then do a strverscmp() of everything inside that dir and use that. i.e. implement very simple version control. Also use this in systemd-nspawn --image= and so on.

* homed: while a home dir is not activated generate slightly different NSS records for it, that report the home dir as "/" and the shell as some binary provided by us. Then, when an SSH login happens and SSH permits it our binary is invoked. This binary can then talk to homed and activate the homedir if it's not around yet, prompting the user for a password. Once that succeeded we'll switch to the real user record, i.e. home dir and shell, and our tool exec()s the latter. Net effect: ssh'ing into a homed account will just work: we'll neatly prompt for the homedir's password if it's needed. Building on this we could take this even further: since this tool will potentially have access to the client's ssh-agent (if ssh-agent forwarding is enabled) we could implement SSH unlocking of a homedir with that: when enrolling a new ssh pubkey in a user record we'd ask the ssh-agent to sign some random value with the privkey, then use that as luks key to unlock the home dir. Will not work for ECDSA keys since their signatures contain a random component, but will work for RSA and Ed25519 keys.

* add tiny service that decrypts encrypted user records passed via initrd credential logic and drops them into /run where nss-systemd can pick them up, similar to /run/host/userdb/. Use case: drop a root user JSON record there, and use it in the initrd to log in as root with locally selected password, for debugging purposes. Other use case: boot into qemu with regular user mounted from host. Maybe put this in systemd-user-sessions.service?

* drop dependency on libcap, replace by direct syscalls based on CapabilityQuintet we already have.
  (This likely allows us to drop the libcap dep in the base OS image)

* sysext: automatically activate sysext images dropped in via new sd-stub sysext pickup logic.

* add concept for "exitrd" as inverse of "initrd", that we can transition to at shutdown, and has similar security semantics. This should then take the place of dracut's shutdown logic. Should probably support sysexts too. Care needs to be taken that the resulting logic ends up in RAM, i.e. is copied out of on-disk storage.

* userdbd: implement an additional varlink service socket that provides the host user db in restricted form, then allow this to be bind mounted into sandboxed environments that want the host database in minimal form. All records would be stripped of all meta info, except the basic UID/name info. Then use this in portabled environments that do not use PrivateUsers=1.

* logind: introduce two types of sessions: "heavy" and "light". The former would be our current sessions. But the latter would be a new type of session that is mostly the same but does not pull in user@.service or wait for it. Then, allow configuration of which type of session is desired via pam_systemd parameters, and then make user@.service's session one of these "light" ones. People could then choose to make FTP sessions and suchlike "light" if they don't want the service manager to be started for that.

* /etc/veritytab: allow that the roothash column can be specified as fs path including a path to an AF_UNIX path, similar to how we do things with the keys of /etc/crypttab. That way people can store/provide the roothash externally and provide it to us on demand only.

* add high-level lockdown level for GPT dissection logic: e.g. an enum that can be ANY (to mount anything), TRUSTED (to require that /usr is on signed verity, but rest doesn't matter), LOCKEDDOWN (to require that everything is on signed verity, except for ESP), SUPERLOCKDOWN (like LOCKEDDOWN but ESP not allowed).
  And then maybe some flavours of that which declare what is expected from home/srv/var… Then, add a new cmdline flag to all tools that parse such images, to configure this. Also, add a kernel cmdline option for this, to be honoured by the gpt auto generator.

* nspawn: maybe optionally insert .nspawn file as GPT partition into images, so that such container images are entirely stand-alone and can be updated as one.

* we probably should extend the root verity hash of the root fs into some PCR on boot. (i.e. maybe add a crypttab option tpm2-measure=8 or so to measure it into PCR 8)

* add a "policy" to the dissection logic. i.e. a bit mask what is OK to mount, what must be read-only, what requires encryption, and what requires authentication.

* in uefi stub: query firmware regarding which PCRs are being used, store that in EFI var. Then use this when enrolling TPM2 in cryptsetup to verify that the selected PCRs actually are used by firmware.

* rework recursive read-only remount to use new mount API

* PAM: pick up authentication token from credentials

* when mounting disk images: if IMAGE_ID/IMAGE_VERSION is set in os-release data in the image, make sure the image filename actually matches this, so that images cannot be misused.

* New udev block device symlink names: /dev/disk/by-parttypelabel/-. Use case: if pt label is used as partition image version string, this is a safe way to reference a specific version of a specific partition type, in particular where related partitions are processed (e.g. verity + rootfs both named "LennartOS_0.7").

* sysupdate:
  - add fuzzing to the pattern parser
  - support casync as download mechanism
  - direct TPM2 PCR change handling, possibly re-enrolling LUKS2 media if needed.
  - "systemd-sysupdate update --all" support, that iterates through all components defined on the host, plus all images installed into /var/lib/machines/, /var/lib/portable/ and so on.
  - figure out what to do about system extensions (i.e.
    they need to imply an update component, since otherwise system extensions' sysupdate.d/ files would override the host's update files.)
  - Allow invocation with a single transfer definition, i.e. with --definitions= pointing to a file rather than a dir.
  - add ability to disable implicit decompression of downloaded artifacts, i.e. a Compress=no option in the transfer definitions

* in sd-id128: also parse UUIDs in RFC4122 URN syntax (i.e. chop off urn:uuid: prefix)

* DynamicUser= + StateDirectory= → use uid mapping mounts, too, in order to make dirs appear under right UID.

* systemd-sysext: optionally, run it in initrd already, before transitioning into host, to open up possibility for services shipped like that.

* maybe add a tool that displays most recent journal logs as QR code to scan off screen and run it automatically on boot failures, emergency logs and such. Use DRM APIs directly, see https://github.com/dvdhrm/docs/blob/master/drm-howto/modeset.c for an example for doing that.

* introduce /dev/disk/root/* symlinks that allow referencing partitions on the disk the rootfs is on in a reasonably secure way. (or maybe: add /dev/gpt-auto-{home,srv,boot,…} similar in style to /dev/gpt-auto-root as we already have it.)

* whenever we receive fds via SCM_RIGHTS make sure none got dropped due to the reception limit the kernel silently enforces.

* add an Open= setting to service unit files that can open arbitrary file system paths at service startup time and pass them to the service process via our usual socket activation protocol. If passed path refers to AF_UNIX socket: connect() to it.

* add a ConnectSocket= setting to service unit files, that may reference a socket unit, and which will connect to the socket defined therein, and pass the resulting fd to the service program via socket activation proto.

* Add a concept of ListenStream=anonymous to socket units: listen on a socket that is deleted in the fs. Use case would be with ConnectSocket= above.
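The sd-id128 item above (accepting RFC 4122 URN syntax by chopping off the urn:uuid: prefix) can be illustrated with a short Python sketch. It mirrors the idea only; the function name parse_id128() is hypothetical and this is not the sd-id128 C implementation. It accepts the plain 32-hex-digit form, the dashed UUID form, and the dashed form behind a urn:uuid: prefix.

```python
import re

_PLAIN = re.compile(r"[0-9a-fA-F]{32}")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_URN_PREFIX = "urn:uuid:"


def parse_id128(s):
    """Parse a 128-bit ID from its string form and return 16 raw bytes.

    Raises ValueError for anything that is not one of the three
    accepted spellings.
    """
    if s[:len(_URN_PREFIX)].lower() == _URN_PREFIX:
        # URN syntax: strip the prefix, the rest must be a dashed UUID.
        s = s[len(_URN_PREFIX):]
        if not _UUID.fullmatch(s):
            raise ValueError("invalid UUID after urn:uuid: prefix")
    elif not (_PLAIN.fullmatch(s) or _UUID.fullmatch(s)):
        raise ValueError("invalid 128-bit ID")
    return bytes.fromhex(s.replace("-", ""))
```

Treating the prefix case-insensitively is a choice made in this sketch; whether the real parser should be that lenient is left open by the item.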
* importd: support image signature verification with PKCS#7 + OpenBSD signify logic, as alternative to crummy gpg

* add "systemd-analyze debug" + AttachDebugger= in unit files: The former specifies a command to execute; the latter specifies that an already running "systemd-analyze debug" instance shall be contacted and execution paused until it gives an OK. That way, tools like gdb or strace can safely be invoked on processes forked off PID 1.

* expose MS_NOSYMFOLLOW in various places

* make LoadCredential= automatically find credentials in /etc/creds, /run/creds, … and so on, if path component is unqualified

* teach LoadCredential=/LoadCredentialEncrypted= to load credentials from kernel cmdline, maybe: LoadCredentialEncrypted=foobar:proc-cmdline:foobar

* credentials system:
  - acquire from kernel command line
  - acquire from EFI variable?
  - acquire via ask-password?
  - acquire creds via keyring?
  - pass creds via keyring?
  - pass creds via memfd?
  - acquire + decrypt creds from pkcs11?
  - make systemd-cryptsetup acquire pw via creds logic
  - make PAMName= acquire pw via creds logic
  - make macsec/wireguard code in networkd read key via creds logic
  - make gatewayd/remote read key via creds logic
  - add sd_notify() command for flushing out creds not needed anymore

* add tpm.target or so which is delayed until TPM2 device showed up in case firmware indicates there is one.

* Add concept for upgrading TPM2 enrollments, maybe a new switch --pcrs=4: or so, i.e.
  select a PCR to include in the hash, and then override its hash

* TPM2: auto-reenroll in cryptsetup, as fallback for hosed firmware upgrades and such

* introduce a new group to own TPM devices

* cryptsetup: if only recovery keys are registered and no regular passphrases, ask user for "recovery key", not "passphrase"

* cryptsetup: add option for automatically removing empty password slot on boot

* cryptsetup: optionally, when run during boot-up and password is never entered, and we are on battery power (or so), power off machine again

* cryptsetup: when waiting for FIDO2/PKCS#11 token, tell plymouth that, and allow plymouth to abort the waiting and enter pw instead

* make cryptsetup lower --iter-time

* cryptsetup: allow encoding key directly in /etc/crypttab, maybe with a "base64:" prefix. Useful in particular for pkcs11 mode.

* cryptsetup: reimplement the mkswap/mke2fs in cryptsetup-generator to use systemd-makefs.service instead.

* cryptsetup:
  - cryptsetup-generator: allow specification of passwords in crypttab itself
  - support rd.luks.allow-discards= kernel cmdline params in cryptsetup generator

* when configuring loopback netif, and it fails due to EPERM, eat up error if it happens to be set up alright already.

* at boot: check if battery above some threshold, if not power off again after explanation

* userdb: add field for ambient caps, so that a user can have CAP_WAKE_ALARM for example. And add code that resets ambient caps for all services by default.

* sd-bus: when connecting to some dbus server socket, set originating AF_UNIX socket name in abstract namespace to include "description" string, and pick it up from there in sd_bus_creds logic. i.e. we can use the socket peer address as conduit for some minimal connection metainfo, and use it to restore the "description" logic that kdbus used to have.

* systemd-analyze netif that explains predictable interface (or networkctl)

* Add service setting to run a service within the specified VRF. i.e.
  do the equivalent of "ip vrf exec".

* change SwitchRoot() implementation in PID 1 to use pivot_root(".", "."), as documented in the pivot_root(2) man page, so that we can drop the /oldroot temporary dir.

* special case some calls of chase_symlinks() to use openat2() internally, so that the kernel does what we otherwise do.

* add a new flag to chase_symlinks() that stops chasing once the first missing component is found and then allows the caller to create the rest.

* make use of new glibc 2.32 APIs sigabbrev_np() and strerrorname_np().

* if /usr/bin/swapoff fails due to OOM, log a friendly explanatory message about it

* Remove any support for booting without /usr pre-mounted in the initrd entirely. Update INITRD_INTERFACE.md accordingly.

* pid1: Move to tracking of main pid/control pid of units per pidfd

* pid1: support new clone3() fork-into-cgroup feature

* pid1: also remove PID files of a service when the service starts, not just when it exits

* make us use dynamically fewer deps for containers in general purpose distros:
  o turn into dlopen() deps:
    - p11-kit-trust (always)
    - kmod-libs (only when called from PID 1)
    - libblkid (only in RootImage= handling in PID 1, but not elsewhere)
    - libpam (only when called from PID 1)
    - bzip2, xz, lz4 (always; gzip and zstd should probably stay static deps the way they are, since they are so basic and our defaults)
  o move into separate libsystemd-shared-iptables.so .so
    - iptables-libs (only used by nspawn + networkd)

* seccomp: maybe use seccomp_merge() to merge our filters per-arch if we can. Apparently kernel performance is much better with fewer larger seccomp filters than with more smaller seccomp filters.

* systemd-path: add ESP and XBOOTLDR path. Add "private" runtime/state/cache dir enum, mapping to $RUNTIME_DIRECTORY, $STATE_DIRECTORY and such

* All tools that support --root= should also learn --image= so that they can operate on disk images directly. Specifically: bootctl, systemctl, coredumpctl.
  (Already done: systemd-nspawn, systemd-firstboot, systemd-repart, systemd-tmpfiles, systemd-sysusers, journalctl)

* seccomp: by default mask x32 ABI system wide on x86-64. It's on its way out

* seccomp: don't install filters for ABIs that are masked anyway for the specific service

* busctl: maybe expose a verb "ping" for pinging a dbus service to see if it exists and responds.

* Maybe add a separate GPT partition type to the discoverable partition spec for "hibernate" partitions, that are exactly like swap partitions but only activated right before hibernation and thus never used for regular swapping.

* socket units: allow creating a udev monitor socket with ListenDevices= or so, with matches, then activate app through that, passing the socket over

* unify on openssl:
  - port journald + fsprg over from libgcrypt
  - when that's done: kill gnutls support in resolved

* add growvol and makevol options for /etc/crypttab, similar to x-systemd.growfs and x-systemd.makefs.

* userdb: allow username prefix searches in varlink API, allow realname and realname substr searches in varlink API

* userdb: allow uid/gid range checks

* userdb: allow existence checks

* pid1: activation by journal search expression

* when switching root from initrd to host, set the machine_id env var so that if the host has no machine ID set yet we continue to use the random one the initrd had set.

* sd-event: add native support for P_ALL waitid() watching, then move PID 1 to it for reaping assigned but unknown children. This needs some special care to operate somewhat sensibly in light of priorities: P_ALL will return arbitrary processes, regardless of the priority we want to watch them with, hence on each event loop iteration check all processes which we shall watch with higher prio explicitly, and then watch the entire rest with P_ALL.
* tweak sd-event's child watching: keep a prioq of children to watch and use waitid() only on the children with the highest priority until one is waitable and ignore all lower-prio ones from that point on

* maybe introduce xattrs that can be set on the root dir of the root fs partition that declare the volatility mode to use the image in. Previously I thought of marking this via GPT partition flags but that's not ideal since that's outside of the LUKS encryption/verity verification, and we probably shouldn't operate in a volatile mode unless we got told so from a trusted source.

* coredump: maybe when coredumping read a new xattr from /proc/$PID/exe that may be used to mark a whole binary as non-coredumpable. Would fix: https://bugs.freedesktop.org/show_bug.cgi?id=69447

* teach parse_timestamp() timezones like the calendar spec already knows it

* beef up hibernation to optionally do swapon/swapoff immediately before/after the hibernation

* beef up s2h to implement a battery watch loop: instead of entering hibernation unconditionally after coming back from resume make a decision based on the battery load level: if battery level is above a specific threshold, go to suspend again, only hibernate if below it. This means we'd stick to suspend usually, but fall back to hibernation only when battery runs empty (well, subject to our sampling interval). Related to this, check if we can make ACPI _BTP (i.e. /sys/class/power_supply/*/alarm) work for us too, i.e. see if it can wake up machines from suspend, so that we could resume automatically when the system is low on power and move automatically to hibernation mode. (see https://uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf section 10.2.2.8 and https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources at the end).

* We should probably replace /etc/rc.d/README with a symlink to doc content. After all it is constant vendor data.
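The battery watch loop proposed above for s2h can be sketched as a tiny decision loop. This is a toy model in Python under stated assumptions: battery readings arrive as a list of percentages (one per wake-up), the 5% default threshold is an arbitrary placeholder, and a reading exactly at the threshold is treated as "hibernate"; none of these choices come from the item itself.

```python
def simulate_s2h(samples, threshold=5.0):
    """Return the sleep state chosen at each wake-up.

    samples:   battery level (percent) read at each timed wake-up from
               suspend, in order.
    threshold: hypothetical cut-off; above it we suspend again, at or
               below it we hibernate and the loop ends.
    """
    states = []
    for level in samples:
        if level > threshold:
            # Plenty of charge left: cheap suspend is still fine.
            states.append("suspend")
        else:
            # Battery nearly empty: persist state to disk and stop.
            states.append("hibernate")
            break
    return states
```

With samples [40, 20, 4, 3] this yields two more suspend cycles followed by a single hibernate; the last sample is never read because the machine is already hibernated. How stale the decision can be depends on the sampling interval, as the item notes.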
* maybe add kernel cmdline params: to force random seed crediting

* introduce a new per-process uuid, similar to the boot id, the machine id, the invocation id, that is derived from process creds, specifically a hashed combination of AT_RANDOM + getpid() + the starttime from /proc/self/status. Then add these ids implicitly when logging. Deriving this uuid from these three things has the benefit that it can be derived easily from /proc/$PID/ in a stable and unique way that changes on both fork() and exec().

* let's not GC a unit while its ratelimits are still pending

* when killing due to service watchdog timeout maybe detect whether target process is under ptracing and then log loudly and continue instead.

* make rfkill uaccess controllable by default, i.e. steal rule from gnome-bluetooth and friends

* make MAINPID= message reception checks even stricter: if service uses User=, then check sending UID and ignore message if it doesn't match the user or root.

* maybe trigger a uevent "change" on a device if "systemctl reload xyz.device" is issued.

* when importing an fs tree with machined, optionally apply userns-rec-chown

* when importing an fs tree with machined, complain if image is not an OS

* Maybe introduce a helper safe_exec() or so, which is to execve() what safe_fork() is to fork(). And then make it revert the RLIMIT_NOFILE soft limit to 1K implicitly, unless explicitly opted-out.

* rework seccomp/nnp logic so that even if User= is used in combination with a seccomp option we don't have to set NNP. For that, change uid first while keeping CAP_SYS_ADMIN, then apply seccomp, then drop the cap.

* when no locale is configured, default to UEFI's PlatformLang variable

* add a new syscall group "@esoteric" for more esoteric stuff such as bpf() and userfaultfd() and make systemd-analyze check for it.

* paranoia: whenever we process passwords, call mlock() on the memory first. i.e. look for all places we use free_and_erasep() and augment them with mlock(). Also use MADV_DONTDUMP.
Alternatively (preferably?) use memfd_secret(). * Move RestrictAddressFamilies= to the new cgroup create socket * support the bind/connect/sendmsg cgroup stuff for sandboxing, and possibly patching around * maybe implicitly attach monotonic+realtime timestamps to outgoing messages in log.c and sd-journal-send * optionally: turn on cgroup delegation for per-session scope units * introduce per-unit (i.e. per-slice, per-service) journal log size limits. * sd-boot: optionally, show boot menu when previous default boot item has non-zero "tries done" count * augment CODE_FILE=, CODE_LINE= with something like CODE_BASE= or so which contains some identifier for the project, which allows us to include clickable links to source files generating these log messages. The identifier could be some abbreviated URL prefix or so (taking inspiration from Go imports). For example, for systemd we could use CODE_BASE=github.com/systemd/systemd/blob/98b0b1123cc or so which is sufficient to build a link by prefixing "http://" and suffixing the CODE_FILE. * Augment MESSAGE_ID with MESSAGE_BASE, in a similar fashion so that we can make clickable links from log messages carrying a MESSAGE_ID, that lead to some explanatory text online. * maybe extend .path units to expose fanotify() per-mount change events * When reloading configuration PID 1 should reset all its properties to the original defaults before calling parse_config() * hibernate/s2h: make this robust and safe to enable in Fedora by default. Specifically: 1. add resume_offset support to the resume code (i.e. support swap files properly) 2. check if swap is on weird storage and refuse if so 3. add auto-detection of hibernation images * cgroups: use inotify to get notified when somebody else modifies cgroups owned by us, then log a friendly warning. * beef up log.c with support for stripping ANSI sequences from strings, so that it is OK to include them in log strings.
This would be particularly useful so that our log messages could contain clickable links for example for unit files and suchlike we operate on. * importd: add ability to download images for portabled + sysext * add support for "portablectl attach http://foobar.com/waaa.raw" (i.e. importd integration) * sync dynamic uids/gids between host+portable service (i.e. if DynamicUser=1 is set for a service, make sure that the selected user is resolvable in the service even if it ships its own /etc/passwd) * Fix DECIMAL_STR_MAX or DECIMAL_STR_WIDTH. One includes a trailing NUL, the other doesn't. What a disaster. Probably to exclude it. * Check that users of inotify's IN_DELETE_SELF flag are using it properly, as usually IN_ATTRIB is the right way to watch deleted files, as the former only fires when a file is actually removed from disk, i.e. the link count drops to zero and is not open anymore, while the latter happens when a file is unlinked from any dir. * port systemctl, busctl, … over to format-table.[ch]'s table formatters * pid1: lock image configured with RootDirectory=/RootImage= using the usual nspawn semantics while the unit is up * add --vacuum-xyz options to coredumpctl, matching those journalctl already has. * introduce Ephemeral= unit file switch, that creates an ephemeral copy of all files and directories that are left writable for a unit, and which are removed after the unit goes down again. A bit like --ephemeral for systemd-nspawn but for system services. If used together with RootImage= this should reflink the image file itself. Related: add Ephemeral= … which would allow marking specific paths only like this. * add CopyFile= or so as unit file setting that may be used to copy files or directory trees from the host to the service's RootImage= and RootDirectory= environment. Which we can use for /etc/machine-id and in particular /etc/resolv.conf.
Should be smart and do something useful on read-only images, for example fall back to read-only bind mounting the file instead. * show invocation ID in systemd-run output * bypass SIGTERM state in unit files if KillSignal is SIGKILL * add proper dbus APIs for the various sd_notify() commands, such as MAINPID=1 and so on, which would mean we could report errors and such. * introduce DefaultSlice= or so in system.conf that allows changing where we place our units by default, i.e. change system.slice to something else. Similar, ManagerSlice= should exist so that PID1's own scope unit could be moved somewhere else too. Finally machined and logind should get similar options so that it is possible to move user session scopes and machines to a different slice too by default. Usecase: people who want to put resources on the entire system, with the exception of one specific service. See: https://lists.freedesktop.org/archives/systemd-devel/2018-February/040369.html * maybe rework get_user_creds() to query the user database if $SHELL is used for root, but only then. * be stricter with fds we receive for the fdstore: close them asynchronously * calendarspec: add support for week numbers and day numbers within a year. This would allow us to define "bi-weekly" triggers safely. * sd-bus: add vtable flag, that may be used to request client creds implicitly and asynchronously before dispatching the operation * sd-bus: parse addresses given in sd_bus_set_addresses immediately and not only when used. Add unit tests. * make use of ethtool veth peer info in machined, for automatically finding out host-side interface pointing to the container. * add some special mode to LogsDirectory=/StateDirectory=… that allows declaring these directories without necessarily pulling in deps for them, or creating them when starting up. That way, we could declare that systemd-journald writes to /var/log/journal, which could be useful when we are doing disk usage calculations and so on.
* deprecate RootDirectoryStartOnly= in favour of a new ExecStart= prefix char * add a new RuntimeDirectoryPreserve= mode that defines a similar lifecycle for the runtime dir as we maintain for the fdstore: i.e. keep it around as long as the unit is running or has a job queued. * support projid-based quota in machinectl for containers * add a way to lock down cgroup migration: a boolean, which when set for a unit makes sure the processes in it can never migrate out of it * blog about fd store and restartable services * document Environment=SYSTEMD_LOG_LEVEL=debug drop-in in debugging document * rework ExecOutput and ExecInput enums so that EXEC_OUTPUT_NULL loses its magic meaning and is no longer upgraded to something else if set explicitly. * in the long run: permit a system with /etc/machine-id linked to /dev/null, to make it lose its identity, i.e. be anonymous. For this we'd have to patch through the whole tree to make all code deal with the case where no machine ID is available. * optionally, collect cgroup resource data, and store it in per-unit RRD files, suitable for processing with rrdtool. Add bus API to access this data, and possibly implement a CPULoad property based on it. * beef up pam_systemd to take unit file settings such as cgroups properties as parameters * maybe hook up xfs/ext4 quotactl() with services? i.e. automatically manage the quota of the user indicated in User= via unit file settings, like the other resource management concepts. Would mix nicely with DynamicUser=1. Or alternatively, do this with projids, so that we can also cover services running as root. Quota should probably cover all the special dirs such as StateDirectory=, LogsDirectory=, CacheDirectory=, as well as RootDirectory= if it is set, plus the whole disk space any image configured with RootImage=. * In DynamicUser= mode: before selecting a UID, use disk quota APIs on relevant disks to see if the UID is already in use. 
* expose IO accounting data on the bus, show it in systemd-run --wait and log about it in the resource log message * Add AddUser= setting to unit files, similar to DynamicUser=1 which however creates a static, persistent user rather than a dynamic, transient user. We can leverage code from sysusers.d for this. * add some optional flag to ReadWritePaths= and friends, that has the effect that we create the dir in question when the service is started. Example: ReadWritePaths=:/var/lib/foobar * Add ExecMonitor= setting. May be used multiple times. Forks off a process in the service cgroup, which is supposed to monitor the service, and when it exits the service is considered failed by its monitor. * track the per-service PAM process properly (i.e. as an additional control process), so that it may be queried on the bus and everything. * add a new "debug" job mode, that is propagated to unit_start() and for services results in two things: we raise SIGSTOP right before invoking execve() and turn off watchdog support. Then, use that to implement "systemd-gdb" for attaching to the start-up of any system service in its natural habitat. * gpt-auto logic: support encrypted swap, add kernel cmdline option to force it, and honour a gpt bit about it, plus maybe a configuration file * add a percentage syntax for TimeoutStopSec=, e.g. TimeoutStopSec=150%, and then use that for the setting used in user@.service. It should be understood relative to the configured default value. 
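A rough sketch of how the TimeoutStopSec=150% idea above could be resolved against the configured default. The helper name timeout_from_percent() and its error codes are made up for illustration and are not existing systemd API; it assumes the default is kept in microseconds, and it only handles the "NNN%" form, not regular time spans.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of systemd): turn "NNN%" into an absolute
 * timeout, relative to a default given in microseconds.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow. */
static int timeout_from_percent(const char *s, uint64_t default_usec, uint64_t *ret) {
        char *end = NULL;
        unsigned long long pct;

        assert(ret);

        /* Insist on a leading digit, so that signs and whitespace are refused */
        if (!s || *s < '0' || *s > '9')
                return -EINVAL;

        errno = 0;
        pct = strtoull(s, &end, 10);
        if (errno != 0)
                return -ERANGE;

        /* The number must be followed by exactly one '%' and nothing else */
        if (strcmp(end, "%") != 0)
                return -EINVAL;

        /* Refuse values whose product would not fit into 64 bits */
        if (pct != 0 && default_usec > UINT64_MAX / pct)
                return -ERANGE;

        *ret = default_usec * pct / 100;
        return 0;
}
```

With a default of 90s, "150%" would come out as 135s. Whether 0% or values above some cap should be refused is left open here.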
* enable LimitMEMLOCK= to take a percentage value relative to physical memory * Permit masking specific netlink APIs with RestrictAddressFamilies= * define gpt header bits to select volatility mode * ProtectClock= (drops CAP_SYS_TIME, adds seccomp filters for settimeofday, adjtimex), sets DeviceAllow o /dev/rtc * ProtectTracing= (drops CAP_SYS_PTRACE, blocks ptrace syscall, makes /sys/kernel/tracing go away) * ProtectMount= (drop mount/umount/pivot_root from seccomp, disallow fuse via DeviceAllow, imply MountFlags=slave) * ProtectKeyRing= to take keyring calls away * RemoveKeyRing= to remove all keyring entries of the specified user * ProtectReboot= that masks reboot() and kexec_load() syscalls, prohibits kill on PID 1 with the relevant signals, and makes relevant files in /sys and /proc (such as the sysrq stuff) unavailable * Support ReadWritePaths/ReadOnlyPaths/InaccessiblePaths in systemd --user instances via the new unprivileged Landlock LSM (https://landlock.io) * make sure the ratelimit object can deal with USEC_INFINITY as way to turn off things * in nss-systemd, if we run inside of RootDirectory= with PrivateUsers= set, find a way to map the User=/Group= of the service to the right name. This way a user/group for a service only has to exist on the host for the right mapping to work. * add bus API for creating unit files in /etc, reusing the code for transient units * add bus API to remove unit files from /etc * add bus API to retrieve current unit file contents (i.e. implement "systemctl cat" on the bus only) * rework fopen_temporary() to make use of open_tmpfile_linkable() (problem: the kernel doesn't support linkat() that replaces existing files, currently) * transient units: don't bother with actually setting unit properties, we reload the unit file anyway * optionally, also require WATCHDOG=1 notifications during service start-up and shutdown * cache sd_event_now() result from before the first iteration...
* PID1: find a way how we can reload unit file configuration for specific units only, without reloading the whole of systemd * add an explicit parser for LimitRTPRIO= that verifies the specified range and generates sane error messages for incorrect specifications. * when we detect that there are waiting jobs but no running jobs, do something * PID 1 should send out sd_notify("WATCHDOG=1") messages (for usage in the --user mode, and when run via nspawn) * there's probably something wrong with having user mounts below /sys, as we have for debugfs. for example, src/core/mount.c handles mounts prefixed with /sys generally special. http://lists.freedesktop.org/archives/systemd-devel/2015-June/032962.html * fstab-generator: default to tmpfs-as-root if only usr= is specified on the kernel cmdline * initrd-parse-etc.service: can we skip daemon-reload if /sysroot/etc/fstab is missing? Note that we start initrd-fs.target and initrd-cleanup.target there, so a straightforward ConditionPathExists= is not enough. * docs: bring https://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime up to date * add a job mode that will fail if a transaction would mean stopping running units. Use this in timedated to manage the NTP service state. http://lists.freedesktop.org/archives/systemd-devel/2015-April/030229.html * The udev blkid built-in should expose a property that reflects whether media was sensed in USB CF/SD card readers. This should then be used to control SYSTEMD_READY=1/0 so that USB card readers aren't picked up by systemd unless they contain a medium. This would mirror the behaviour we already have for CD drives. 
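For the LimitRTPRIO= parser item above, a minimal sketch of what range checking with readable error messages might look like. parse_rtprio_limit() is a hypothetical name, not an existing systemd function. The sketch assumes a single plain number in the range 0..99 (the realtime priority range on Linux) and does not try to cover any other syntax the setting accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define RTPRIO_LIMIT_MAX 99UL /* highest realtime priority on Linux */

/* Hypothetical helper (not part of systemd): parse one numeric LimitRTPRIO=
 * value, verify its range and explain in 'err' what is wrong otherwise.
 * Returns 0 on success, -EINVAL for non-numbers, -ERANGE if out of range. */
static int parse_rtprio_limit(const char *s, unsigned *ret, char *err, size_t err_size) {
        char *end = NULL;
        unsigned long v;

        assert(s);
        assert(ret);
        assert(err);

        /* Insist on a leading digit, so that signs and whitespace are refused */
        if (*s < '0' || *s > '9') {
                snprintf(err, err_size, "LimitRTPRIO=: '%s' is not a non-negative integer.", s);
                return -EINVAL;
        }

        errno = 0;
        v = strtoul(s, &end, 10);
        if (*end != '\0') {
                snprintf(err, err_size, "LimitRTPRIO=: trailing garbage in '%s'.", s);
                return -EINVAL;
        }
        if (errno == ERANGE || v > RTPRIO_LIMIT_MAX) {
                snprintf(err, err_size, "LimitRTPRIO=: '%s' is outside of the range 0...%lu.", s, RTPRIO_LIMIT_MAX);
                return -ERANGE;
        }

        *ret = (unsigned) v;
        return 0;
}
```

The message text is only a placeholder. The point is that an out-of-range value and a non-number get distinct, specific errors instead of a generic parse failure.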
* hostnamectl: show root image uuid * Find a solution for SMACK capabilities stuff: http://lists.freedesktop.org/archives/systemd-devel/2014-December/026188.html * synchronize console access with BSD locks: http://lists.freedesktop.org/archives/systemd-devel/2014-October/024582.html * as soon as we have sender timestamps, revisit coalescing multiple parallel daemon reloads: http://lists.freedesktop.org/archives/systemd-devel/2014-December/025862.html * figure out when we can use the coarse timers * maybe allow timer units with an empty Units= setting, so that they can be used for resuming the system but nothing else. * what to do about udev db binary stability for apps? (raw access is not an option) * exponential backoff in timesyncd when we cannot reach a server * timesyncd: add ugly bus calls to set NTP servers per-interface, for usage by NM * merge ~/.local/share and ~/.local/lib into one similar /usr/lib and /usr/share.... * add systemd.abort_on_kill or some other such flag to send SIGABRT instead of SIGKILL (throughout the codebase, not only PID1) * drop nss-myhostname in favour of nss-resolve? * resolved: - mDNS/DNS-SD - service registration - service/domain/types browsing - avahi compat - DNS-SD service registration from socket units - resolved should optionally register additional per-interface LLMNR names, so that for the container case we can establish the same name (maybe "host") for referencing the server, everywhere. - allow clients to request DNSSEC for a single lookup even if DNSSEC is off (?) - hook up resolved with machined-based address resolution * refcounting in sd-resolve is borked * add new gpt type for btrfs volumes * generator that automatically discovers btrfs subvolumes, identifies their purpose based on some xattr on them. * a way for container managers to turn off getty starting via $container_headless= or so... 
* figure out a nice way how we can let the admin know what child/sibling unit causes cgroup membership for a specific unit * For timer units: add some mechanisms so that timer units that trigger immediately on boot do not have the services they run added to the initial transaction and thus confuse Type=idle. * add bus api to query unit file's X fields. * gpt-auto-generator: - Define new partition type for encrypted swap? Support probed LUKS for encrypted swap? - Make /home automount rather than mount? * add generator that pulls in systemd-network from containers when CAP_NET_ADMIN is set, more than the loopback device is defined, even when it is otherwise off * MessageQueueMessageSize= (and suchlike) should use parse_iec_size(). * implement Distribute= in socket units to allow running multiple service instances processing the listening socket, and open this up for ReusePort= * cgroups: - implement per-slice CPUFairScheduling=1 switch - introduce high-level settings for RT budget, swappiness - how to reset dynamically changed unit cgroup attributes sanely? 
- when reloading configuration, apply new cgroup configuration - when recursively showing the cgroup hierarchy, optionally also show the hierarchies of child processes * transient units: - add field to transient units that indicate whether systemd or somebody else saves/restores its settings, for integration with libvirt * when we detect low battery and no AC on boot, show pretty splash and refuse boot * libsystemd-journal, libsystemd-login, libudev: add calls to easily attach these objects to sd-event event loops * be more careful what we export on the bus as (usec_t) 0 and (usec_t) -1 * rfkill,backlight: we probably should run the load tools inside of the udev rules so that the state is properly initialized by the time other software sees it * After coming back from hibernation reset hibernation swap partition using the /dev/snapshot ioctl APIs * If we try to find a unit via a dangling symlink, generate a clean error. Currently, we just ignore it and read the unit from the search path anyway. * refuse boot if /usr/lib/os-release is missing or /etc/machine-id cannot be set up * man: the documentation of Restart= currently is very misleading and suggests the tools from ExecStartPre= might get restarted. * load .d/*.conf dropins for device units * There's currently no way to cancel fsck (used to be possible via C-c or c on the console) * add option to sockets to avoid activation. Instead just drop packets/connections, see http://cyberelk.net/tim/2012/02/15/portreserve-systemd-solution/ * make sure systemd-ask-password-wall does not shutdown systemd-ask-password-console too early * verify that the AF_UNIX sockets of a service in the fs still exist when we start a service in order to avoid confusion when a user assumes starting a service is enough to make it accessible * Make it possible to set the keymap independently from the font on the kernel cmdline. Right now setting one resets also the other. 
* and a dbus call to generate target from current state * investigate whether the gnome pty helper should be moved into systemd, to provide cgroup support. * dot output for --test showing the 'initial transaction' * be able to specify a forced restart of service A where service B depends on, in case B needs to be auto-respawned? * pid1: - When logging about multiple units (stopping BoundTo units, conflicts, etc.), log both units as UNIT=, so that journalctl -u triggers on both. - generate better errors when people try to set transient properties that are not supported... http://lists.freedesktop.org/archives/systemd-devel/2015-February/028076.html - maybe introduce WantsMountsFor=? Usecase: http://lists.freedesktop.org/archives/systemd-devel/2015-January/027729.html - recreate systemd's D-Bus private socket file on SIGUSR2 - move PAM code into its own binary - when we automatically restart a service, ensure we restart its rdeps, too. - hide PAM options in fragment parser when compile time disabled - Support --test based on current system state - If we show an error about a unit (such as not showing up) and it has no Description string, then show a description string generated from the reverse of unit_name_mangle(). - after deserializing sockets in socket.c we should reapply sockopts and things - drop PID 1 reloading, only do reexecing (difficult: Reload() currently is properly synchronous, Reexec() is weird, because we cannot delay the response properly until we are back, so instead of being properly synchronous we just keep open the fd and close it when done. That means clients do not get a successful method reply, but much rather a disconnect on success.)
- when breaking cycles drop sysv services first, then services from /run, then from /etc, then from /usr - when a bus name of a service disappears from the bus make sure to queue further activation requests - maybe introduce CoreScheduling=yes/no to optionally set a PR_SCHED_CORE cookie, so that all processes in a service's cgroup share the same cookie and are guaranteed not to share SMT cores with other units https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/hw-vuln/core-scheduling.rst * unit files: - allow port=0 in .socket units - maybe introduce ExecRestartPre= - add ReloadSignal= for configuring a reload signal to use - implement Register= switch in .socket units to enable registration in Avahi, RPC and other socket registration services. - allow Type=simple with PIDFile= https://bugzilla.redhat.com/show_bug.cgi?id=723942 - allow writing multiple conditions in unit files on one line - introduce Type=pid-file - add a concept of RemainAfterExit= to scope units - Allow multiple ExecStart= for all Type= settings, so that we can cover rescue.service nicely - add verification of [Install] section to systemd-analyze verify * timer units: - timer units should get the ability to trigger when DST changes - Modulate timer frequency based on battery state * add libsystemd-password or so to query passwords during boot using the password agent logic * clean up date formatting and parsing so that all absolute/relative timestamps we format can also be parsed * on shutdown: move utmp, wall, audit logic all into PID 1 (or logind?), get rid of systemd-update-utmp-runlevel * make repeated alt-ctrl-del presses printing a dump * currently x-systemd.timeout is lost in the initrd, since crypttab is copied into dracut, but fstab is not * add a pam module that passes the hdd passphrase into the PAM stack and then expires it, for usage by gdm auto-login. 
* add a pam module that on password changes updates any LUKS slot where the password matches * test/: - add unit tests for config_parse_device_allow() * seems that when we follow symlinks to units we prefer the symlink destination path over /etc and /usr. We should not do that. Instead /etc should always override /run+/usr and also any symlink destination. * when isolating, try to figure out a way how we implicitly can order all units we stop before the isolating unit... * teach ConditionKernelCommandLine= globs or regexes (in order to match foobar={no,0,off}) * Add ConditionDirectoryNotEmpty= handle non-absolute paths as a search path or add ConditionConfigSearchPathNotEmpty= or different syntax? See the discussion starting at https://github.com/systemd/systemd/pull/15109#issuecomment-607740136. * BootLoaderSpec: Define a way how an installer can figure out whether a BLS compliant boot loader is installed. * think about requeuing jobs when daemon-reload is issued? usecase: the initrd issues a reload after fstab from the host is accessible and we might want to requeue the mounts local-fs acquired through that automatically. * systemd-inhibit: make taking delay locks useful: support sending SIGINT or SIGTERM on PrepareForSleep() * remove any syslog support from log.c; we probably cannot do this before split-off udev is gone for good * shutdown logging: store to EFI var, and store to USB stick? * merge unit_kill_common() and unit_kill_context() * add a dependency on standard-conf.xml and other included files to man pages * MountFlags=shared acts as MountFlags=slave right now. * properly handle loop back mounts via fstab, especially regards to fsck/passno * initialize the hostname from the fs label of /, if /etc/hostname does not exist?
* sd-bus: - EBADSLT handling - GetAllProperties() on a non-existing object does not result in a failure currently - port to sd-resolve for connecting to TCP dbus servers - see if we can introduce a new sd_bus_get_owner_machine_id() call to retrieve the machine ID of the machine of the bus itself - see if we can drop more message validation on the sending side - add API to clone sd_bus_message objects - longer term: priority inheritance - dbus spec updates: - NameLost/NameAcquired obsolete - GVariant - path escaping - update systemd.special(7) to mention that dbus.socket is only about the compatibility socket now * sd-event - allow multiple signal handlers per signal? - document chaining of signal handler for SIGCHLD and child handlers - define more intervals where we will shift wakeup intervals around in, 1h, 6h, 24h, ... - maybe support iouring as backend, so that we allow hooking read and write operations instead of IO ready events into event loops. See considerations here: http://blog.vmsplice.net/2020/07/rethinking-event-loop-integration-for.html * dbus: when a unit failed to load (i.e. is in UNIT_ERROR state), we should be able to safely try another attempt when the bus call LoadUnit() is invoked. * maybe do not install getty@tty1.service symlink in /etc but in /usr? * print a nicer explanation if people use variable/specifier expansion in ExecStart= for the first word * mount: turn dependency information from /proc/self/mountinfo into dependency information between systemd units. * firstboot: allow provisioning of /etc/hosts entries, so that we can via the credentials logic insert host name to resolve into containers/hosts. Usecase: fork a container, and make it ping some specific address which is defined by the host on invocation * systemd-firstboot: make sure to always use chase_symlinks() before reading/writing files * firstboot: make it useful to be run immediately after yum --installroot to set up a machine. 
(most specifically, make --copy-root-password work even if /etc/passwd already exists) * sd-boot: define a drop-in dir in the ESP that may contain X.509 certificates. If the firmware is detected to be in setup mode, automatically enroll them as PK/KEK/db, turn off setup mode and proceed. Optionally, instead of auto-enrolling them add them to the sd-boot menu, giving the user the option to manually enroll them, after selecting the menu entry. This way, installer images can just drop the certificates in the ESP, and on first boot can easily enroll the keys without ever booting up. * efi stub: optionally, load initrd from disk as a separate file, HMAC check it with key from TPM, bound to PCR, refusing if failing. This would then allow traditional distros that generate initrds locally to secure them with TPM: after generating the initrd, do the HMAC calculation, put result in initrd filename, done. This would then bind the validity of the initrd to the local host, and used kernel, and means people cannot change initrd or kernel without booting the kernel + initrd. * EFI: - honor language efi variables for default language selection (if there are any?) - honor timezone efi variables for default timezone selection (if there are any?) - change bootctl to be backed by systemd-bootd to control temporary and persistent default boot goal plus efi variables * bootctl - recognize the case when not booted on EFI * bootctl,sd-boot: actually honour the "architecture" key * bootctl: - teach it to prepare an ESP wholesale, i.e.
with mkfs.vfat invocation - teach it to copy in unified kernel images and maybe type #1 boot loader spec entries from host - make it operate on loopback files, dissecting enough to find ESP to operate on - bootspec: properly support boot attempt counters when parsing entry file names * kernel-install: - optionally, support generating type #2 entries instead of type #1, including signing them * logind: - logind: optionally, ignore idle-hint logic for autosuspend, block suspend as long as a session is around - logind: wakelock/opportunistic suspend support - Add pretty name for seats in logind - logind: allow showing logout dialog from system? - add Suspend() bus calls which take timestamps to fix double suspend issues when somebody hits suspend and closes laptop quickly. - if pam_systemd is invoked by su from a process that is outside of any session we should probably just become a NOP, since that's usually not a real user session but just some system code that just needs setuid(). - logind: make the Suspend()/Hibernate() bus calls wait for the job to be completed before returning, so that clients can wait for "systemctl suspend" to finish to know when the suspending is complete. - logind: when the power button is pressed short, just popup a logout dialog. If it is pressed for 1s, do the usual shutdown. Inspiration are Macs here. - expose "Locked" property on logind session objects - maybe allow configuration of the StopTimeout for session scopes - rename session scope so that it includes the UID. That way the session scope can be arranged freely in slices and we don't have to make assumptions about their slice anymore. - follow PropertiesChanged state more closely, to deal with quick logouts and relogins - (optionally?)
spawn seat-manager@$SEAT.service whenever a seat shows up that has CanGraphical set * move logind udev rules to top-level rule.d/ directory * move multiseat vid/pid matches from logind udev rule to hwdb * logind: rework pam_logind to also do a bus call in case of invocation from user@.service, which returns the XDG_RUNTIME_DIR value, and make this behaviour selectable via pam module option. * delay activation of logind until somebody logs in, or when /dev/tty0 pulls it in or lingering is on (so that containers don't bother with it until PAM is used). also exit-on-idle * journal: - consider introducing implicit _TTY= + _PPID= + _EUID= + _EGID= + _FSUID= + _FSGID= fields - journald: also get thread ID from client, plus thread name - journal: when waiting for journal additions in the client always sleep at least 1s or so, in order to minimize wakeups - add API to close/reopen/get fd for journal client fd in libsystemd-journal. - fall back to /dev/log based logging in libsystemd-journal, if we cannot log natively? - declare the local journal protocol stable in the wiki interface chart - sd-journal: speed up sd_journal_get_data() with transparent hash table in bg - journald: when dropping msgs due to ratelimit make sure to write "dropped %u messages" not only when we are about to print the next message that works, but already after a short timeout - check if we can make journalctl by default use --follow mode inside of less if called without args? - maybe add API to send pairs of iovecs via sd_journal_send - journal: add a setgid "systemd-journal" utility to invoke from libsystemd-journal, which passes fds via STDOUT and does PK access - journalctl: support negative filtering, i.e. FOOBAR!="waldo", and !FOOBAR for events without FOOBAR. - journal: store timestamp of journal_file_set_offline() in the header, so it is possible to display when the file was last synced.
- journal-send.c, log.c: when the log socket is clogged, and we drop, count this and write a message about this when it gets unclogged again. - journal: find a way to allow dropping history early, based on priority, other rules - journal: When used on NFS, check payload hashes - journald: add kernel cmdline option to disable ratelimiting for debug purposes - refuse taking lower-case variable names in sd_journal_send() and friends. - journald: we currently rotate only after MaxUse+MaxFilesize has been reached. - journal: deal nicely with byte-by-byte copied files, especially regards header - journal: sanely deal with entries which are larger than the individual file size, but where the components would fit - Replace utmp, wtmp, btmp, and lastlog completely with journal - journalctl: instead --after-cursor= maybe have a --cursor=XYZ+1 syntax? - when a kernel driver logs in a tight loop, we should ratelimit that too. - journald: optionally, log debug messages to /run but everything else to /var - journald: when we drop syslog messages because the syslog socket is full, make sure to write how many messages are lost as first thing to syslog when it works again. - journald: allow per-priority and per-service retention times when rotating/vacuuming - journald: make use of uid-range.h to manage uid ranges to split journals in. - journalctl: add the ability to look for the most recent process of a binary. journalctl /usr/bin/X11 --pid=-1 or so... - improve journalctl performance by loading journal files lazily. Encode just enough information in the file name, so that we do not have to open it to know that it is not interesting for us, for the most common operations.
- man: document that corrupted journal files is nothing to act on - rework journald sigbus stuff to use mutex - Set RLIMIT_NPROC for systemd-journal-xyz, and all other of our services that run under their own user ids, and use User= (but only in a world where userns is ubiquitous since otherwise we cannot invoke those daemons on the host AND in a container anymore). Also, if LimitNPROC= is used without User= we should warn and refuse operation. - journalctl --verify: don't show files that are currently being written to as FAIL, but instead show that they are being written to. - add journalctl -H that talks via ssh to a remote peer and passes through binary logs data - add a version of --merge which also merges /var/log/journal/remote - journalctl: -m should access container journals directly by enumerating them via machined, and also watch containers coming and going. Benefit: nspawn --ephemeral would start working nicely with the journal. - assign MESSAGE_ID to log messages about failed services - check if loop in decompress_blob_xz() is necessary * journald: support RFC3164 fully for the incoming syslog transport, see https://github.com/systemd/systemd/issues/19251#issuecomment-816601955 * Hook up journald's FSS logic with TPM2: seal the verification disk by time-based policy, so that the verification key can remain on host and be validated via TPM. * build short web pages out of each catalog entry, build them along with man pages, and include hyperlinks to them in the journal output * journald: do journal file writing out-of-process, with one writer process per client UID, so that synthetic hash table collisions can slow down a specific user's journal stream but not the others. * tweak journald context caching. In addition to caching per-process attributes keyed by PID, cache per-cgroup attributes (i.e. the various xattrs we read) keyed by cgroup path, and guarded by ctime changes.
  This should provide us with a nice speed-up on services that have many processes running in the same cgroup.

* maybe add a call sd_journal_set_block_timeout() or so to set SO_SNDTIMEO for the sd-journal logging socket, and, if the timeout is set to 0, set O_NONBLOCK on it. That way people can control if and when to block for logging.

* journalctl: make sure -f ends when the container indicated by -M terminates

* journald: sigbus API via a signal-handler safe function that people may call from the SIGBUS handler

* add a test if all entries in the catalog are properly formatted. (Adding dashes in a catalog entry currently results in the catalog entry being silently skipped. journalctl --update-catalog must warn about this, and we should also have a unit test to check that all our messages are OK.)

* homed:
  - when user tries to log into record signed by unrecognized key, automatically add key to our chain after polkit auth
  - rollback when resize fails mid-operation
  - GNOME's side for forget key on suspend (requires rework so that lock screen runs outside of uid)
  - update LUKS password on login if we find there's a password that unlocks the JSON record but not the LUKS device.
  - create on activate?
  - properties: icon url?, preferred session type?, administrator bool (which translates to 'wheel' membership)?, address?, telephone?, vcard?, samba stuff?, parental controls?
  - communicate clearly when usb stick is safe to remove. probably involves beefing up logind to make pam session close hook synchronous and wait until systemd --user is shut down.
  - logind: maybe keep a "busy fd" as long as there's a non-released session around or the user@.service
  - maybe make automatic, read-only, time-based reflink-copies of LUKS disk images (and btrfs snapshots of subvolumes) (think: time machine)
  - distinguish destroy / remove (i.e.
    currently we can unregister a user, unregister+remove their home directory, but not just remove their home directory)
  - in systemd's PAMName= logic: query passwords with ssh-askpassword, so that we can make "loginctl set-linger" mode work
  - fingerprint authentication, pattern authentication, …
  - make sure "classic" user records can also be managed by homed
  - make size of $XDG_RUNTIME_DIR configurable in user record
  - query password from kernel keyring first
  - update even if record is "absent"
  - move acct mgmt stuff from pam_systemd_home to pam_systemd?
  - when "homectl --pkcs11-token-uri=" is used, synthesize ssh-authorized-keys records for all keys we have private keys on the stick for
  - make slice for users configurable (requires logind rework)
  - logind: populate auto-login list bus property from PKCS#11 token
  - when determining state of a LUKS home directory, check DM suspended sysfs file
  - when homed is in use, maybe start the user session manager in a mount namespace with MS_SLAVE, so that mounts propagate down but not up, e.g. user A setting up a backup volume doesn't mean user B sees it
  - use credentials logic/TPM2 logic to store homed signing key
  - permit multiple user record signing keys to be used locally, and pick the right one for signing records automatically depending on a pre-existing signature
  - add a way to "adopt" a home directory, i.e. strip foreign signatures and insert a local signature instead.
  - as an extension to the directory+subvolume backend: if located on a specially marked fs, then sync down password into LUKS header of that fs, and always verify passwords against it too. Bootstrapping is a problem though: if no one is logged in (or no other user even exists yet), how do you unlock the volume in order to create the first user and add the first pw.
  - support new FS_IOC_ADD_ENCRYPTION_KEY ioctl for setting up fscrypt
  - maybe pre-create ~/.cache as subvol so that it can have separate quota easily?
  - add a switch to homectl (maybe called --first-boot) where it will check if any non-system users exist, and if not prompts interactively for basic user info, mimicking systemd-firstboot. Then, place this in a service that runs after systemd-homed, but before gdm and friends, as a simple, barebones fallback logic to get a regular user created on uninitialized systems.
  - store PKCS#11 + FIDO2 token info in LUKS2 header, compatible with systemd-cryptsetup, so that it can unlock homed volumes
  - maybe make all *.home files owned by `systemd-home` user or so, so that we can easily set overall quota for all users
  - on login, if we can't fallocate initially, but rebalance is on, then allow login in discard mode, then immediately rebalance, then turn off discard
  - extend user records with optional "bulk" data. Specifically, a user avatar/photo or so. This data should be stored along with the user record, but probably shouldn't be part of the record itself, since it might be large.

* add a new switch --auto-definitions=yes/no or so to systemd-repart. If specified, synthesize a definition automatically if we can: enlarge last partition on disk, but only if it is marked for growing and not read-only.

* systemd-repart: read LUKS encryption key from $CREDENTIALS_DIRECTORY

* systemd-repart: add a switch to factory reset the partition table without immediately applying the new configuration again. i.e. --factory-reset=leave or so. (this is useful to factory reset an image, then put it into another machine, ensuring that luks key is generated on new machine, not old)

* systemd-repart: support setting up dm-integrity with HMAC

* systemd-repart: maybe remove half-initialized image on failure. It fails if the output file exists, so a repeated invocation will usually fail if something goes wrong on the way.

* systemd-repart: drop pager mode on normal operation?

* systemd-repart: by default generate minimized partition tables (i.e.
  tables that only cover the space actually used, excluding any free space at the end), in order to maximize dd'ability. Requires libfdisk work, see https://github.com/karelzak/util-linux/issues/907

* systemd-repart: MBR partition table support. Care needs to be taken regarding Type=, so that partition definitions can sanely apply to both the GPT and the MBR case. Idea: accept syntax "Type=gpt:home mbr:0x83" for setting the types for the two partition types explicitly. And provide an internal mapping so that "Type=linux-generic" maps to the right types for both partition tables automatically.

* systemd-repart: allow sizing partitions as factor of available RAM, so that we can reasonably size swap partitions for hibernation.

* systemd-repart: allow boolean option that ensures that if an existing partition does not fit within the configured size bounds the whole command fails. This is useful to implement ESP vs. XBOOTLDR schemes in installers: have one set of repart files for the case where ESP is large enough and one where it isn't and XBOOTLDR is added in instead. Then apply the former first, and if it fails to apply use the latter.

* systemd-repart: add per-partition option to never reuse existing partition and always create anew even if matching partition already exists.

* systemd-repart: add per-partition option to fail if partition already exists, i.e. is not added new. Similarly, add option to fail if partition does not exist yet.

* systemd-repart: allow disabling growing of specific partitions, or making them (think ESP: we don't ever want to grow it, since we cannot resize vfat)

* systemd-repart: make it a static checker during early boot for existence and absence of other partitions for trusted boot environments

* document:
  - document that deps in [Unit] sections ignore Alias= fields in [Install] sections of other units, unless those units are disabled
  - man: clarify that time-sync.target is not only sysv compat but also useful otherwise.
    Same for similar targets
  - document that service reload may be implemented as service reexec
  - add a man page containing packaging guidelines and recommending usage of things like Documentation=, PrivateTmp=, PrivateNetwork= and ReadOnlyDirectories=/etc /usr.
  - document systemd-journal-flush.service properly
  - documentation: recommend to connect the timer units of a service to the service via Also= in [Install]
  - man: document the very specific env the shutdown drop-in tools live in
  - man: add more examples to man pages, in particular an example of how to do the equivalent of switching runlevels
  - man: maybe sort directives in man pages, and take sections from --help and apply them to man too
  - document root=gpt-auto properly

* systemctl:
  - add systemctl switch to dump transaction without executing it
  - Add a verbose mode to "systemctl start" and friends that explains what is being done or not done
  - "systemctl disable" on a static unit prints no message and does nothing. "systemctl enable" does nothing, and gives a bad message about it. Should fix both to print nice actionable messages.
  - print nice message from systemctl --failed if there are no entries shown, and hook that into ExecStartPre of rescue.service/emergency.service
  - add new command to systemctl: "systemctl system-reexec" which reexecs as many daemons as virtually possible
  - systemctl enable: fail if target to alias into does not exist? maybe show how many units are enabled afterwards?
  - systemctl: "Journal has been rotated since unit was started." message is misleading
  - systemctl status output should include list of triggering units and their status

* introduce an option (or replacement) for "systemctl show" that outputs all properties as JSON, similar to busctl's new JSON output. In contrast to that it should skip the variant type string though.

* add an explicit "vertical" mode to format-table, so that "systemctl status"-like outputs (i.e.
  with a series of field names left and values right) become genuine first class citizens, and we gain automatic, sane JSON output for them.

* Add a "systemctl list-units --by-slice" mode or so, which rearranges the output of "systemctl list-units" slightly by showing the tree structure of the slices, and the units attached to them.

* add "systemctl wait" or so, which does what "systemd-run --wait" does, but for all units. It should be both a way to pin units into memory as well as a way to retrieve their exit data.

* show whether a service has out-of-date configuration in "systemctl status" by using mtime data of ConfigurationDirectory=.

* "systemctl preset-all" should probably order the unit files it operates on lexicographically before starting to work, in order to ensure deterministic behaviour if two unit files conflict (like DMs do, for example)

* add "systemctl start -v foobar.service" that shows logs of a service while the start command runs. This is non-trivial to do without races though, since we should flush out all journal messages before returning from the "systemctl stop".

* systemctl: if some operation fails, show log output?

* Add a new verb "systemctl top"

* unit install:
  - "systemctl mask" should find all names by which a unit is accessible (i.e. by scanning for symlinks to it) and link them all to /dev/null

* nspawn:
  - emulate /dev/kmsg using CUSE and turn off the syslog syscall with seccomp. That should provide us with a useful log buffer that systemd can log to during early boot, and disconnect container logs from the kernel's logs.
  - as soon as networkd has a bus interface, hook up --network-interface=, --network-bridge= with networkd, to trigger netdev creation should an interface be missing
  - a nice way to boot up without machine id set, so that it is set at boot automatically for supporting --ephemeral.
    Maybe hash the host machine id together with the machine name to generate the machine id for the container
  - fix logic to always print a final newline on output. https://github.com/systemd/systemd/pull/272#issuecomment-113153176
  - should optionally support receiving WATCHDOG=1 messages from its payload PID 1...
  - optionally automatically add FORWARD rules to iptables whenever nspawn is running, remove them when shut down.

* nspawn: add support for sysext extensions, too. i.e. a new --extension= switch that takes one or more arguments, and applies the extensions already during startup.

* when main nspawn supervisor process gets suspended due to SIGSTOP/SIGTTOU or so, freeze the payload too.

* machined: add API to acquire UID range. add API to mount/dissect loopback file. Both protected by PK. Then make nspawn use these APIs to run unprivileged containers. i.e. push the truly privileged bits into machined, so that the client side can remain entirely unprivileged, without SUID or anything like that.

* nspawn: support time namespaces

* nspawn: on cgroupsv1 issue cgroup empty handler process based on host events, so that we make cgroup agent logic safe

* nspawn/machined: add API to invoke binary in container, then use that as fallback in "machinectl shell"

* nspawn: make nspawn suitable for shell pipelines: instead of triggering a hangup when input is finished, send ^D, which synthesizes an EOF. Then wait for hangup or ^D before passing on the EOF.

* nspawn: greater control over selinux label?

* nspawn: support that /proc, /sys/, /dev are pre-mounted

* machined:
  - add an API so that libvirt-lxc can inform us about network interfaces being removed or added to an existing machine
  - "machinectl migrate" or similar to copy a container from or to a different host, via ssh
  - introduce systemd-nspawn-ephemeral@.service, and hook it into "machinectl start" with a new --ephemeral switch
  - "machinectl status" should also show internal logs of the container in question
  - "machinectl history"
  - "machinectl diff"
  - "machinectl commit" that takes a writable snapshot of a tree, invokes a shell in it, and marks it read-only after use

* udev:
  - move to LGPL
  - kill scsi_id
  - add trigger --subsystem-match=usb/usb_device device
  - reimport udev db after MOVE events for devices without dev_t

* coredump:
  - save coredump in Windows/Mozilla minidump format
  - when truncating coredumps, also log the full size that the process had, and make a metadata field so we can report truncated coredumps

* support crash reporting operation modes (https://live.gnome.org/GnomeOS/Design/Whiteboards/ProblemReporting)

* tmpfiles:
  - apply "x" on "D" too (see patch from William Douglas)
  - instead of ignoring unknown fields, reject them.
  - creating new directories/subvolumes/fifos/device nodes should not follow symlinks. None of the other adjustment or creation calls follow symlinks.
  - add --test mode
  - teach tmpfiles.d q/Q logic something sensible in the context of XFS/ext4 project quota

* udev-link-config:
  - Make sure ID_PATH is always exported and complete for network devices where possible, so we can safely rely on Path= matching

* sd-rtnl:
  - add support for more attribute types
  - inbuilt piping support (essentially degenerate async)?
    see loopback-setup.c and other places

* networkd:
  - add more keys to [Route] and [Address] sections
  - add support for more DHCPv4 options (and, longer term, other kinds of dynamic config)
  - add reduced [Link] support to .network files
  - properly handle routerless dhcp leases
  - work with non-Ethernet devices
  - dhcp: do we allow configuring dhcp routes on interfaces that are not the one we got the dhcp info from?
  - the DHCP lease data (such as NTP/DNS) is still made available when a carrier is lost on a link. It should be removed instantly.
  - expose in the API the following bits:
    - option 15, domain name
    - option 12, hostname and/or option 81, fqdn
    - option 123, 144, geolocation
    - option 252, configure http proxy (PAC/wpad)
  - provide a way to define a per-network interface default metric value for all routes to it. possibly a second default for DHCP routes.
  - allow Name= to be specified repeatedly in the [Match] section. Maybe also support Name=foo*|bar*|baz ?
  - whenever uplink info changes, make DHCP server send out FORCERENEW

* in networkd, when matching device types, fix up DEVTYPE rubbish the kernel passes to us

* Figure out how to do unit tests of networkd's state serialization

* dhcp:
  - figure out how much we can increase Maximum Message Size

* dhcp6:
  - add functions to set previously stored IPv6 addresses on startup and get them at shutdown; store them in client->ia_na
  - write more test cases
  - implement reconfigure support, see 5.3., 15.11. and 22.20.
  - implement support for temporary addresses (IA_TA)
  - implement dhcpv6 authentication
  - investigate the usefulness of Confirm messages; i.e. are there any situations where the link changes without any loss in carrier detection or interface down
  - some servers don't do rapid commit without a filled in IA_NA, verify this behavior
  - RouteTable= ?