systemd

mirror of https://github.com/systemd/systemd.git synced 2025-01-06 17:18:12 +03:00

Author	SHA1	Message	Date
Lennart Poettering	607d297487	man: link up D-Bus API docs from daemon man pages Let's systematically make sure that we link up the D-Bus interfaces from the daemon man pages once in prose and once in short form at the bottom ("See Also"), for all daemons. Also, add reverse links at the bottom of the D-Bus API docs. Fixes: #34996	2024-11-05 22:57:51 +01:00
Daan De Meyer	406f177501	core: Introduce PrivatePIDs= This new setting allows unsharing the pid namespace in a unit. Because you have to fork to get a process into a pid namespace, we fork in systemd-executor to get into the new pid namespace. The parent then sends the pid of the child process back to the manager and exits while the child process continues on with the rest of exec_invoke() and then executes the actual payload. Communicating the child pid is done via a new pidref socket pair that is set up on manager startup. We unshare the PID namespace right before the mount namespace so we mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes to mount procfs. When running unprivileged in a user session, user namespace is set up first to allow for PID namespace to be unshared. However, when running in privileged mode, we unshare the user namespace last to ensure the user namespace does not own the PID namespace and cannot break out of the sandbox. Note we disallow Type=forking services from using PrivatePIDs=yes since the init process inside the PID namespace must not exit for other processes in the namespace to exist. Note Daan De Meyer did the original work for this commit with Ryan Wilson addressing follow-ups. Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>	2024-11-05 05:32:02 -08:00
Luca Boccassi	890bdd1d77	core: add read-only flag for exec directories When an exec directory is shared between services, this allows one of the services to be the producer of files, and the other the consumer, without letting the consumer modify the shared files. This will be especially useful in conjunction with id-mapped exec directories so that fully sandboxed services can share directories in one direction, safely.	2024-11-01 10:46:55 +00:00
Ryan Wilson	cd58b5a135	cgroup: Add support for ProtectControlGroups= private and strict This commit adds two settings private and strict to the ProtectControlGroups= property. Private will unshare the cgroup namespace and mount a read-write private cgroup2 filesystem at /sys/fs/cgroup. Strict does the same except the mount is read-only. Since the unit is running in a cgroup namespace, the new root of /sys/fs/cgroup is the unit's own cgroup. We also add a new dbus property ProtectControlGroupsEx which accepts strings instead of boolean. This will allow users to use private/strict via dbus and systemd-run in addition to service files. Note private and strict fall back to no and yes respectively if the kernel doesn't support cgroup2 or system is not using unified hierarchy. Fixes: #34634	2024-10-28 08:37:36 -07:00
Mike Yuan	7e40b51a2e	man/org.freedesktop.systemd1: complete version info for ManagedOOMMemoryPressureDurationUSec Follow-up for `63d4c4271c` Some unit types were left out.	2024-10-22 19:12:27 +02:00
Ryan Wilson	63d4c4271c	cgroup: Add ManagedOOMMemoryPressureDurationSec= override setting for units This will allow units (scopes/slices/services) to override the default systemd-oomd setting DefaultMemoryPressureDurationSec=. The semantics of ManagedOOMMemoryPressureDurationSec= are: - If >= 1 second, overrides DefaultMemoryPressureDurationSec= from oomd.conf - If empty, uses DefaultMemoryPressureDurationSec= from oomd.conf - Ignored if ManagedOOMMemoryPressure= is not "kill" - Disallowed if < 1 second Note the corresponding dbus property is ManagedOOMMemoryPressureDurationUSec which is in microseconds. This is consistent with other time-based dbus properties.	2024-10-16 20:12:38 -07:00
Arthur Shau	cc0ab8c810	timer: introduce DeferReactivation setting By default, in instances where timers are running on a realtime schedule, if a service takes longer to run than the interval of a timer, the service will immediately start again when the previous invocation finishes. This is caused by the fact that the next elapse is calculated based on the last trigger time, which, combined with the fact that the interval is shorter than the runtime of the service, causes that elapse to be in the past, which in turn means the timer will trigger as soon as the service finishes running. This behavior can be changed by enabling the new DeferReactivation setting, which will cause the next calendar elapse to be calculated based on when the trigger unit enters inactivity, rather than the last trigger time. Thus, if a timer is on a realtime interval, the trigger will always adhere to that specified interval. E.g. if you have a timer that runs on a minutely interval, the setting guarantees that triggers will happen at ::00 times, whereas by default this may skew depending on how long the service runs. Co-authored-by: Matteo Croce <teknoraver@meta.com>	2024-10-11 22:54:16 +02:00
Lennart Poettering	0aaacc3a10	Merge pull request #34593 from Werkov/deprecate-aux-scopes core/manager: Deprecate StartAuxiliaryScope() method	2024-10-09 10:25:30 +02:00
Michal Koutný	64f173324e	core/manager: Deprecate StartAuxiliaryScope() method The method was added with migration of resources in mind (e.g. process's allocated memory will follow it to the new scope), however, such a resource migration is not in cgroup semantics. The method may thus have the intended users and others could be guided to StartTransientUnit(). Since this API was advertised in a regular release, start the removal with a deprecation message to callers. Eventually, the goal is to remove the method to clean up DBus API and simplify code (removal of cgroup_context_copy()). Part of DBus docs is retained to satisfy build checks.	2024-10-08 17:49:13 +02:00
Ryan Wilson	3543456f84	Add ExtraFileDescriptor property to StartTransientUnit dbus API This adds the ExtraFileDescriptor property to StartTransientUnit dbus API with format "a(hs)" - array of (file descriptor, name) pairs. The FD will be passed to the unit via sd_notify like Socket and OpenFile. systemctl show also shows ExtraFileDescriptorName for these transient units. We only show the name passed to dbus as the FD numbers will change once passed over the unix socket and are duplicated, so it's confusing to display the numbers. We do not add this functionality for systemd-run or general systemd service units as it is not useful for general systemd services. Arguably, it could be useful for systemd-run in bash scripts but we prefer to be cautious and not expose the API yet. Fixes: #34396	2024-10-07 09:01:48 -07:00
Luca Boccassi	3509fe124d	man: consolidate list of active unit states into a shared table Avoids the need to maintain the same list over and over again, and link it to the definition table in the implementation as a reminder too	2024-10-04 12:22:55 +02:00
Jason Yundt	dfb3155419	man: document ShowStatus and SetShowStatus() SetShowStatus() was added in order to fix #11447. Recently, I ran into the exact same problem that OP was experiencing in #11447. I wasn’t able to figure out how to deal with the problem until I found #11447, and it took me a while to find #11447. This commit takes what I learned from reading #11447 and adds it to the documentation. Hopefully, this will make it easier for other people who run into the same problem in the future.	2024-09-18 10:11:55 +02:00
Daan De Meyer	fa693fdc7e	core: Add support for PrivateUsers=identity This configures an identity mapping similar to systemd-nspawn --private-users=identity.	2024-09-09 18:31:01 +02:00
Takeo Kondo	71f43fc882	man: CAP_SYS_ADMIN does NOT grant any permission for dbus API	2024-09-06 21:16:53 +02:00
Mike Yuan	7a9f0125bb	core: rename BindJournalSockets= to BindLogSockets= Addresses https://github.com/systemd/systemd/pull/32487#issuecomment-2328465309	2024-09-04 21:44:25 +02:00
Mike Yuan	368a3071e9	core: introduce BindJournalSockets= Closes #32478	2024-09-03 21:04:50 +02:00
Luca Boccassi	5162829ec8	core: do BindMount/MountImage operations in async control process These operations might require slow I/O, and thus might block PID1's main loop for an indeterminate amount of time. Instead of performing them inline, fork a worker process and stash away the D-Bus message, and reply once we get a SIGCHLD indicating they have completed. That way we don't break compatibility and callers can continue to rely on the fact that when they get the method reply the operation either succeeded or failed. To keep backward compatibility, unlike reload control processes, these are run inside init.scope and not the target cgroup. Unlike ExecReload, this is under our control and is not defined by the unit. This is necessary because previously the operation also wasn't run from the target cgroup, so suddenly forking a copy-on-write copy of pid1 into the target cgroup will make memory usage spike, and if there is a MemoryMax= or MemoryHigh= set and the cgroup is already close to the limit, it will cause an OOM kill, where previously it would have worked fine.	2024-08-29 12:48:55 +01:00
Luca Boccassi	7d8bbfbe08	service: add 'debug' option to RestartMode= One of the major pain points of managing fleets of headless nodes is that when something fails at startup, unless debug level was already enabled (which it usually isn't, as it's a firehose), one needs to manually enable it and pray the issue can be reproduced, which often is really hard and time consuming, just to get extra info. Usually the extra log messages are enough to triage an issue. This new option makes it so that when a service fails and is restarted due to Restart=, log level for that unit is set to debug, so that all setup code in pid1 and sd-executor logs at debug level, and also a new DEBUG_INVOCATION=1 env var is passed to the service itself, so that it knows it should start with a higher log level. Once the unit succeeds or reaches the rate limit the original level is restored.	2024-08-27 12:24:45 +01:00
Daan De Meyer	831f208783	core: Add support for renaming credentials with ImportCredential= This allows for "per-instance" credentials for units. The use case is best explained with an example. Currently all our getty units have the following stanzas in their unit file: """ ImportCredential=agetty.* ImportCredential=login.* """ This means that setting agetty.autologin=root as a system credential will make every instance of all our getty units autologin as the root user. This prevents us from doing autologin on /dev/hvc0 while still requiring manual login on all other ttys. To solve the issue, we introduce support for renaming credentials with ImportCredential=. This will allow us to add the following to e.g. serial-getty@.service: """ ImportCredential=tty.serial.%I.agetty.:agetty. ImportCredential=tty.serial.%I.login.:login. """ which for serial-getty@hvc0.service will make the service manager read all credentials of the form "tty.serial.hvc0.agetty.xxx" and pass them to the service in the form "agetty.xxx" (same goes for login). We can apply the same to each of the getty units to allow setting agetty and login credentials for individual ttys instead of globally.	2024-07-31 15:52:27 +02:00
Mike Yuan	9d50d053f3	core: expose PrivateTmp=disconnected As discussed in https://github.com/systemd/systemd/pull/32724#discussion_r1638963071 I don't find the opposite reasoning particularly convincing. We have ProtectHome=tmpfs and friends, and those can be pretty much trivially implemented through TemporaryFileSystem= too. The new logic brings many benefits, and is completely generic, hence I see no reason not to expose it. We can even get more tests for the code path if we make it public.	2024-06-21 17:31:44 +02:00
Mike Yuan	c3662116b9	man/org.freedesktop.systemd1: Status{Bus,Varlink}Error belongs to Service, not Scope Follow-up for `9c025022d9` Ugh, shouldn't have done this bit when I was sleepy...	2024-06-21 16:47:28 +02:00
Mike Yuan	9c025022d9	core/service: store BUSERROR= & VARLINKERROR= received through notification Closes #6073	2024-06-20 19:03:44 +02:00
Zbigniew Jędrzejewski-Szmek	75ced6d5ee	various: update links to usr-merge	2024-05-28 14:48:56 +02:00
Zbigniew Jędrzejewski-Szmek	f81af0b082	man: update links to "New Control Group Interfaces"	2024-05-28 14:46:44 +02:00
Lennart Poettering	3c1d1ca146	manager: switch service unit type over to using new handoff timestamping logic Also: rename Handover → Handoff. I think it makes it clearer that this is not really about handing over any resources, but that the executor is out of the game from that point on.	2024-04-25 13:40:41 +02:00
Luca Boccassi	c75c8a38b8	man: document service types that record ExecMainHandoverTimestamp Follow-up for `93cb78aee2`	2024-04-24 07:55:37 +02:00
Mike Yuan	844863c61e	core/manager: add unmerged-bin taint	2024-04-24 08:43:08 +08:00
Mike Yuan	ea81442892	core/manager: rearrange taint tags	2024-04-24 08:40:25 +08:00
Mike Yuan	2b28dfe6e6	core/manager: drop obsolete cgroup taint string We can't boot on systems without cgroup anyway (even cgroup v1 will be gone pretty soon).	2024-04-24 08:39:29 +08:00
Luca Boccassi	93cb78aee2	core: add ExecMainHandoverTimestamp property recording time-of-execve Enable the exec_fd logic for Type=notify* services too, and change it to send a timestamp instead of a '1' byte. Record the timestamp in a new ExecMainHandoverTimestamp property so that users can track accurately when control is handed over from systemd to the service payload, so that latency and startup performance can be trivially and accurately tracked and attributed.	2024-04-22 15:16:05 +02:00
Luca Boccassi	ef5f7f9437	systemctl: add --clean= values to documentation and shell completion	2024-04-18 14:07:07 +02:00
Luca Boccassi	b3f548615f	core: rename SoftRebootStartTimestamp -> ShutdownStartTimestamp and generalize Follow-up for `54f86b86ba`	2024-04-17 18:19:27 +01:00
Luca Boccassi	95a289bfe7	man: mention initial value of SoftRebootsCount Follow-up for `66f35161f6`	2024-04-16 00:26:04 +01:00
Luca Boccassi	66f35161f6	core: add counter for soft-reboot iterations Allow to query via D-Bus how many times the current booted system has been soft rebooted	2024-03-27 01:27:35 +00:00
Luca Boccassi	54f86b86ba	core: add SoftRebootStartTimestamp Will be useful to calculate how long it took to shut down the system before starting in the new root	2024-03-27 01:25:49 +00:00
Jakub Sitnicki	97df75d7bd	socket: pass socket FDs to all ExecXYZ= commands but ExecStartPre= Today listen file descriptors created by socket unit don't get passed to commands in Exec{Start,Stop}{Pre,Post}= socket options. This prevents ExecXYZ= commands from accessing the created socket FDs to do any kind of system setup which involves the socket but is not covered by existing socket unit options. One concrete example is to insert a socket FD into a BPF map capable of holding socket references, such as BPF sockmap/sockhash [1] or reuseport_sockarray [2]. Or, similarly, send the file descriptor with SCM_RIGHTS to another process, which has access to a BPF map for storing sockets. To unblock this use case, pass ListenXYZ= file descriptors to ExecXYZ= commands as listen FDs [4]. As an exception, ExecStartPre= command does not inherit any file descriptors because it gets invoked before the listen FDs are created. This new behavior can potentially break existing configurations. Commands invoked from ExecXYZ= might not expect to inherit file descriptors through sd_listen_fds protocol. To prevent breakage, add a new socket unit parameter, PassFileDescriptorsToExec=, to control whether ExecXYZ= programs inherit listen FDs. [1] https://docs.kernel.org/bpf/map_sockmap.html [2] https://lore.kernel.org/r/20180808075917.3009181-1-kafai@fb.com [3] https://man.archlinux.org/man/socket.7#SO_INCOMING_CPU [4] https://www.freedesktop.org/software/systemd/man/latest/sd_listen_fds.html	2024-03-27 01:41:26 +08:00
Mike Yuan	1ea275f119	core/cgroup: introduce MemoryZSwapWriteback setting Added in `501a06fe8e`	2024-03-13 23:36:25 +00:00
Frantisek Sumsal	43b238f1c1	man: suffix signals with () Since signals can take arguments, let's suffix them with () as we already do with functions. To make sure we remain consistent, make the `update-dbus-docs.py` script check & fix any occurrences where this is not the case. Resolves: #31002	2024-01-23 16:27:50 +01:00
Luca Boccassi	d156e66f82	man: add more suggestions on how to use StartUnit and JobRemoved This is not immediately clear for users, so spell out the preferred pattern clearly in the D-Bus documentation.	2024-01-18 17:22:12 +00:00
Lennart Poettering	658dc909dc	man: fix references to systemd.exec(5) For some reason the section for the systemd.exec man page was added incorrectly and then copypasted everywhere else incorrectly too. Let's fix that.	2024-01-11 12:19:44 +00:00
Yu Watanabe	82a1597778	Merge pull request #28797 from Werkov/eff_limits Add MemoryMaxEffective=, MemoryHighEffective= and TasksMaxEffective= properties	2024-01-04 05:38:06 +09:00
Michal Sekletar	84c01612de	core/manager: add dbus API to create auxiliary scope from running service This commit introduces new D-Bus API, StartAuxiliaryScope(). It may be used by services as part of the restart procedure. Service sends an array of PID file descriptors corresponding to processes that are part of the service and must continue running also after service restarts, i.e. they haven't finished the job for which they were spawned in the first place (e.g. long running video transcoding job). Systemd creates new scope unit for these processes and migrates them into it. Cgroup properties of scope are copied from the service so it retains same cgroup settings and limits as service had.	2024-01-03 13:50:41 +01:00
Michal Koutný	4fb0d2dc14	cgroup: Add EffectiveMemoryMax=, EffectiveMemoryHigh= and EffectiveTasksMax= properties Users become perplexed when they run their workload in a unit with no explicit limits configured (moreover, listing the limit property would even show it's infinity) but they experience unexpected resource limitation. The memory and pid limits come as the most visible, therefore add new unit read-only properties: - EffectiveMemoryMax=, - EffectiveMemoryHigh=, - EffectiveTasksMax=. These properties represent the most stringent limit systemd is aware of for the given unit -- and that is typically(*) the effective value. Implement the properties by simply traversing all parents in the leaf-slice tree and picking the minimum value. Note that effective limits are thus defined even for units that don't enable explicit accounting (because of the hierarchy). (*) The evasive case is when systemd runs in a cgroupns and cannot reason about outer setup. Complete solution would need kernel support.	2024-01-03 13:37:08 +01:00
David Tardon	eea10b26f7	man: use same version in public and system ident.	2023-12-25 15:51:47 +01:00
Andrew Sayers	ff47602f5e	Fix a typo in the org.freedesktop.systemd1 man page	2023-12-15 07:39:05 +09:00
Luca Boccassi	9e615fa3aa	core: add WantsMountsFor= This is the equivalent of RequiresMountsFor=, but adds Wants= instead of Requires=. It will be useful for example for the autogenerated systemd-cryptsetup units. Fixes https://github.com/systemd/systemd/issues/11646	2023-11-29 11:04:59 +00:00
Yu Watanabe	58cde42f65	core: rename MemoryZswapCurrent -> MemoryZSwapCurrent Follow-up for `26caa66867`.	2023-11-13 13:54:56 +01:00
Florian Schmaus	26caa66867	cgroup: add support for memory.zswap.current	2023-11-12 21:10:40 +01:00
Florian Schmaus	37533c9432	cgroup: add support for memory.swap.current In systemctl-show we only show current swap if ever swapped or non-zero. This reduces the noise on swapless systems, that would otherwise always show a swap value that never has the chance to become non-zero. It further reduces the noise for services that never swapped.	2023-11-11 12:16:29 +01:00
Florian Schmaus	aac3384e56	cgroup: add support for memory.swap.peak	2023-11-11 12:14:07 +01:00

208 Commits