IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Now that we can configure which controllers to delegate precisely, let's
limit wht we delegate to the user session: only "cpu" and "pids" as a
minimal baseline.
Fixes: #1715
An explicit --user switch is necessary because for the user@0.service instance
systemd-tmpfiles is running as root, and we need to distinguish that from
systemd-tmpfiles running in systemd-tmpfiles*.service.
Fixes#2208.
v2:
- restore "systemd-" prefix
- add systemd-tmpfiles-clean.{service,timer}, systemd-setup.service to
systemd-tmpfiles(8)
This makes sense from the point of view of the whole distribution:
if there are some specific files that have syntax problems, or unknown
users or groups, or use unsupported features, failing the whole service
is not useful.
In particular, services with tmpfiles --boot should not be started after boot.
The premise of --boot is that there are actions which are only safe to do once
during boot, because the state evolves later through other means and re-running
the boot-time setup would destroy it. If services with --boot fail in the
initial transaction, they would be re-run later on when a unit which
(indirectly) depends on them is started, causing problems.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1507501.
(If we had a mode where a service would at most run once, and would not be
started in subsequent transactions, that'd be a good additional safeguard.
Using ExecStart=-... is a bit like that, but it causes all failure to be
ignored, which is too big of a hammer.)
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
There should be a way to turn this logic of, and DefaultDependencies=
appears to be the right option for that, hence let's downgrade this
dependency type from "implicit" to "default, and thus honour
DefaultDependencies=.
This also drops mount_get_fstype() as we only have a single user needing
this now.
A follow-up for #7076.
remote-cryptsetup-pre.target was designed as an active unit (that pulls in
network-online.target), the opposite of remote-fs-pre.target (a passive unit,
with individual provider services ordering itself before it and pulling it in,
for example iscsi.service and nfs-client.target).
To make remote-cryptsetup-pre.target really work, those services should be
ordered before it too. But this would require updates to all those services,
not just changes from systemd side.
But the requirements for remote-fs-pre.target and remote-cryptset-pre.target
are fairly similar (e.g. iscsi devices can certainly be used for both), so
let's reuse remote-fs-pre.target also for remote cryptsetup units. This loses
a bit of flexibility, but does away with the requirement for various provider
services to know about remote-cryptsetup-pre.target.
In the past we introduced this property just for tmp.mount. However on
todays systems usually there are many more tmpfs mounts. Most notably
mounts backing XDG_RUNTIME_DIR for each user.
Let's generalize what we already have for tmp.mount and implement the
ordering After=swap.target for all tmpfs based mounts.
This makes this target the same as remote-fs.target in this regard. In practice
it probably doesn't make that much difference, because all encrypted devices
that are part of remote-fs.target (marked with _netdev) will be used for mount
points, so they will be pulled in anyway individually, but with this change any
such device will be configured, even if it is not pulled by any other unit.
After the discussions around #7003 I think we should restore the
User=systemd-journal-gateway line for systemd-journal-gatewayd.service,
too, so that we continue to use the state user if it exists, and create
it as dynamic user only when it does not.
Note that undoes part of a change made after 234, i.e. a never released
change.
The configuration option was called -Dresolve, but the internal define
was …RESOLVED. This options governs more than just resolved itself, so
let's settle on the version without "d".
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
1. If we exited emergency mode immediately, we don't want to have an
irreversible stop job still running for syslog.socket. I _suspect_ that
can't happen, but let's not waste effort working out exactly why it's
impossible and not just very improbable.
2. Similarly, it seems undesirable to have rescue.service and
emergency.service both running with an open FD of /dev/console, for
however short a period.
Note this commit only changes how the code is expressed; it does not change
the existence of any dependency.
The `Conflicts=` was added in 3136ec90, "Stop syslog.socket when entering
emergency mode". The discussion in the issue #266 raised concerns that
this might be needed for other units, but failed to point out why
syslog.socket is special. The reason is that syslog.socket has
DefaultDepedencies=no, so it does not get Requires=sysinit.target like
other socket units do. But syslog.service does require sysinit.target,
among other things.
We don't have many socket, path, or timer units with
DefaultDependencies=no, and I don't think any of the triggered services
have such additional hard dependencies as syslog.service does.
It is much less confusing if we keep this `Conflicts=` in the same file as
the `DefaultDependencies=no` which made it necessary.
The original aim of this commit is that starting machines.target from the
rescue shell would not kill the rescue shell and lock you out of the
system.
This is similar to commit 6579a622, for the conflict between
sysinit.target and the _emergency_ shell. That particular commit
introduced an ordering cycle and will need to be reverted and/or
fixed. This one does not, because it does not need to introduce any new
dependencies.
The reason why this commit is allowable also has it's own merit:
machines.target was not marked as AllowIsolate. Also, the point of
containers is to not escape them... I don't think we want to promote
machines.target as a default target or similar; you would generally want
some system service to allow you to shut down the machine, for example. I
don't see this approach used in CoreOS, nor in Fedora Atomic Host; we are
missing any positive examples of its utility.
Requires=basic.target / After=basic.target can be removed for the same
reason.
This reverts commit f1e24a259c. Oops.
# systemctl emergency
Failed to start emergency.target: Transaction order is cyclic. See syste...
See system logs and 'systemctl status emergency.target' for details.
# systemctl status emergency.target
● emergency.target - Emergency Mode
Loaded: loaded (/usr/lib/systemd/system/emergency.target; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2017-09-25 10:43:02 BST; 2h 42min ago
Docs: man:systemd.special(7)
systemd[1]: sysinit.target: Found dependency on sysinit.target/stop
sysinit.target: Unable to break cycle starting with sysinit.target/stop
network.target: Found ordering cycle on wpa_supplicant.service/stop
network.target: Found dependency on sysinit.target/stop
network.target: Found dependency on emergency.target/start
network.target: Found dependency on emergency.service/start
network.target: Found dependency on serial-getty@ttyS0.service/stop
network.target: Found dependency on systemd-user-sessions.service/stop
network.target: Found dependency on network.target/stop
network.target: Unable to break cycle starting with network.target/stop
IMO #6509 is ugly enough that we should aim to answer it. But it could
take some time to investigate, so let's re-open the issue as a first step.
Why
---
The advantage of this is that starting sysinit.target from the emergency
shell will no longer kill the emergency shell and lock you out of the
system. Our docs already claimed that emergency.target was useful for
"starting individual units in order to continue the boot process in steps".
This resolves#6509 for my purposes.
Remaining limitation
--------------------
Starting getty.target will still kill the shell, and if you don't have a
root password you will then be locked out at that point. This is relevant
to distributions which patch the sulogin system to permit logins when the
root password is locked. Both Debian and RedHat used to follow this
behaviour! Debian have been discussing what they could replace it with at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806852
So this doesn't quite achieve perfection, but I think it's a worthwhile
change. It should be easier to understand the logic now it doesn't have
such a big hole in it. Repairing the sysinit stage of the boot is the main
reason we have emergency.target. And as discussed in the issue,
sysinit.target gets pulled in implicitly as soon as any DefaultDependencies
service is activated.
How
---
sysinit.target only needs to conflict with emergency.target. It didn't
need to conflict with emergency.service as well. In theory the conflicts
are pointless, we could just change the dependency of sysinit.target on
local-fs.target from Wants to Requires. However, doing so would mean that
when local-fs fails, the screen is flooded with yellow [DEPEND] failures.
That would hinder the poor unfortunate admin, so let's not do that.
There is no additional ordering requirement against emergency. If the
failure happens, the job for sysinit will be cancelled instantly. We don't
need to worry about when sysinit.target and its dependents would be
stopped, because sysinit waits for local-fs before it starts.
emergency.target is still necessarily stopped once we reach sysinit
(you can't express a one-way conflict in pure unit directives).
This is largely cosmetic... though perhaps it symbolizes that you're no
longer in Emergency Mode if System Initialization is successful ;-).
As a secondary advantage, the getty's which conflict on rescue.service now
need to conflict on emergency.service as well. This makes the system more
uniform and simpler to understand.
The only other effect this should have is that
`systemctl start emergency.target` is now practically the same as
`systemctl start rescue.target`. The only units this command will stop are
the conflicting getty units. Neither of those commands should ever be
used. E.g. they will not stop the gdm.service unit on Fedora 26.
The pair is similar to remote-fs.target and remote-fs-pre.target. Any
cryptsetup devices which require network shall be ordered after
remote-cryptsetup-pre.target and before remote-cryptsetup.target.
Normally this happens automatically, but if it happened that both targets were
pulled in, even though there were no cryptsetup units, they could be started
in reverse order, which would be somewhat confusing. Add an explicit ordering
to avoid this potential issue.
This new target is a passive unit, hence it is supposed to be pulled in
to the transaction by the service that wants to block login on the
console (e.g. text version of initial-setup). Now both getty and
serial-getty are ordered after this target.
https://lists.freedesktop.org/archives/systemd-devel/2015-July/033754.html
and the same for hibernate.target and hybrid-sleep.target.
Tested with both sucessful and unsuccessful suspends. The result of the
start job was correct in both cases. Closes#6419 (a regression in v233
and v234).
> suspend is unsual for a target, because it has to stop itself once it's
> started. Otherwise you couldn't start it again, so you could only suspend
> once! Currently that's implemented using BindsTo=systemd-sleep.service.
> Meaning it pulls in systemd-sleep.service to do the actual suspend, and
> then de-activates afterwards. But the behaviour of BindsTo was changed
> recently (not without some issues during development) - maybe this bug
> is caused by poettering/systemd@631b676 which I think was added in
> release v233.
>
> sleep.target (see man systemd.special) has the same need, but it
> implements it differently. It simply has StopWhenUnneeded=yes.
This commit switches suspend.target etc. to the approach used by
sleep.target.
Since hotplugs happen as soon as udevd is started, there is not much sense
in giving udev-trigger an After= dependency on any service. The device
could be hotplugged before coldplug starts.
This is intended to avoid the race window where we create the hwdb with
the wrong selinux context (then fix it up afterwards).
https://github.com/systemd/systemd/issues/3458#issuecomment-322444107
> Note that console-getty.service as more uses than just containers. The
> idea is that it may be used as alternative to the whole VC/logind stuff,
> if all you need is a console on /dev/console, even on physical devices.
This means we want to remove RestartSec=0, for serial systems.
See 4bf0432 "units/serial-getty@.service: use the default RestartSec".
The traditional runlevel 1 is "single user mode", and shuts down all but
the main console. In systemd, rescue.target provides runlevel1.target.
But it did not shut down logins on secondary consoles... if systemd was
running in a container.
I don't think we strictly need to change this. But when you look at both
container-getty@.service and getty@.service, you see that both have
IgnoreOnIsolate, but only the latter has Conflicts=rescue.service.
This also makes rescue.target in a container consistent with
emergency.target. In the latter case, the gettys were already stopped,
because they have a Requires dependency on sysinit.target.
Currently we have 4 getty services. 1 has a BindsTo dependency on a
device unit. 3 have ConditionPathExists, but the reason is different in
every single one.
* Add comment to console-getty@.service (see commit 1b41981d)
* getty@.service is already commented
* container-getty.service is not strictly correct, as I realized while
trying to compose a comment. Reported as #6584.
* Containers don't use serial-getty@console.service,
they use console-getty.service instead, and suppress
scanning for kernel or virtualizer consoles.
* Nowadays gettys are started on *all* configured kernel consoles.
* except for the line printer console, because that's not a tty.
(Seriously. Search CONFIG_LP_CONSOLE).
* Revert "modprobe.d: ship drop-in to set bonding max_bonds to 0 (#6448)"
This reverts commit 582faeb461.
* Revert "units: set ConditionVirtualization=!private-users on journald audit socket (#6508)"
This reverts commit d2a1ba103b.
Some kdbus_flag and memfd related parts are left behind, because they
are entangled with the "legacy" dbus support.
test-bus-benchmark is switched to "manual". It was already broken before
(in the non-kdbus mode) but apparently nobody noticed. Hopefully it can
be fixed later.
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
Currently we set 4096 as maximum for number of stream connections that
we accept. However maximum number of file descriptors that systemd is
willing to accept from us is just 1024. This means we can't retain all
stream connections that we accepted. Hence bump the limit of fds in a
unit file so that systemd holds open all stream fds while we are
restarted.
New limit is set to 4224 (4096 + 128).
Make agetty started by *getty* units pass '-p' option to "login", so it
doesn't clear the environment and passes whatever was setup by systemd
to shells. This is needed especially for programs which are specified as
user shells, but won't read locale settings from anywhere but
environment.
[zj: cherry-pick just the second patch from the series, see discussion
on the pull request.]
In the initial design, foobar-wait-online.service would have
Requisite=foobar.service, so that foobar-wait-online.service could be enabled
unconditionally, irrespective of whether foobar.service itself is enabled.
Unfortunately this doesn't work too well:
1. the message about foobar-wait-online.service being skipped because of a
"missing dependency" *looks* like an is problem. This is mostly cosmetic,
but it also quite confusing. We generally don't want any messages of this
type during default boot.
2. it is impossible to start and wait for the network in an
implementation-agnostic way: systemctl start network-online.target, or
Wants/After=network-online.target in a unit don't work because pulling in
network-online.target pulls in foobar-wait-online.service, but it in turn
does not pull in foobar.service. During startup, foobar.service is pulled in
by multi-user.target, but not in a smaller transaction which does not
include multi-user.target.
This change means that *-wait-online.service should be installed through
presets, so that it can be enabled/disabled at will by the administrator.
Our own systemd-networkd-wait-online.service does this already, and
similar change has been requested for NetworkManager-wait-online.service
(https://bugzilla.redhat.com/show_bug.cgi?id=1455704).
This change should by mostly backwards-compatible, unless somebody has some
wait-online.service enabled, without having the corresponding network
implementation enabled, and they are relying on it not being started. I think
that's relatively unlikely because of issue 1. above, and I'm not aware of this
being the default in any distro. And being able to start the network in an
implementation-agnostic way is pretty important, see
https://bugzilla.redhat.com/show_bug.cgi?id=1452866.
/var can be on a remote filesystem, thus hooking it to local-fs.target is not correct.
Also, only install the mount unit when machined is enabled, because
machined is the one managing the underlying device, and thus makes no
sense without machined.
Fixes#1175
This patch ensures that session devices are saved for each session.
In order to make the revokation logic work when logind is restarted, the
session devices are now saved in the session state files and their respective
file descriptors sent to PID1's fdstore in order to keep them open accross
restart.
This is mandatory in order to keep the revokation logic working. Indeed in case
of input-devices, the same file descriptors must be shared by logind and a
given session controller in order EVIOCREVOKE to work otherwise multiple
sessions can have device access in parallel.
This should be the only remaining and missing piece for making logind fully
restartable.
Fixes: #1163
Using conf.set() with a boolean argument does the right thing:
either #ifdef or #undef. This means that conf.set can be used unconditionally.
Previously I used '1' as the placeholder value, and that needs to be changed to
'true' for consistency (under meson 1 cannot be used in boolean context). All
checks need to be adjusted.
Shell scripts should be executable so that meson reports their
invocation succinctly (does not print 'sh' '-e').
Python scripts should not be executable so that meson does the
detection of the right python binary itself.
Add -u everywhere to catch potential errors.
The indentation for emacs'es meson-mode is added .dir-locals.
All files are reindented automatically, using the lasest meson-mode from git.
Indentation should now be fairly consistent.
Ideally, we would chain the m4 processing, .in substitutions, and file
installation so that the commands don't have to be repeated. Unfortunately
this does not seem currently possible, because custom_target() output cannot
be fed into install_data(), so it's necessary to use the 'install',
'install_dir' arguments to control installation. Nevertheless, rework the
rules to repeat less stuff and unify handling of conditions between the
different file types.
This is the equivalent of $(INSTALL_DIRS) and install-touch-usr-hook.
I did not bother to create the directories into which we install files,
since they will be created anyway.
v2:
- remove bashism
This is the equivalent of $(SYSTEM_UNIT_ALIASES) and $(GENERAL_ALIASES)
in Makefile.am.
ninja-build uninstall does not remove the symlinks, see
https://github.com/mesonbuild/meson/issues/1602.
I don't consider this a blocker: after all either one installs into $DESTDIR,
where uninstallation doesn't make much sense, or into a real system, where a
successfull uninstallation would likely destroy the system.
v2:
- remove bashisms
- add various forgotten symlinks and fix service/timer/target confusions
It's crucial that we can build systemd using VS2010!
... er, wait, no, that's not the official reason. We need to shed old systems
by requring python 3! Oh, no, it's something else. Maybe we need to throw out
345 years of knowlege accumulated in autotools? Whatever, this new thing is
cool and shiny, let's use it.
This is not complete, I'm throwing it out here for your amusement and critique.
- rules for sd-boot are missing. Those might be quite complicated.
- rules for tests are missing too. Those are probably quite simple and
repetitive, but there's lots of them.
- it's likely that I didn't get all the conditions right, I only tested "full"
compilation where most deps are provided and nothing is disabled.
- busname.target and all .busname units are skipped on purpose.
Otherwise, installation into $DESTDIR has the same list of files and the
autoconf install, except for .la files.
It'd be great if people had a careful look at all the library linking options.
I added stuff until things compiled, and in the end there's much less linking
then in the old system. But it seems that there's still a lot of unnecessary
deps.
meson has a `shared_module` statement, which sounds like something appropriate
for our nss and pam modules. Unfortunately, I couldn't get it to work. For the
nss modules, we need an .so version of '2', but `shared_module` disallows the
version argument. For the pam module, it also didn't work, I forgot the reason.
The handling of .m4 and .in and .m4.in files is rather awkward. It's likely
that this could be simplified. If make support is ever dropped, I think it'd
make sense to switch to a different templating system so that two different
languages and not required, which would make everything simpler yet.
v2:
- use get_pkgconfig_variable
- use sh not bash
- use add_project_arguments
v3:
- drop required:true and fix progs/prog typo
v4:
- use find_library('bz2')
- add TTY_GID definition
- define __SANE_USERSPACE_TYPES__
- use join_paths(prefix, ...) is used on all paths to make them all absolute
v5:
- replace all declare_dependency's with []
- add more conf.get guards around optional components
v6:
- drop -pipe, -Wall which are the default in meson
- use compiler.has_function() and compiler.has_header_symbol instead of the
hand-rolled checks.
- fix duplication in 'liblibsystemd' library name
- use the right .sym file for pam_systemd
- rename 'compiler' to 'cc': shorter, and more idiomatic.
v7:
- use ENABLE_ENVIRONMENT_D not HAVE_ENVIRONMENT_D
- rename prefix to prefixdir, rootprefix to rootprefixdir
("prefix" is too common of a name and too easy to overwrite by mistake)
- wrap more stuff with conf.get('ENABLE...') == 1
- use rootprefix=='/' and rootbindir as install_dir, to fix paths under
split-usr==true.
v8:
- use .split() also for src/coredump. Now everything is consistent ;)
- add rootlibdir option and use it on the libraries that require it
v9:
- indentation
v10:
- fix check for qrencode and libaudit
v11:
- unify handling of executable paths, provide options for all progs
This makes the meson build behave slightly differently than the
autoconf-based one, because we always first try to find the executable in the
filesystem, and fall back to the default. I think different handling of
loadkeys, setfont, and telinit was just a historical accident.
In addition to checking in $PATH, also check /usr/sbin/, /sbin for programs.
In Fedora $PATH includes /usr/sbin, (and /sbin is is a symlink to /usr/sbin),
but in Debian, those directories are not included in the path.
C.f. https://github.com/mesonbuild/meson/issues/1576.
- call all the options 'xxx-path' for clarity.
- sort man/rules/meson.build properly so it's stable
systemd-resolved provides
1. local API via NSS and D-Bus
2. kind of a local "DNS proxy" through its stub listener
The 1st item should be started before nss-lookup.target.
The 2nd item should be started before network-online.target,
because if the networking works in general, then DNS (and DNS proxy) should too.
Fixes#5650
This makes dbus-org.freedesktop.network1.service like dbus-org.freedesktop.resolve1.service.
When systemd-networkd.service is disabled, the alias is also removed.
The commit c7fb922d62 prohibits
journal-upload to save its state in /var/lib/systemd/journal-upload/state,
thus the daemon fails and outputs the following error message even if
the directory is not read-only file system
```Cannot save state to /var/lib/systemd/journal-upload/state: Read-only file system```
This commit adds the permission the daemon to write the state file.
Creating quota on an iscsi device is causing dependency loops at next reboot.
Reason is that systemd-quotacheck and quotaon.service are ordered before
local-fs.target and quota enabled mounts have a before dependency to them.
This cannot work for _netdev mounts, because network activation is ordered
after local-fs.target.
Moving the Before dependency for systemd-quotacheck and quotaon.service
to remote-fs.target fixes this.
Commit 5ed020d8d1 already fixed this issue for
getty@.service but forgot serial console.
Note that this is not needed for emergency target as the sysinit target
conflicts against this target already.
In 58a6dd1558 s-n-wait-online.service was added
to presets to synchronize the presets with the state after installation. But it
is harmful to have s-n-wait-online.service enabled when s-n.service is
disabled, because s-n-wait-online.service has Requsite=s-n.service and cannot
be activated. Thus remove s-n-wait-online.service from presets again, and let
it be enabled whenever s-n.service is enabled.
During installation we create enablement symlinks by hand, and since s-n.service
is enabled, s-n-w-o.service should be enabled too, so the symlink should still
be created during installation.
https://bugzilla.redhat.com/show_bug.cgi?id=1433459#c15
The emergency.service and rescue.service units have become rather
convoluted. We spawn multiple shells and the help text spans multiple lines
which makes the units hard to read.
Move the logic into a single shell script and call that via ExecStart.
Ideally, plymouth should only be referenced via dependencies,
not ExecStartPre's. This at least avoids the confusing error message
on minimal installations that do not carry plymouth.
The change:
-/usr/lib/systemd/system/dbus-org.freedesktop.resolve1.service
+/etc/systemd/system/dbus-org.freedesktop.resolve1.service
If resolved is disabled, without this, talking to the resolved bus API will
activate it regardless whether it is enabled or not, let's fix that.
Basically, we turn it on for most long-running services, with the
exception of machined (whose child processes need to join containers
here and there), and importd (which sandboxes tar in a CLONE_NEWNET
namespace). machined is left unrestricted, and importd is restricted to
use only "net"
When the service is run in the initramfs, it is possible for it to get started
and not be fast enough to exit before the root switch happens. It is started
multiple times (depending on the consoles being detected), and runs
asynchronously, so this is quite likely. It'll then get killed by killall(),
and systemd will consider the service failed. To avoid all this, just wait
for the service to terminate on it's own.
Before=initrd-switch-root.target should be good for the initramfs, and
Before=shutdown.tuarget should be good for the real system, although it's
unlikely to make any difference there.
The service already has DefaultDeps disabled, so systemd should not try to stop
it. And if it *does* get stopped, we don't want the zombie process around.
KillMode=none does not change anything in the killall() phase, and we already
use argv[0][0] = '@' to protect against that anyway. KillMode=none should not
be useful in normal operation, so let's leave it out.
The service is supposed to regenerate the catalog index whenever /usr is
updated, but /var is not. Hence the ConditionNeedsUpdate= line should
actually reference /var, as that's where the index file is located.
This adds support for a new kernel command line option "systemd.volatile=" that
provides the same functionality that systemd-nspawn's --volatile= switch
provides, but for host systems (i.e. systems booting with a kernel).
It takes the same parameter and has the same effect.
In order to implement systemd.volatile=yes a new service
systemd-volatile-root.service is introduced that only runs in the initrd and
rearranges the root directory as needed to become a tmpfs instance. Note that
systemd.volatile=state is implemented different: it simply generates a
var.mount unit file that is part of the normal boot and has no effect on the
initrd execution.
The way this is implemented ensures that other explicit configuration for /var
can always override the effect of these options. Specifically, the var.mount
unit is generated in the "late" generator directory, so that it only is in
effect if nothing else overrides it.
Note: the name is "system-update-cleanup.service" rather than
"system-update-done.service", because it should not run normally, and also
because there's already "systemd-update-done.service", and having them named
so similarly would be confusing.
In https://bugzilla.redhat.com/show_bug.cgi?id=1395686 the system repeatedly
entered system-update.target on boot. Because of a packaging issue, the tool
that created the /system-update symlink could be installed without the service
unit that was supposed to perform the upgrade (and remove the symlink). In
fact, if there are no units in system-update.target, and /system-update symlink
is created, systemd always "hangs" in system-update.target. This is confusing
for users, because there's no feedback what is happening, and fixing this
requires starting an emergency shell somehow, and also knowing that the symlink
must be removed. We should be more resilient in this case, and remove the
symlink automatically ourselves, if there are no upgrade service to handle it.
This adds a service which is started after system-update.target is reached and
the symlink still exists. It nukes the symlink and reboots the machine. It
should subsequently boot into the default default.target.
This is a more general fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1395686 (the packaging issue was
already fixed).
- use "service" instead of "script", because various offline updaters that we have
aren't really scripts, e.g. dnf-plugin-system-upgrade, packagekit-offline-update,
fwupd-offline-update.
- strongly recommend After=sysinit.target, Wants=sysinit.target
- clarify a bit what should happen when multiple update services are started
- replace links to the wiki with refs to the man page that replaced it.
This is a different way to implement the fix proposed by commit
a4021390fe suggested by Lennart Poettering.
In this patch we instruct PID1 to not kill "systemctl switch-root" command
started by initrd-switch-root service using the "argv[0][0]='@'" trick.
See: https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/ for
more details.
We had to backup argv[0] because argv is modified by dispatch_verb().
With the previous improvements, networkd.service's "After=dbus.service" can now
be dropped. That ordering effectively forced networkd.service to run in late
boot only (dbus.service was rejected to run in early boot in
https://bugs.freedesktop.org/show_bug.cgi?id=98254).
Fixes#4504
Since commit 1f0958f640, systemd considers SIGTERM for short-running
services (aka Type=oneshot) as a failure.
This can be an issue with initrd-switch-root.service as the command run by this
service (in order to switch to the new rootfs) may still be running when
systemd does the switch.
However PID1 sends SIGTERM to all remaining processes right before
switching and initrd-switch-root.service can be one of those.
After systemd is reexecuted and its previous state is deserialized, systemd
notices that initrd-switch-root.service was killed with SIGTERM and considers
this as a failure which leads to the emergency shell.
To prevent this, this patch teaches systemd to consider a SIGTERM exit as a
clean one for this service.
It also removes "KillMode=none" since this is pretty useless as the service is
never stopped by systemd but it either exits normally or it's killed by a
SIGTERM as described previously.
The mount fails, even though CAP_SYS_ADMIN is granted.
Only file systems with FU_USERNS_MOUNT in .fs_flags may be mounted in userns,
and the patch to add that fusectl was rejected [1]. It would be nice if we
could check if the kernel has FU_USERNS_MOUNT for a given fs type, since this
could change over time, but this information doesn't seem to be exported.
So let's just skip this mount in userns to avoid an error during boot.
[1] https://patchwork.kernel.org/patch/2828269/
Since this unit is synthesized anyway there's no point in actually shipping it
on disk. This also has the benefit that "cd /usr/lib/systemd/system ; ls *"
won't be confused by the leading dash of the file name anymore.
This simply changes this line:
ConditionPathIsReadWrite=/proc/sys/
to this:
ConditionPathIsReadWrite=/proc/sys/net/
The background for this is that the latter is namespaced through network
namespacing usually and hence frequently set as writable in containers, even
though the former is kept read-only. If /proc/sys is read-only but
/proc/sys/net is writable we should run the sysctl service, as useful settings
may be made in this case.
Fixes: #4370
By default all user and all system services get stop timeouts for 90s. This is
problematic as the user manager of course is run as system service. Thus, if
the default time-out is hit for any user service, then it will also be hit for
user@.service as a whole, thus making the whole concept useless for user
services.
This patch extends the stop timeout to 120s for user@.service hence, so that
that the user service manager has ample time to process user services timing
out.
(The other option would have been to shorten the default user service timeout,
but I think a user service should get the same timeout by default as a system
service)
Fixes: #4206
`systemctl isolate initrd-switch-root.target` called by initrd-cleanup.service
kills initrd-cleanup.service itself. Then, initrd-cleanup.service failed and
system goes to emergency shell.
To prevent this problem, this commit adds `Wants=initrd-cleanup.service` to
initrd-switch-root.target.
fixes: #4343.
console-shell.service was supposed to be useful for normal clean boots
(i.e. multi-user.target or so), as a replacement for logind/getty@.service for
simpler use cases.
But due to the lack of documentation and sanity check one can easily be
confused and enable this service in // with getty@.service.
In this case we end up with both services sharing the same tty which ends up in
strange results.
Even worse, console-shell.service might be failing while getty@.service tries
to acquire the terminal which ends up in the system to poweroff since
console-shell.service uses:
"ExecStopPost=-/usr/bin/systemctl poweroff".
Another issue: this service doesn't work well if plymouth is also used since it
lets the splash screen program run and mess the tty (at least a "plymouth quit"
is missing).
So let's kill it for now.
Let's make this an excercise in dogfooding: let's turn on more security
features for all our long-running services.
Specifically:
- Turn on RestrictRealtime=yes for all of them
- Turn on ProtectKernelTunables=yes and ProtectControlGroups=yes for most of
them
- Turn on RestrictAddressFamilies= for all of them, but different sets of
address families for each
Also, always order settings in the unit files, that the various sandboxing
features are close together.
Add a couple of missing, older settings for a numbre of unit files.
Note that this change turns off AF_INET/AF_INET6 from udevd, thus effectively
turning of networking from udev rule commands. Since this might break stuff
(that is already broken I'd argue) this is documented in NEWS.
Mere presence of the socket in the filesystem makes
udev_queue_get_udev_is_active() return that udev is running. Note that,
udev on exit doesn't unlink control socket nor does systemd. Thus socket
stays around even when both daemon and socket are stopped. This causes
problems for cryptsetup because when it detects running udev it launches
synchronous operations that *really* require udev. This in turn may
cause blocking and subsequent timeout in systemd-cryptsetup on reboot
while machine is in a state that udev and its control socket units are
stopped, e.g. emergency mode.
Fixes#2477
This was causing preset-all --global to create symlinks:
$ systemctl preset-all --global --root=/var/tmp/inst1
Created symlink /var/tmp/inst1/etc/systemd/user/shutdown.target → /usr/lib/systemd/user/../system/shutdown.target.
Created symlink /var/tmp/inst1/etc/systemd/user/sockets.target → /usr/lib/systemd/user/../system/sockets.target.
Created symlink /var/tmp/inst1/etc/systemd/user/timers.target → /usr/lib/systemd/user/../system/timers.target.
Created symlink /var/tmp/inst1/etc/systemd/user/paths.target → /usr/lib/systemd/user/../system/paths.target.
Created symlink /var/tmp/inst1/etc/systemd/user/bluetooth.target → /usr/lib/systemd/user/../system/bluetooth.target.
Created symlink /var/tmp/inst1/etc/systemd/user/printer.target → /usr/lib/systemd/user/../system/printer.target.
Created symlink /var/tmp/inst1/etc/systemd/user/sound.target → /usr/lib/systemd/user/../system/sound.target.
Created symlink /var/tmp/inst1/etc/systemd/user/smartcard.target → /usr/lib/systemd/user/../system/smartcard.target.
Created symlink /var/tmp/inst1/etc/systemd/user/busnames.target → /usr/lib/systemd/user/../system/busnames.target.
It is better to create units in a state that completely matches the presets, i.e.
preset-all should do nothing when invoked immediately after installation.
I'm sure it was confusing to users too, suggesting that system and user units
may somehow alias each other.
This complements graphical-session.target for services which set up the
environment (e. g. dbus-update-activation-environment) and need to run before
the actual graphical session.
Udev rules cover all the necessary initializations.
As the service now is neither installed, nor installable - we can
remove explicit dependencies and RemainAfterExit=yes option.
This unit acts as a dynamic "alias" target for any concrete graphical user
session like gnome-session.target; these should declare
"BindsTo=graphical-session.target" so that both targets stop and start at the
same time.
This allows services that run in a particular graphical user session (e. g.
gnome-settings-daemon.service) to declare "PartOf=graphical-session.target"
without having to know or get updated for all/new session types. This will
ensure that stopping the graphical session will stop all services which are
associated to it.
If user isolates rescue target from multi-user or graphical target (or just
starts the service), IgnoreOnIsolate will cause issues with sulogin which is
directly started on current virtual console. This patch adds necessary
Conflicts= and Before= against rescue.service.
Note that this is not needed for emergency target, as implicit Requires= and
After= against sysinit.target is in effect for this service
(DefaultDependencies=yes).
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
When a container scope is allocated via machined it gets 16K set already since
cf7d1a30e4. Make sure when a container is run as
system service it gets the same values.
udevd already limits its number of workers/children: the max number is actually
twice the number of CPUs the system is using.
(The limit can also be raised with udev.children-max= kernel command line
option BTW).
On some servers, this limit can easily exceed the maximum number of tasks that
systemd put on all services, which is 512 by default.
Since udevd has already its limitation logic, simply disable the static
limitation done by TasksMax.
Specifically "machinectl shell" (or its OpenShell() bus call) is implemented by
entering the file system namespace of the container and opening a TTY there.
In order to enter the file system namespace, chroot() is required, which is
filtered by SystemCallFilter='s @mount group. Hence, let's make this work again
and drop @mount from the filter list.
Quoting @cgwalters:
Just uploading this as an RFC. Now I know reading the code that systemd says
`Welcome to $OS` as a generic thing, but my initial impression on seeing this
was that it was almost sarcastic =)
Let's say "You are in emergency mode" as a more neutral/less excited phrase.
This patch is based on #3556, but makes the same change for rescue mode.
In order to improve compatibility with local clients that speak DNS directly
(and do not use NSS or our bus API) listen locally on 127.0.0.53:53 and process
any queries made that way.
Note that resolved does not implement a full DNS server on this port, but
simply enough to allow normal, local clients to resolve RRs through resolved.
Specifically it does not implement queries without the RD bit set (these are
requests where recursive lookups are explicitly disabled), and neither queries
with DNSSEC DO set in combination with DNSSEC CD (i.e. DNSSEC lookups with
validation turned off). It also refuses zone transfers and obsolete RR types.
All lookups done this way will be rejected with a clean error code, so that the
client side can repeat the query with a reduced feature set.
The code will set the DNSSEC AD flag however, depending on whether the data
resolved has been validated (or comes from a local, trusted source).
Lookups made via this mechanisms are propagated to LLMNR and mDNS as necessary,
but this is only partially useful as DNS packets cannot carry IP scope data
(i.e. the ifindex), and hence link-local addresses returned cannot be used
properly (and given that LLMNR/mDNS are mostly about link-local communication
this is quite a limitation). Also, given that DNS tends to use IDNA for
non-ASCII names, while LLMNR/mDNS uses UTF-8 lookups cannot be mapped 1:1.
In general this should improve compatibility with clients bypassing NSS but
it is highly recommended for clients to instead use NSS or our native bus API.
This patch also beefs up the DnsStream logic, as it reuses the code for local
TCP listening. DnsStream now provides proper reference counting for its
objects.
In order to avoid feedback loops resolved will no silently ignore 127.0.0.53
specified as DNS server when reading configuration.
resolved listens on 127.0.0.53:53 instead of 127.0.0.1:53 in order to leave
the latter free for local, external DNS servers or forwarders.
This also changes the "etc.conf" tmpfiles snippet to create a symlink from
/etc/resolv.conf to /usr/lib/systemd/resolv.conf by default, thus making this
stub the default mode of operation if /etc is not populated.
Add a line
SystemCallFilter=~@clock @module @mount @obsolete @raw-io ptrace
for daemons shipped by systemd. As an exception, systemd-timesyncd
needs @clock system calls and systemd-localed is not privileged.
ptrace(2) is blocked to prevent seccomp escapes.
In the same vein as commit ac59f0c12c which added
the --wait option to the emergency service, this patch makes sure that plymouth
has exited before entering into the rescue mode.
In order to support stateless systems that support offline /usr updates
properly, let's restore the ConditionNeesUpdate=/etc line that makes sure we
are run when /usr is updated and this update needs to be propagated to the
/etc/ld.so.conf file stored in /etc.
This reverts part of #2859, which snuck this change in, but really shouldn't
have.
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).
When enabling ForwardToSyslog=yes, the syslog.socket is active when entering
emergency mode. Any log message then triggers the start of rsyslog.service (or
other implementation) along with its dependencies such as local-fs.target and
sysinit.target. As these might fail themselves (e. g. faulty /etc/fstab), this
breaks the emergency mode.
This causes syslog.socket to fail with "Failed to queue service startup job:
Transition is destructive".
Add Conflicts=syslog.socket to emergency.service to make sure the socket is
stopped when emergency.service is started.
Fixes#266
Container images from Debian or suchlike contain device nodes in /dev. Let's
make sure we can clone them properly, hence pass CAP_MKNOD to machined.
Fixes: #2867#465
That way we can be sure that local users are logged out before the network is
shut down when the system goes down, so that SSH session should be ending
cleanly before the system goes down.
Fixes: #2390
With the current "Type=forking", systemd tries to guess the PID it
should wait on at reboot (because we have no "PIDFile="). Depending on
how wrong the guess is, we can end up hanging forever at reboot.
Asking it not to do that eliminates the problem.
Also drop ConditionNeedsUpdate=|/etc. Regardless if system is updated
online or offline, updating dynamic loader cache should always be
responsibility of packaging tools/scripts.
When using `%I` for instances of `systemd-nspawn@.service`, the result
will be `systemd-nspawn` trying to launch a container named e.g.
`fedora/23` instead of `fedora-23`.
Using `%i` instead prevents escaping `-` in a container name and uses
the unmodified container name from the machine store.
This commit rips out systemd-bootchart. It will be given a new home, outside
of the systemd repository. The code itself isn't actually specific to
systemd and can be used without systemd even, so let's put it somewhere
else.
As kdbus won't land in the anticipated way, the bus-proxy is not needed in
its current form. It can be resurrected at any time thanks to the history,
but for now, let's remove it from the sources. If we'll have a similar tool
in the future, it will look quite differently anyway.
Note that stdio-bridge is still available. It was restored from a version
prior to f252ff17, and refactored to make use of the current APIs.
This reworks the coredumping logic so that the coredump handler invoked from the kernel only collects runtime data
about the crashed process, and then submits it for processing to a socket-activate coredump service, which extracts a
stacktrace and writes the coredump to disk.
This has a number of benefits: the disk IO and stack trace generation may take a substantial amount of resources, and
hence should better be managed by PID 1, so that resource management applies. This patch uses RuntimeMaxSec=, Nice=, OOMScoreAdjust=
and various sandboxing settings to ensure that the coredump handler doesn't take away unbounded resources from normally
priorized processes.
This logic is also nice since this makes sure the coredump processing and storage is delayed correctly until
/var/systemd/coredump is mounted and writable.
Fixes: #2286
Now that requiring of a masked unit results in failure again, downgrade the dependency on /tmp to Wants= again, so that
our suggested way to disable /tmp-on-tmpfs by masking doesn't result in a failing boot.
References: #2315
The user manager is still limited by its parent slice user-UID.slice,
which defaults to 4096 tasks. However, it no longer has an additional
limit of 512 tasks.
Fixes#1955.
Apparently, util-linux' mount command implicitly drops the smack-related
options anyway before passing them to the kernel, if the kernel doesn't
know SMACK, hence there's no point in duplicating this in systemd.
Fixes#1696
Otherwise we might run into deadlocks, when journald blocks on the
notify socket on PID 1, and PID 1 blocks on IPC to dbus-daemon and
dbus-daemon blocks on logging to journald. Break this cycle by making
sure that journald never ever blocks on PID 1.
Note that this change disables support for event loop watchdog support,
as these messages are sent in blocking style by sd-event. That should
not be a big loss though, as people reported frequent problems with the
watchdog hitting journald on excessively slow IO.
Fixes: #1505.
If SMACK is enabled, 'smackfsroot=*' option should be specified when
/tmp is mounted since many non-root processes use /tmp for temporary
usage. If not, /tmp is labeled as '_' and smack denial occurs when
writing.
In order to do that, 'SmackFileSystemRoot=*' is newly added into
tmp.mount.
This reverts commit 409c2a13fd.
It breaks the bootup of systems which enable smack at compile time, but have no
smack enabled in the kernel. This needs a different solution.
If SMACK is enabled, 'smackfsroot=*' option should be specified in
tmp.mount file since many non-root processes use /tmp for temporary
usage. If not, /tmp is labeled as '_' and smack denial occurs when
writing.
Usually we try to properly uppercase first characters in the
description, do so here, too. Also, keep it close to the string used in
systemd-networkd.service.
With this rework we introduce systemd-rfkill.service as singleton that
is activated via systemd-rfkill.socket that listens on /dev/rfkill. That
way, we get notified each time a new rfkill device shows up or changes
state, in which case we restore and save its current setting to disk.
This is nicer than the previous logic, as this means we save/restore
state even of rfkill devices that are around only intermittently, and
save/restore the state even if the system is shutdown abruptly instead
of cleanly.
This implements what I suggested in #1019 and obsoletes it.
And remove machine-id-commit as separate binary.
There's really no point in keeping this separate, as the sources are
pretty much identical, and have pretty identical interfaces. Let's unify
this in one binary.
Given that machine-id-commit was a private binary of systemd (shipped in
/usr/lib/) removing the tool is not an API break.
While we are at it, improve the documentation of the command substantially.
Apparently, disk IO issues are more frequent than we hope, and 1min
waiting for disk IO happens, so let's increase the watchdog timeout a
bit, for all our services.
See #1353 for an example where this triggers.
When a systemd service running in a container exits with a non-zero
code, it can be useful to terminate the container immediately and get
the exit code back to the host, when systemd-nspawn returns. This was
not possible to do. This patch adds the following to make it possible:
- Add a read-only "ExitCode" property on PID 1's "Manager" bus object.
By default, it is 0 so the behaviour stays the same as previously.
- Add a method "SetExitCode" on the same object. The method fails when
called on baremetal: it is only allowed in containers or in user
session.
- Add support in systemctl to call "systemctl exit 42". It reuses the
existing code for user session.
- Add exit.target and systemd-exit.service to the system instance.
- Change main() to actually call systemd-shutdown to exit() with the
correct value.
- Add verb 'exit' in systemd-shutdown with parameter --exit-code
- Update systemctl manpage.
I used the following to test it:
| $ sudo rkt --debug --insecure-skip-verify run \
| --mds-register=false --local docker://busybox \
| --exec=/bin/chroot -- /proc/1/root \
| systemctl --force exit 42
| ...
| Container rkt-895a0cba-5c66-4fa5-831c-e3f8ddc5810d failed with error code 42.
| $ echo $?
| 42
Fixes https://github.com/systemd/systemd/issues/1290
The bus-proxy manages the kdbus connections of all users on the system
(regarding the system bus), hence, it needs an elevated NOFILE.
Otherwise, a single user can trigger ENFILE by opening NOFILE connections
to the bus-proxy.
Note that the bus-proxy still does per-user accounting, indirectly via
the proxy/fake API of kdbus. Hence, the effective per-user limit is not
raised by this. However, we now prevent one user from consuming the whole
FD limit of the shared proxy.
Also note that there is no *perfect* way to set this. The proxy is a
shared object, so it needs a larger NOFILE limit than the highest limit
of all users. This limit can be changed dynamically, though. Hence, we
cannot protect against it. However, a raised NOFILE limit is a privilege,
so we just treat it as such and basically allow these privileged users to
be able to consume more resources than normal users (and, maybe, cause
some limits to be exceeded by this).
Right now, kdbus hard-codes 1024 max connections per user on each bus.
However, we *must not* rely on this. This limits could be easily dropped
entirely, as the NOFILE limit is a suitable limit on its on.
Make sure we support ExecReload= for bus-proxyd to reload configuration
during runtime. This is *really* handy when hacking on kdbus.
Package-managers are still recommended to run
`busctl --address=unix:path=` directly.
This drops the libsystemd-terminal and systemd-consoled code for various
reasons:
* It's been sitting there unfinished for over a year now and won't get
finished any time soon.
* Since its initial creation, several parts need significant rework: The
input handling should be replaced with the now commonly used libinput,
the drm accessors should coordinate the handling of mode-object
hotplugging (including split connectors) with other DRM users, and the
internal library users should be converted to sd-device and friends.
* There is still significant kernel work required before sd-console is
really useful. This includes, but is not limited to, simpledrm and
drmlog.
* The authority daemon is needed before all this code can be used for
real. And this will definitely take a lot more time to get done as
no-one else is currently working on this, but me.
* kdbus maintenance has taken up way more time than I thought and it has
much higher priority. I don't see me spending much time on the
terminal code in the near future.
If anyone intends to hack on this, please feel free to contact me. I'll
gladly help you out with any issues. Once kdbus and authorityd are
finished (whenever that will be..) I'll definitely pick this up again. But
until then, lets reduce compile times and maintenance efforts on this code
and drop it for now.
Otherwise copying full directory trees between container and host won't
work, as we cannot access some fiels and cannot adjust the ownership
properly on the destination.
Of course, adding these many caps to the daemon kinda defeats the
purpose of the caps lock-down... but well...
Fixes#433
Merely calling "plymouth quit" isn't sufficient, as plymouth needs some time to
shut down. This needs plymouth --wait (which is a no-op when it's not running).
Fixes invisible emergency shell with plymouth running endlessly.
https://launchpad.net/bugs/1471258
./configure --enable/disable-kdbus can be used to set the default
behavior regarding kdbus.
If no kdbus kernel support is available, dbus-dameon will be used.
With --enable-kdbus, the kernel command line option "kdbus=0" can
be used to disable kdbus.
With --disable-kdbus, the kernel command line option "kdbus=1" is
required to enable kdbus support.
The daemon requires the busname unit to operate (on kdbus systems),
since it contains the policy that allows it to acquire its service
name.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=90287
This way we know that any bridges and other user-created network devices
are in place, and can be properly added to the container.
In the long run this should be dropped, and replaced by direct calls
inside nspawn that cause the devices to be created when necessary.
For a longer discussion see this:
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030175.html
This introduces /run/systemd/fsck.progress as a simply
AF_UNIX/SOCK_STREAM socket. If it exists and is connectable we'll
connect fsck's -c switch with it. If external programs want to get
progress data they should hence listen on this socket and will get
all they need via that socket. To get information about the connecting
fsck client they should use SO_PEERCRED.
Unless /run/systemd/fsck.progress is around and connectable this change
reverts back to v219 behaviour where we'd forward fsck output to
/dev/console on our own.
Even trivial service occasionally get stuck, for example when
there's a problem with the journal. There's nothing more annoying
that looking at the cylon eye for a job with an infinite timeout.
Use standard 90s for jobs that do some work, and 30s for those which
should be almost instantenous.
Not that all functionality has been ported over to logind, the old
implementation can be removed. There goes one of the oldest parts of
the systemd code base.
Fedora's filesystem package ships /usr/bin (and other directories) which are
not writable by its owner. machinectl pull-dkr (and possibly others) are not
able to extract those:
14182 mkdirat(3, "usr", 0700) = 0
14182 mkdirat(3, "usr/bin", 0500) = 0
14182 openat(3, "usr/bin/[", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0700) = -1 EACCES (Permission denied)
...
We support /var, /tmp and /var/tmp on NFS. NFS shares however are by
default ordered only before remote-fs.target which is a late-boot
service. /var, /tmp, /var/tmp need to be around earlier though, hence
explicitly order them before basic.target.
Note that this change simply makes explicit what was implicit before,
since many early-boot services pulled in parts of /var anyway early.
Create minimal image which runs systemd
FROM rhel7.1
RUN yum install -y /usr/bin/ps
ENV container docker
CMD [ "/usr/sbin/init" ]
When you run the container without -t, the process
/sbin/agetty --noclear --keep-baud console 115200 38400 9600
is not happy and checking the journal in the container, there is a stream of
Mar 13 04:50:15 11bf07f59fff agetty[66]: /dev/console: No such file or directory
Mar 13 04:50:25 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:25 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:25 11bf07f59fff systemd[1]: Starting Console Getty...
Mar 13 04:50:25 11bf07f59fff systemd[1]: Started Console Getty.
Mar 13 04:50:25 11bf07f59fff agetty[67]: /dev/console: No such file or directory
Mar 13 04:50:35 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:35 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:35 11bf07f59fff systemd[1]: Starting Console Getty...
Mar 13 04:50:35 11bf07f59fff systemd[1]: Started Console Getty.
Mar 13 04:50:35 11bf07f59fff agetty[74]: /dev/console: No such file or directory
Mar 13 04:50:45 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:45 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:45 11bf07f59fff systemd[1]: Starting Console Getty...
On Fri, Mar 13, 2015 at 8:25 PM, Michael Marineau <michael.marineau@coreos.com> wrote:
> Currently systemd-timesyncd.service includes
> ConditionVirtualization=no, disabling it in both containers and
> virtual machines. Each VM platform tends to deal with or ignore the
> time problem in their own special ways, KVM/QEMU has the kernel time
> source kvm-clock, Xen has had different schemes over the years, VMware
> expects a userspace daemon sync the clock, and other platforms are
> content to drift with the wind as far as I can tell.
>
> I don't know of a robust way to know if a platform needs a little
> extra help from userspace to keep the clock sane or not but it seems
> generally safer to try than to risk drifting. Does anyone know of a
> reason to leave timesyncd off by default? Otherwise switching to
> ConditionVirtualization=!container should be reasonable.
When manipulating container and VM images we need efficient and atomic
directory snapshots and file copies, as well as disk quota. btrfs
provides this, legacy file systems do not. Hence, implicitly create a
loopback file system in /var/lib/machines.raw and mount it to
/var/lib/machines, if that directory is not on btrfs anyway.
This is done implicitly and transparently the first time the user
invokes "machinectl import-xyz".
This allows us to take benefit of btrfs features for container
management without actually having the rest of the system use btrfs.
The loopback is sized 500M initially. Patches to grow it dynamically are
to follow.
We will be woken up on rtnl or dbus activity, so let's just quit if some time has passed and that is the only thing that can happen.
Note that we will always stay around if we expect network activity (e.g. DHCP is enabled), as we are not restarted on that.
Only the very basics, more to come.
For now:
$ busctl tree org.freedesktop.network1
└─/org/freedesktop/network1
└─/org/freedesktop/network1/link
├─/org/freedesktop/network1/link/1
├─/org/freedesktop/network1/link/2
├─/org/freedesktop/network1/link/3
├─/org/freedesktop/network1/link/4
├─/org/freedesktop/network1/link/5
├─/org/freedesktop/network1/link/6
├─/org/freedesktop/network1/link/7
├─/org/freedesktop/network1/link/8
└─/org/freedesktop/network1/link/9
$ busctl introspect org.freedesktop.network1 /org/freedesktop/network1
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
org.freedesktop.network1.Manager interface - - -
.OperationalState property s "carrier" emits-change
$ busctl introspect org.freedesktop.network1 /org/freedesktop/network1/link/1
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
org.freedesktop.network1.Link interface - - -
.AdministrativeState property s "unmanaged" emits-change
.OperationalState property s "carrier" emits-change
Services which are not crucial to system bootup, and have Type=oneshot
can effectively "hang" the system if they fail to complete for whatever
reason. To allow the boot to continue, kill them after a timeout.
In case of systemd-journal-flush the flush will continue in the background,
and in the other two cases the job will be aborted, but this should not
result in any permanent problem.
The old "systemd-import" binary is now an internal tool. We still use it
as asynchronous backend for systemd-importd. Since the import tool might
require some IO and CPU resources (due to qcow2 explosion, and
decompression), and because we might want to run it with more minimal
priviliges we still keep it around as the worker binary to execute as
child process of importd.
machinectl now has verbs for pulling down images, cancelling them and
listing them.
Instead of using Accept=true and running one proxy for each connection, we
now run one proxy-daemon with a thread per connection. This will enable us
to share resources like policies in the future.
When there are a lot of split out journal files, we might run out of fds
quicker then we want. Hence: bump RLIMIT_NOFILE to 16K if possible.
Do these even for journalctl. On Fedora the soft RLIMIT_NOFILE is at 1K,
the hard at 4K by default for normal user processes, this code hence
bumps this up for users to 4K.
https://bugzilla.redhat.com/show_bug.cgi?id=1179980
Making use of the fd storage capability of the previous commit, allow
restarting journald by serilizing stream state to /run, and pushing open
fds to PID 1.
- Unescape instance name so that we can take almost anything as instance
name.
- Introduce "machines.target" which consists of all enabled nspawns and
can be used to start/stop them altogether
- Look for container directory using -M instead of harcoding the path in
/var/lib/container
This adds a new bus call to machined that enumerates /var/lib/container
and returns all trees stored in it, distuingishing three types:
- GPT disk images, which are files suffixed with ".gpt"
- directory trees
- btrfs subvolumes
This pulls out the hwdb managment from udevadm into an independent tool.
The old code is left in place for backwards compatibility, and easy of
testing, but all documentation is dropped to encourage use of the new
tool instead.
Otherwise this actually remains in the generated unit in /usr/lib.
If you want to keep it commented out, a m4-compatible way would be:
m4_ifdef(`HAVE_SMACK',
dnl Capabilities=cap_mac_admin=i
dnl SecureBits=keep-caps
)
When dbus client connects to systemd-bus-proxyd through
Unix domain socket proxy takes client's smack label and sets for itself.
It is done before and independent of dropping privileges.
The reason of such soluton is fact that tests of access rights
performed by lsm may take place inside kernel, not only
in userspace of recipient of message.
The bus-proxyd needs CAP_MAC_ADMIN to manipulate its label.
In case of systemd running in system mode, CAP_MAC_ADMIN
should be added to CapabilityBoundingSet in service file of bus-proxyd.
In case of systemd running in user mode ('systemd --user')
it can be achieved by addition
Capabilities=cap_mac_admin=i and SecureBits=keep-caps
to user@.service file
and setting cap_mac_admin+ei on bus-proxyd binary.
The unit file only active the machine-id-commit helper if /etc is mounted
writable and /etc/machine-id is an independant mount point (should be a tmpfs).
--link-journal={host,guest} fail if the host does not have persistent
journalling enabled and /var/log/journal/ does not exist. Even worse, as there
is no stdout/err any more, there is no error message to point that out.
Introduce two new modes "try-host" and "try-guest" which don't fail in this
case, and instead just silently skip the guest journal setup.
Change -j to mean "try-guest" instead of "guest", and fix the wrong --help
output for it (it said "host" before).
Change systemd-nspawn@.service.in to use "try-guest" so that this unit works
with both persistent and non-persistent journals on the host without failing.
https://bugs.debian.org/770275
This reverts commit a4962513c5.
logind.service is a D-Bus service, hence we should use the dbus name as
indication that we are up. Type=dbus is implied if BusName= is
specified, as it is in this case.
This removes a warning that is printed because a BusName= is specified
for a Type=notify unit.
The code already calls sd_notify("READY=1"), so we may as well take
advantage of the startup behavior in the unit. The same was done for
the journal in a87a38c20.
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:
* Each time a file system of this type is mounted, a new kdbus
domain is created.
* The layout inside each mount point is the same as before, except
that domains are not hierarchically nested anymore.
* Domains are therefore also unnamed now.
* Unmounting a kdbusfs will automatically also detroy the
associated domain.
* Hence, the action of creating a kdbus domain is now as
privileged as mounting a filesystem.
* This way, we can get around creating dev nodes for everything,
which is last but not least something that is not limited by
20-bit minor numbers.
The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.
A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
It seems that there actually aren't any long running tasks which are
performed at shutdown. If it turns out that there actually are, this
should be revisited.
This reverts most of commit 038193efa6.
For boot, we might kill fsck in the middle, with likely catastrophic
consequences.
On shutdown there might be other jobs, like downloading of updates for
installation, and other custom jobs. It seems better to schedule an
individual timeout on each one separately, when it is known what
timeout is useful.
Disable the timeouts for now, until we have a clearer picture of how
we can deal with long-running jobs.
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.
For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.
This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
Otherwise we could attempt to flush the journal while /var/log/ was
still ro, and silently skip journal flushing.
The way that errors in flushing are handled should still be changed to
be more transparent and robust.
Since commit 19f8d03783 'timer: order OnCalendar units after
timer-sync.target if DefaultDependencies=no' timers might get a
dependency on time-sync.target, which does not really belong in early
boot. If ntp is enabled, time-sync.target might be delayed until a
network connection is established.
It turns out that majority of timer units found in the wild do not
need to be started in early boot. Out of the timer units available in
Fedora 21, only systemd-readahead-done.timer and mdadm-last-resort@.timer
should be started early, but they both have DefaultDependencies=no,
so are not part of timers.target anyway. All the rest look like they
will be fine with being started a bit later (and the majority even
much later, since they run daily or weekly).
Let timers.target be pulled in by basic.target, but without the
temporal dependency. This means timer units are started on a "best
effort" schedule.
https://bugzilla.redhat.com/show_bug.cgi?id=1158206
In some cases it is preferable to ship system images with a pre-generated
binary hwdb database, to avoid having to build it at runtime, avoid shipping
the source hwdb files, or avoid storing large binary files in /etc.
So if hwdb.bin does not exist in /etc/udev/, fall back to looking for it in
UDEVLIBEXECDIR. This keeps the possibility to add files to /etc/udev/hwdb.d/
and re-generating the database which trumps the one in /usr/lib.
Add a new --usr flag to "udevadm hwdb --update" which puts the database
into UDEVLIBEXECDIR.
Adjust systemd-udev-hwdb-update.service to not generate the file in /etc if we
already have it in /usr.
Using the new JobTimeoutAction= setting make sure we power off the
machine after basic.target is queued for longer than 15min but not
executed. Similar, if poweroff.target is queued for longer than 30min
but does not complete, forcibly turn off the system. Similar, if
reboot.target is queued for longer than 30min but does not complete,
forcibly reboot the system.
This new command will ask the journal daemon to flush all log data
stored in /run to /var, and wait for it to complete. This is useful, so
that in case of Storage=persistent we can order systemd-tmpfiles-setup
afterwards, to ensure any possibly newly created directory in /var/log
gets proper access mode and owners.
systemd-journald check the cgroup id to support rate limit option for
every messages. so journald should be available to access cgroup node in
each process send messages to journald.
In system using SMACK, cgroup node in proc is assigned execute label
as each process's execute label.
so if journald don't want to denied for every process, journald
should have all of access rule for all process's label.
It's too heavy. so we could give special smack label for journald te get
all accesses's permission.
'^' label.
When assign '^' execute smack label to systemd-journald,
systemd-journald need to add CAP_MAC_OVERRIDE capability to get that smack privilege.
so I want to notice this information and set default capability to
journald whether system use SMACK or not.
because that capability affect to only smack enabled kernel
They were left from an early review iteration, when hibernate-resume
functionality was intended to work also outside of initramfs.
Now this is not the case, and these dependencies became redundant
as systemd-fsck-root.service can never be part of initramfs, and
systemd-remount-fs.service makes little sense in it.
This way we are sure that /dev/net/tun has been given the right permissions before we try to connect to it.
Ideally, we should create tun/tap devices over netlink, and then this whole issue would go away.
It seems the return code of systemctl daemon-reload can be !=0 in some
circumstances, which causes a failure of the unit and breaks booting in
the initrd.
This can be used to initiate a resume from hibernation by path to a swap
device containing the hibernation image.
The respective templated unit is also added. It is instantiated using
path to the desired resume device.
With this change, it becomes possible to order a unit to activate before any
modifications to the file systems. This is especially useful for supporting
resume from hibernation.
For pluggable ttys such as USB serial devices, the getty is restarted
and exits in a loop until the remove event reaches systemd. Under
certain circumstances the restart loop can overload the system in a
way that prevents the remove event from reaching systemd for a long
time (e.g. at least several minutes on a small embedded system).
Use the default RestartSec to prevent the restart loop from
overloading the system. Serial gettys are interactive units, so
waiting an extra 100ms really doesn't make a difference anyways
compared to the time it takes the user to log in.
Currently after exiting rescue shell we isolate default target. User
might want to isolate to some other target than default one. However
issuing systemctl isolate command to desired target would bring system
to default target as a consequence of running ExecStopPost action.
Having common ancestor for rescue shell and possible followup systemctl
default command should fix this. If user exits rescue shell we will
proceed with isolating default target, otherwise, on manual isolate,
parent shell process is terminated and we don't isolate default target,
but target chosen by user.
Suggested-by: Michal Schmidt <mschmidt@redhat.com>
As Zbigniew pointed out a new ConditionFirstBoot= appears like the nicer
way to hook in systemd-firstboot.service on first boots (those with /etc
unpopulated), so let's do this, and get rid of the generator again.
A new tool "systemd-firstboot" can be used either interactively on boot,
where it will query basic locale, timezone, hostname, root password
information and set it. Or it can be used non-interactively from the
command line when prepareing disk images for booting. When used
non-inertactively the tool can either copy settings from the host, or
take settings on the command line.
$ systemd-firstboot --root=/path/to/my/new/root --copy-locale --copy-root-password --hostname=waldi
The tool will be automatically invoked (interactively) now on first boot
if /etc is found unpopulated.
This also creates the infrastructure for generators to be notified via
an environment variable whether they are running on the first boot, or
not.
We really don't want these in containers as they provide a too lowlevel
look on the system.
Conditionalize them with CAP_SYS_RAWIO since that's required to access
/proc/kcore, /dev/kmem and similar, which feel similar in style. Also,
npsawn containers lack that capability.
npsawn containers generally have CAP_MKNOD, since this is required
to make PrviateDevices= work. Thus, it's not useful anymore to
conditionalize the kmod static device node units.
Use CAP_SYS_MODULES instead which is not available for nspawn
containers. However, the static device node logic is only done for being
able to autoload modules with it, and if we can't do that there's no
point in doing it.
Reported by Gerardo Exequiel Pozzi:
Looks like [commit a4a878d0] also changes a unrelated file
(units/local-fs.target) [partially]reverting the commit
40f862e3 (filesystem targets: disable default dependencies)
The side effect, at least in my case is that the "nofail" option in both
"crypttab" and "fstab" has partial effect does the default timeout
instead of continue normal boot without timeout.
In a normal running system, non-passive targets and units used during
early bootup are always started. So refusing "manual start" for them
doesn't make any difference, because a "start" command doesn't cause
any action.
In early boot however, the administrator might want to start on
of those targets or services by hand. We shouldn't interfere with that.
Note: in case of systemd-tmpfiles-setup.service, really running the
unit after system is up would break the system. So e.g. restarting
should not be allowed. The unit has "RefuseManualStop=yes", which
prevents restart too.