IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
When a container scope is allocated via machined it gets 16K set already since
cf7d1a30e4. Make sure when a container is run as
system service it gets the same values.
udevd already limits its number of workers/children: the max number is actually
twice the number of CPUs the system is using.
(The limit can also be raised with udev.children-max= kernel command line
option BTW).
On some servers, this limit can easily exceed the maximum number of tasks that
systemd put on all services, which is 512 by default.
Since udevd has already its limitation logic, simply disable the static
limitation done by TasksMax.
Specifically "machinectl shell" (or its OpenShell() bus call) is implemented by
entering the file system namespace of the container and opening a TTY there.
In order to enter the file system namespace, chroot() is required, which is
filtered by SystemCallFilter='s @mount group. Hence, let's make this work again
and drop @mount from the filter list.
Quoting @cgwalters:
Just uploading this as an RFC. Now I know reading the code that systemd says
`Welcome to $OS` as a generic thing, but my initial impression on seeing this
was that it was almost sarcastic =)
Let's say "You are in emergency mode" as a more neutral/less excited phrase.
This patch is based on #3556, but makes the same change for rescue mode.
In order to improve compatibility with local clients that speak DNS directly
(and do not use NSS or our bus API) listen locally on 127.0.0.53:53 and process
any queries made that way.
Note that resolved does not implement a full DNS server on this port, but
simply enough to allow normal, local clients to resolve RRs through resolved.
Specifically it does not implement queries without the RD bit set (these are
requests where recursive lookups are explicitly disabled), and neither queries
with DNSSEC DO set in combination with DNSSEC CD (i.e. DNSSEC lookups with
validation turned off). It also refuses zone transfers and obsolete RR types.
All lookups done this way will be rejected with a clean error code, so that the
client side can repeat the query with a reduced feature set.
The code will set the DNSSEC AD flag however, depending on whether the data
resolved has been validated (or comes from a local, trusted source).
Lookups made via this mechanisms are propagated to LLMNR and mDNS as necessary,
but this is only partially useful as DNS packets cannot carry IP scope data
(i.e. the ifindex), and hence link-local addresses returned cannot be used
properly (and given that LLMNR/mDNS are mostly about link-local communication
this is quite a limitation). Also, given that DNS tends to use IDNA for
non-ASCII names, while LLMNR/mDNS uses UTF-8 lookups cannot be mapped 1:1.
In general this should improve compatibility with clients bypassing NSS but
it is highly recommended for clients to instead use NSS or our native bus API.
This patch also beefs up the DnsStream logic, as it reuses the code for local
TCP listening. DnsStream now provides proper reference counting for its
objects.
In order to avoid feedback loops resolved will no silently ignore 127.0.0.53
specified as DNS server when reading configuration.
resolved listens on 127.0.0.53:53 instead of 127.0.0.1:53 in order to leave
the latter free for local, external DNS servers or forwarders.
This also changes the "etc.conf" tmpfiles snippet to create a symlink from
/etc/resolv.conf to /usr/lib/systemd/resolv.conf by default, thus making this
stub the default mode of operation if /etc is not populated.
Add a line
SystemCallFilter=~@clock @module @mount @obsolete @raw-io ptrace
for daemons shipped by systemd. As an exception, systemd-timesyncd
needs @clock system calls and systemd-localed is not privileged.
ptrace(2) is blocked to prevent seccomp escapes.
In the same vein as commit ac59f0c12c which added
the --wait option to the emergency service, this patch makes sure that plymouth
has exited before entering into the rescue mode.
In order to support stateless systems that support offline /usr updates
properly, let's restore the ConditionNeesUpdate=/etc line that makes sure we
are run when /usr is updated and this update needs to be propagated to the
/etc/ld.so.conf file stored in /etc.
This reverts part of #2859, which snuck this change in, but really shouldn't
have.
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).
When enabling ForwardToSyslog=yes, the syslog.socket is active when entering
emergency mode. Any log message then triggers the start of rsyslog.service (or
other implementation) along with its dependencies such as local-fs.target and
sysinit.target. As these might fail themselves (e. g. faulty /etc/fstab), this
breaks the emergency mode.
This causes syslog.socket to fail with "Failed to queue service startup job:
Transition is destructive".
Add Conflicts=syslog.socket to emergency.service to make sure the socket is
stopped when emergency.service is started.
Fixes#266
Container images from Debian or suchlike contain device nodes in /dev. Let's
make sure we can clone them properly, hence pass CAP_MKNOD to machined.
Fixes: #2867#465
That way we can be sure that local users are logged out before the network is
shut down when the system goes down, so that SSH session should be ending
cleanly before the system goes down.
Fixes: #2390
With the current "Type=forking", systemd tries to guess the PID it
should wait on at reboot (because we have no "PIDFile="). Depending on
how wrong the guess is, we can end up hanging forever at reboot.
Asking it not to do that eliminates the problem.
Also drop ConditionNeedsUpdate=|/etc. Regardless if system is updated
online or offline, updating dynamic loader cache should always be
responsibility of packaging tools/scripts.
When using `%I` for instances of `systemd-nspawn@.service`, the result
will be `systemd-nspawn` trying to launch a container named e.g.
`fedora/23` instead of `fedora-23`.
Using `%i` instead prevents escaping `-` in a container name and uses
the unmodified container name from the machine store.
This commit rips out systemd-bootchart. It will be given a new home, outside
of the systemd repository. The code itself isn't actually specific to
systemd and can be used without systemd even, so let's put it somewhere
else.
As kdbus won't land in the anticipated way, the bus-proxy is not needed in
its current form. It can be resurrected at any time thanks to the history,
but for now, let's remove it from the sources. If we'll have a similar tool
in the future, it will look quite differently anyway.
Note that stdio-bridge is still available. It was restored from a version
prior to f252ff17, and refactored to make use of the current APIs.
This reworks the coredumping logic so that the coredump handler invoked from the kernel only collects runtime data
about the crashed process, and then submits it for processing to a socket-activate coredump service, which extracts a
stacktrace and writes the coredump to disk.
This has a number of benefits: the disk IO and stack trace generation may take a substantial amount of resources, and
hence should better be managed by PID 1, so that resource management applies. This patch uses RuntimeMaxSec=, Nice=, OOMScoreAdjust=
and various sandboxing settings to ensure that the coredump handler doesn't take away unbounded resources from normally
priorized processes.
This logic is also nice since this makes sure the coredump processing and storage is delayed correctly until
/var/systemd/coredump is mounted and writable.
Fixes: #2286
Now that requiring of a masked unit results in failure again, downgrade the dependency on /tmp to Wants= again, so that
our suggested way to disable /tmp-on-tmpfs by masking doesn't result in a failing boot.
References: #2315
The user manager is still limited by its parent slice user-UID.slice,
which defaults to 4096 tasks. However, it no longer has an additional
limit of 512 tasks.
Fixes#1955.
Apparently, util-linux' mount command implicitly drops the smack-related
options anyway before passing them to the kernel, if the kernel doesn't
know SMACK, hence there's no point in duplicating this in systemd.
Fixes#1696
Otherwise we might run into deadlocks, when journald blocks on the
notify socket on PID 1, and PID 1 blocks on IPC to dbus-daemon and
dbus-daemon blocks on logging to journald. Break this cycle by making
sure that journald never ever blocks on PID 1.
Note that this change disables support for event loop watchdog support,
as these messages are sent in blocking style by sd-event. That should
not be a big loss though, as people reported frequent problems with the
watchdog hitting journald on excessively slow IO.
Fixes: #1505.
If SMACK is enabled, 'smackfsroot=*' option should be specified when
/tmp is mounted since many non-root processes use /tmp for temporary
usage. If not, /tmp is labeled as '_' and smack denial occurs when
writing.
In order to do that, 'SmackFileSystemRoot=*' is newly added into
tmp.mount.
This reverts commit 409c2a13fd.
It breaks the bootup of systems which enable smack at compile time, but have no
smack enabled in the kernel. This needs a different solution.
If SMACK is enabled, 'smackfsroot=*' option should be specified in
tmp.mount file since many non-root processes use /tmp for temporary
usage. If not, /tmp is labeled as '_' and smack denial occurs when
writing.
Usually we try to properly uppercase first characters in the
description, do so here, too. Also, keep it close to the string used in
systemd-networkd.service.
With this rework we introduce systemd-rfkill.service as singleton that
is activated via systemd-rfkill.socket that listens on /dev/rfkill. That
way, we get notified each time a new rfkill device shows up or changes
state, in which case we restore and save its current setting to disk.
This is nicer than the previous logic, as this means we save/restore
state even of rfkill devices that are around only intermittently, and
save/restore the state even if the system is shutdown abruptly instead
of cleanly.
This implements what I suggested in #1019 and obsoletes it.
And remove machine-id-commit as separate binary.
There's really no point in keeping this separate, as the sources are
pretty much identical, and have pretty identical interfaces. Let's unify
this in one binary.
Given that machine-id-commit was a private binary of systemd (shipped in
/usr/lib/) removing the tool is not an API break.
While we are at it, improve the documentation of the command substantially.
Apparently, disk IO issues are more frequent than we hope, and 1min
waiting for disk IO happens, so let's increase the watchdog timeout a
bit, for all our services.
See #1353 for an example where this triggers.
When a systemd service running in a container exits with a non-zero
code, it can be useful to terminate the container immediately and get
the exit code back to the host, when systemd-nspawn returns. This was
not possible to do. This patch adds the following to make it possible:
- Add a read-only "ExitCode" property on PID 1's "Manager" bus object.
By default, it is 0 so the behaviour stays the same as previously.
- Add a method "SetExitCode" on the same object. The method fails when
called on baremetal: it is only allowed in containers or in user
session.
- Add support in systemctl to call "systemctl exit 42". It reuses the
existing code for user session.
- Add exit.target and systemd-exit.service to the system instance.
- Change main() to actually call systemd-shutdown to exit() with the
correct value.
- Add verb 'exit' in systemd-shutdown with parameter --exit-code
- Update systemctl manpage.
I used the following to test it:
| $ sudo rkt --debug --insecure-skip-verify run \
| --mds-register=false --local docker://busybox \
| --exec=/bin/chroot -- /proc/1/root \
| systemctl --force exit 42
| ...
| Container rkt-895a0cba-5c66-4fa5-831c-e3f8ddc5810d failed with error code 42.
| $ echo $?
| 42
Fixes https://github.com/systemd/systemd/issues/1290
The bus-proxy manages the kdbus connections of all users on the system
(regarding the system bus), hence, it needs an elevated NOFILE.
Otherwise, a single user can trigger ENFILE by opening NOFILE connections
to the bus-proxy.
Note that the bus-proxy still does per-user accounting, indirectly via
the proxy/fake API of kdbus. Hence, the effective per-user limit is not
raised by this. However, we now prevent one user from consuming the whole
FD limit of the shared proxy.
Also note that there is no *perfect* way to set this. The proxy is a
shared object, so it needs a larger NOFILE limit than the highest limit
of all users. This limit can be changed dynamically, though. Hence, we
cannot protect against it. However, a raised NOFILE limit is a privilege,
so we just treat it as such and basically allow these privileged users to
be able to consume more resources than normal users (and, maybe, cause
some limits to be exceeded by this).
Right now, kdbus hard-codes 1024 max connections per user on each bus.
However, we *must not* rely on this. This limits could be easily dropped
entirely, as the NOFILE limit is a suitable limit on its on.
Make sure we support ExecReload= for bus-proxyd to reload configuration
during runtime. This is *really* handy when hacking on kdbus.
Package-managers are still recommended to run
`busctl --address=unix:path=` directly.
This drops the libsystemd-terminal and systemd-consoled code for various
reasons:
* It's been sitting there unfinished for over a year now and won't get
finished any time soon.
* Since its initial creation, several parts need significant rework: The
input handling should be replaced with the now commonly used libinput,
the drm accessors should coordinate the handling of mode-object
hotplugging (including split connectors) with other DRM users, and the
internal library users should be converted to sd-device and friends.
* There is still significant kernel work required before sd-console is
really useful. This includes, but is not limited to, simpledrm and
drmlog.
* The authority daemon is needed before all this code can be used for
real. And this will definitely take a lot more time to get done as
no-one else is currently working on this, but me.
* kdbus maintenance has taken up way more time than I thought and it has
much higher priority. I don't see me spending much time on the
terminal code in the near future.
If anyone intends to hack on this, please feel free to contact me. I'll
gladly help you out with any issues. Once kdbus and authorityd are
finished (whenever that will be..) I'll definitely pick this up again. But
until then, lets reduce compile times and maintenance efforts on this code
and drop it for now.
Otherwise copying full directory trees between container and host won't
work, as we cannot access some fiels and cannot adjust the ownership
properly on the destination.
Of course, adding these many caps to the daemon kinda defeats the
purpose of the caps lock-down... but well...
Fixes#433
Merely calling "plymouth quit" isn't sufficient, as plymouth needs some time to
shut down. This needs plymouth --wait (which is a no-op when it's not running).
Fixes invisible emergency shell with plymouth running endlessly.
https://launchpad.net/bugs/1471258
./configure --enable/disable-kdbus can be used to set the default
behavior regarding kdbus.
If no kdbus kernel support is available, dbus-dameon will be used.
With --enable-kdbus, the kernel command line option "kdbus=0" can
be used to disable kdbus.
With --disable-kdbus, the kernel command line option "kdbus=1" is
required to enable kdbus support.
The daemon requires the busname unit to operate (on kdbus systems),
since it contains the policy that allows it to acquire its service
name.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=90287
This way we know that any bridges and other user-created network devices
are in place, and can be properly added to the container.
In the long run this should be dropped, and replaced by direct calls
inside nspawn that cause the devices to be created when necessary.
For a longer discussion see this:
http://lists.freedesktop.org/archives/systemd-devel/2015-April/030175.html
This introduces /run/systemd/fsck.progress as a simply
AF_UNIX/SOCK_STREAM socket. If it exists and is connectable we'll
connect fsck's -c switch with it. If external programs want to get
progress data they should hence listen on this socket and will get
all they need via that socket. To get information about the connecting
fsck client they should use SO_PEERCRED.
Unless /run/systemd/fsck.progress is around and connectable this change
reverts back to v219 behaviour where we'd forward fsck output to
/dev/console on our own.
Even trivial service occasionally get stuck, for example when
there's a problem with the journal. There's nothing more annoying
that looking at the cylon eye for a job with an infinite timeout.
Use standard 90s for jobs that do some work, and 30s for those which
should be almost instantenous.
Not that all functionality has been ported over to logind, the old
implementation can be removed. There goes one of the oldest parts of
the systemd code base.
Fedora's filesystem package ships /usr/bin (and other directories) which are
not writable by its owner. machinectl pull-dkr (and possibly others) are not
able to extract those:
14182 mkdirat(3, "usr", 0700) = 0
14182 mkdirat(3, "usr/bin", 0500) = 0
14182 openat(3, "usr/bin/[", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0700) = -1 EACCES (Permission denied)
...
We support /var, /tmp and /var/tmp on NFS. NFS shares however are by
default ordered only before remote-fs.target which is a late-boot
service. /var, /tmp, /var/tmp need to be around earlier though, hence
explicitly order them before basic.target.
Note that this change simply makes explicit what was implicit before,
since many early-boot services pulled in parts of /var anyway early.
Create minimal image which runs systemd
FROM rhel7.1
RUN yum install -y /usr/bin/ps
ENV container docker
CMD [ "/usr/sbin/init" ]
When you run the container without -t, the process
/sbin/agetty --noclear --keep-baud console 115200 38400 9600
is not happy and checking the journal in the container, there is a stream of
Mar 13 04:50:15 11bf07f59fff agetty[66]: /dev/console: No such file or directory
Mar 13 04:50:25 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:25 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:25 11bf07f59fff systemd[1]: Starting Console Getty...
Mar 13 04:50:25 11bf07f59fff systemd[1]: Started Console Getty.
Mar 13 04:50:25 11bf07f59fff agetty[67]: /dev/console: No such file or directory
Mar 13 04:50:35 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:35 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:35 11bf07f59fff systemd[1]: Starting Console Getty...
Mar 13 04:50:35 11bf07f59fff systemd[1]: Started Console Getty.
Mar 13 04:50:35 11bf07f59fff agetty[74]: /dev/console: No such file or directory
Mar 13 04:50:45 11bf07f59fff systemd[1]: console-getty.service holdoff time over, scheduling restart.
Mar 13 04:50:45 11bf07f59fff systemd[1]: Stopping Console Getty...
Mar 13 04:50:45 11bf07f59fff systemd[1]: Starting Console Getty...
On Fri, Mar 13, 2015 at 8:25 PM, Michael Marineau <michael.marineau@coreos.com> wrote:
> Currently systemd-timesyncd.service includes
> ConditionVirtualization=no, disabling it in both containers and
> virtual machines. Each VM platform tends to deal with or ignore the
> time problem in their own special ways, KVM/QEMU has the kernel time
> source kvm-clock, Xen has had different schemes over the years, VMware
> expects a userspace daemon sync the clock, and other platforms are
> content to drift with the wind as far as I can tell.
>
> I don't know of a robust way to know if a platform needs a little
> extra help from userspace to keep the clock sane or not but it seems
> generally safer to try than to risk drifting. Does anyone know of a
> reason to leave timesyncd off by default? Otherwise switching to
> ConditionVirtualization=!container should be reasonable.
When manipulating container and VM images we need efficient and atomic
directory snapshots and file copies, as well as disk quota. btrfs
provides this, legacy file systems do not. Hence, implicitly create a
loopback file system in /var/lib/machines.raw and mount it to
/var/lib/machines, if that directory is not on btrfs anyway.
This is done implicitly and transparently the first time the user
invokes "machinectl import-xyz".
This allows us to take benefit of btrfs features for container
management without actually having the rest of the system use btrfs.
The loopback is sized 500M initially. Patches to grow it dynamically are
to follow.
We will be woken up on rtnl or dbus activity, so let's just quit if some time has passed and that is the only thing that can happen.
Note that we will always stay around if we expect network activity (e.g. DHCP is enabled), as we are not restarted on that.
Only the very basics, more to come.
For now:
$ busctl tree org.freedesktop.network1
└─/org/freedesktop/network1
└─/org/freedesktop/network1/link
├─/org/freedesktop/network1/link/1
├─/org/freedesktop/network1/link/2
├─/org/freedesktop/network1/link/3
├─/org/freedesktop/network1/link/4
├─/org/freedesktop/network1/link/5
├─/org/freedesktop/network1/link/6
├─/org/freedesktop/network1/link/7
├─/org/freedesktop/network1/link/8
└─/org/freedesktop/network1/link/9
$ busctl introspect org.freedesktop.network1 /org/freedesktop/network1
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
org.freedesktop.network1.Manager interface - - -
.OperationalState property s "carrier" emits-change
$ busctl introspect org.freedesktop.network1 /org/freedesktop/network1/link/1
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
org.freedesktop.network1.Link interface - - -
.AdministrativeState property s "unmanaged" emits-change
.OperationalState property s "carrier" emits-change
Services which are not crucial to system bootup, and have Type=oneshot
can effectively "hang" the system if they fail to complete for whatever
reason. To allow the boot to continue, kill them after a timeout.
In case of systemd-journal-flush the flush will continue in the background,
and in the other two cases the job will be aborted, but this should not
result in any permanent problem.
The old "systemd-import" binary is now an internal tool. We still use it
as asynchronous backend for systemd-importd. Since the import tool might
require some IO and CPU resources (due to qcow2 explosion, and
decompression), and because we might want to run it with more minimal
priviliges we still keep it around as the worker binary to execute as
child process of importd.
machinectl now has verbs for pulling down images, cancelling them and
listing them.
Instead of using Accept=true and running one proxy for each connection, we
now run one proxy-daemon with a thread per connection. This will enable us
to share resources like policies in the future.
When there are a lot of split out journal files, we might run out of fds
quicker then we want. Hence: bump RLIMIT_NOFILE to 16K if possible.
Do these even for journalctl. On Fedora the soft RLIMIT_NOFILE is at 1K,
the hard at 4K by default for normal user processes, this code hence
bumps this up for users to 4K.
https://bugzilla.redhat.com/show_bug.cgi?id=1179980
Making use of the fd storage capability of the previous commit, allow
restarting journald by serilizing stream state to /run, and pushing open
fds to PID 1.
- Unescape instance name so that we can take almost anything as instance
name.
- Introduce "machines.target" which consists of all enabled nspawns and
can be used to start/stop them altogether
- Look for container directory using -M instead of harcoding the path in
/var/lib/container
This adds a new bus call to machined that enumerates /var/lib/container
and returns all trees stored in it, distuingishing three types:
- GPT disk images, which are files suffixed with ".gpt"
- directory trees
- btrfs subvolumes
This pulls out the hwdb managment from udevadm into an independent tool.
The old code is left in place for backwards compatibility, and easy of
testing, but all documentation is dropped to encourage use of the new
tool instead.
Otherwise this actually remains in the generated unit in /usr/lib.
If you want to keep it commented out, a m4-compatible way would be:
m4_ifdef(`HAVE_SMACK',
dnl Capabilities=cap_mac_admin=i
dnl SecureBits=keep-caps
)
When dbus client connects to systemd-bus-proxyd through
Unix domain socket proxy takes client's smack label and sets for itself.
It is done before and independent of dropping privileges.
The reason of such soluton is fact that tests of access rights
performed by lsm may take place inside kernel, not only
in userspace of recipient of message.
The bus-proxyd needs CAP_MAC_ADMIN to manipulate its label.
In case of systemd running in system mode, CAP_MAC_ADMIN
should be added to CapabilityBoundingSet in service file of bus-proxyd.
In case of systemd running in user mode ('systemd --user')
it can be achieved by addition
Capabilities=cap_mac_admin=i and SecureBits=keep-caps
to user@.service file
and setting cap_mac_admin+ei on bus-proxyd binary.
The unit file only active the machine-id-commit helper if /etc is mounted
writable and /etc/machine-id is an independant mount point (should be a tmpfs).
--link-journal={host,guest} fail if the host does not have persistent
journalling enabled and /var/log/journal/ does not exist. Even worse, as there
is no stdout/err any more, there is no error message to point that out.
Introduce two new modes "try-host" and "try-guest" which don't fail in this
case, and instead just silently skip the guest journal setup.
Change -j to mean "try-guest" instead of "guest", and fix the wrong --help
output for it (it said "host" before).
Change systemd-nspawn@.service.in to use "try-guest" so that this unit works
with both persistent and non-persistent journals on the host without failing.
https://bugs.debian.org/770275
This reverts commit a4962513c5.
logind.service is a D-Bus service, hence we should use the dbus name as
indication that we are up. Type=dbus is implied if BusName= is
specified, as it is in this case.
This removes a warning that is printed because a BusName= is specified
for a Type=notify unit.
The code already calls sd_notify("READY=1"), so we may as well take
advantage of the startup behavior in the unit. The same was done for
the journal in a87a38c20.
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:
* Each time a file system of this type is mounted, a new kdbus
domain is created.
* The layout inside each mount point is the same as before, except
that domains are not hierarchically nested anymore.
* Domains are therefore also unnamed now.
* Unmounting a kdbusfs will automatically also detroy the
associated domain.
* Hence, the action of creating a kdbus domain is now as
privileged as mounting a filesystem.
* This way, we can get around creating dev nodes for everything,
which is last but not least something that is not limited by
20-bit minor numbers.
The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.
A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
It seems that there actually aren't any long running tasks which are
performed at shutdown. If it turns out that there actually are, this
should be revisited.
This reverts most of commit 038193efa6.
For boot, we might kill fsck in the middle, with likely catastrophic
consequences.
On shutdown there might be other jobs, like downloading of updates for
installation, and other custom jobs. It seems better to schedule an
individual timeout on each one separately, when it is known what
timeout is useful.
Disable the timeouts for now, until we have a clearer picture of how
we can deal with long-running jobs.
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.
For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.
This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
Otherwise we could attempt to flush the journal while /var/log/ was
still ro, and silently skip journal flushing.
The way that errors in flushing are handled should still be changed to
be more transparent and robust.
Since commit 19f8d03783 'timer: order OnCalendar units after
timer-sync.target if DefaultDependencies=no' timers might get a
dependency on time-sync.target, which does not really belong in early
boot. If ntp is enabled, time-sync.target might be delayed until a
network connection is established.
It turns out that majority of timer units found in the wild do not
need to be started in early boot. Out of the timer units available in
Fedora 21, only systemd-readahead-done.timer and mdadm-last-resort@.timer
should be started early, but they both have DefaultDependencies=no,
so are not part of timers.target anyway. All the rest look like they
will be fine with being started a bit later (and the majority even
much later, since they run daily or weekly).
Let timers.target be pulled in by basic.target, but without the
temporal dependency. This means timer units are started on a "best
effort" schedule.
https://bugzilla.redhat.com/show_bug.cgi?id=1158206
In some cases it is preferable to ship system images with a pre-generated
binary hwdb database, to avoid having to build it at runtime, avoid shipping
the source hwdb files, or avoid storing large binary files in /etc.
So if hwdb.bin does not exist in /etc/udev/, fall back to looking for it in
UDEVLIBEXECDIR. This keeps the possibility to add files to /etc/udev/hwdb.d/
and re-generating the database which trumps the one in /usr/lib.
Add a new --usr flag to "udevadm hwdb --update" which puts the database
into UDEVLIBEXECDIR.
Adjust systemd-udev-hwdb-update.service to not generate the file in /etc if we
already have it in /usr.
Using the new JobTimeoutAction= setting make sure we power off the
machine after basic.target is queued for longer than 15min but not
executed. Similar, if poweroff.target is queued for longer than 30min
but does not complete, forcibly turn off the system. Similar, if
reboot.target is queued for longer than 30min but does not complete,
forcibly reboot the system.
This new command will ask the journal daemon to flush all log data
stored in /run to /var, and wait for it to complete. This is useful, so
that in case of Storage=persistent we can order systemd-tmpfiles-setup
afterwards, to ensure any possibly newly created directory in /var/log
gets proper access mode and owners.
systemd-journald check the cgroup id to support rate limit option for
every messages. so journald should be available to access cgroup node in
each process send messages to journald.
In system using SMACK, cgroup node in proc is assigned execute label
as each process's execute label.
so if journald don't want to denied for every process, journald
should have all of access rule for all process's label.
It's too heavy. so we could give special smack label for journald te get
all accesses's permission.
'^' label.
When assign '^' execute smack label to systemd-journald,
systemd-journald need to add CAP_MAC_OVERRIDE capability to get that smack privilege.
so I want to notice this information and set default capability to
journald whether system use SMACK or not.
because that capability affect to only smack enabled kernel
They were left from an early review iteration, when hibernate-resume
functionality was intended to work also outside of initramfs.
Now this is not the case, and these dependencies became redundant
as systemd-fsck-root.service can never be part of initramfs, and
systemd-remount-fs.service makes little sense in it.
This way we are sure that /dev/net/tun has been given the right permissions before we try to connect to it.
Ideally, we should create tun/tap devices over netlink, and then this whole issue would go away.
It seems the return code of systemctl daemon-reload can be !=0 in some
circumstances, which causes a failure of the unit and breaks booting in
the initrd.
This can be used to initiate a resume from hibernation by path to a swap
device containing the hibernation image.
The respective templated unit is also added. It is instantiated using
path to the desired resume device.
With this change, it becomes possible to order a unit to activate before any
modifications to the file systems. This is especially useful for supporting
resume from hibernation.
For pluggable ttys such as USB serial devices, the getty is restarted
and exits in a loop until the remove event reaches systemd. Under
certain circumstances the restart loop can overload the system in a
way that prevents the remove event from reaching systemd for a long
time (e.g. at least several minutes on a small embedded system).
Use the default RestartSec to prevent the restart loop from
overloading the system. Serial gettys are interactive units, so
waiting an extra 100ms really doesn't make a difference anyways
compared to the time it takes the user to log in.
Currently after exiting rescue shell we isolate default target. User
might want to isolate to some other target than default one. However
issuing systemctl isolate command to desired target would bring system
to default target as a consequence of running ExecStopPost action.
Having common ancestor for rescue shell and possible followup systemctl
default command should fix this. If user exits rescue shell we will
proceed with isolating default target, otherwise, on manual isolate,
parent shell process is terminated and we don't isolate default target,
but target chosen by user.
Suggested-by: Michal Schmidt <mschmidt@redhat.com>
As Zbigniew pointed out a new ConditionFirstBoot= appears like the nicer
way to hook in systemd-firstboot.service on first boots (those with /etc
unpopulated), so let's do this, and get rid of the generator again.
A new tool "systemd-firstboot" can be used either interactively on boot,
where it will query basic locale, timezone, hostname, root password
information and set it. Or it can be used non-interactively from the
command line when prepareing disk images for booting. When used
non-inertactively the tool can either copy settings from the host, or
take settings on the command line.
$ systemd-firstboot --root=/path/to/my/new/root --copy-locale --copy-root-password --hostname=waldi
The tool will be automatically invoked (interactively) now on first boot
if /etc is found unpopulated.
This also creates the infrastructure for generators to be notified via
an environment variable whether they are running on the first boot, or
not.
We really don't want these in containers as they provide a too lowlevel
look on the system.
Conditionalize them with CAP_SYS_RAWIO since that's required to access
/proc/kcore, /dev/kmem and similar, which feel similar in style. Also,
npsawn containers lack that capability.
npsawn containers generally have CAP_MKNOD, since this is required
to make PrviateDevices= work. Thus, it's not useful anymore to
conditionalize the kmod static device node units.
Use CAP_SYS_MODULES instead which is not available for nspawn
containers. However, the static device node logic is only done for being
able to autoload modules with it, and if we can't do that there's no
point in doing it.
Reported by Gerardo Exequiel Pozzi:
Looks like [commit a4a878d0] also changes a unrelated file
(units/local-fs.target) [partially]reverting the commit
40f862e3 (filesystem targets: disable default dependencies)
The side effect, at least in my case is that the "nofail" option in both
"crypttab" and "fstab" has partial effect does the default timeout
instead of continue normal boot without timeout.
In a normal running system, non-passive targets and units used during
early bootup are always started. So refusing "manual start" for them
doesn't make any difference, because a "start" command doesn't cause
any action.
In early boot however, the administrator might want to start on
of those targets or services by hand. We shouldn't interfere with that.
Note: in case of systemd-tmpfiles-setup.service, really running the
unit after system is up would break the system. So e.g. restarting
should not be allowed. The unit has "RefuseManualStop=yes", which
prevents restart too.
networkd-wait-online should never exist in the default transaction,
unless explicitly enable or pulled in via things like NFS. However, just
enabling networkd shouldn't enable networkd-wait-online, since it's
common to use the former without the latter.