IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In such a case let's suppress the warning (downgrade to LOG_DEBUG),
under the assumption that the user has no config file to update in its
place, but a symlink that points to something like resolved's
automatically managed resolve.conf file.
While we are at it, also stop complaining if we cannot write /etc/resolv.conf
due to a read-only disk, given that there's little we could do about it.
If the kernel do not support user namespace then one of the children
created by nspawn parent will fail at clone(CLONE_NEWUSER) with the
generic error EINVAL and without logging the error. At the same time
the parent may also try to setup the user namespace and will fail with
another error.
To improve this, check if the kernel supports user namespace as early
as possible.
This ports a lot of manual code over to sigprocmask_many() and friends.
Also, we now consistly check for sigprocmask() failures with
assert_se(), since the call cannot realistically fail unless there's a
programming error.
Also encloses a few sd_event_add_signal() calls with (void) when we
ignore the return values for it knowingly.
Remove old temporary snapshots, but only at boot. Ideally we'd have
"self-destroying" btrfs snapshots that go away if the last last
reference to it does. To mimic a scheme like this at least remove the
old snapshots on fresh boots, where we know they cannot be referenced
anymore. Note that we actually remove all temporary files in
/var/lib/machines/ at boot, which should be safe since the directory has
defined semantics. In the root directory (where systemd-nspawn
--ephemeral places snapshots) we are more strict, to avoid removing
unrelated temporary files.
This also splits out nspawn/container related tmpfiles bits into a new
tmpfiles snippet to systemd-nspawn.conf
This adds a "char *extra" parameter to tempfn_xxxxxx(), tempfn_random(),
tempfn_ranomd_child(). If non-NULL this string is included in the middle
of the newly created file name. This is useful for being able to
distuingish the kind of temporary file when we see one.
This also adds tests for the three call.
For now, we don't make use of this at all, but port all users over.
seccomp_load returns -EINVAL when seccomp support is not enabled in the
kernel [1]. This should be a debug log, not an error that interrupts nspawn.
If the seccomp filter can't be set and audit is enabled, the user will
get an error message anyway.
[1]: http://man7.org/linux/man-pages/man2/prctl.2.html
Also, when the child is potentially long-running make sure to set a
death signal.
Also, ignore the result of the reset operations explicitly by casting
them to (void).
Rather than checking the return of asprintf() we are checking if buf gets allocated,
make it clear that it is ok to ignore the return value.
Fixes CID 1299644.
Allowed interface name is relatively small. Lets not make
users go in to the source code to figure out what happened.
--machine=debian-tree conflicts with
--machine=debian-tree2
ex: Failed to add new veth \
interfaces (host0, vb-debian-tree): File exists
When systemd-nspawn gets exec*()ed, it inherits the followings file
descriptors:
- 0, 1, 2: stdin, stdout, stderr
- SD_LISTEN_FDS_START, ... SD_LISTEN_FDS_START+LISTEN_FDS: file
descriptors passed by the system manager (useful for socket
activation). They are passed to the child process (process leader).
- extra lock fd: rkt passes a locked directory as an extra fd, so the
directory remains locked as long as the container is alive.
systemd-nspawn used to close all open fds except 0, 1, 2 and the
SD_LISTEN_FDS_START..SD_LISTEN_FDS_START+LISTEN_FDS. This patch delays
the close just before the exec so the nspawn process (parent) keeps the
extra fds open.
This patch supersedes the previous attempt ("cloexec extraneous fds"):
http://lists.freedesktop.org/archives/systemd-devel/2015-May/031608.html
If a symlink to a combined cgroup hierarchy already exists and points to
the right path, skip it. This avoids an error when the cgroups are set
manually before calling nspawn.
Previously all bind mount mounts were applied in the order specified,
followed by all tmpfs mounts in the order specified. This is
problematic, if bind mounts shall be placed within tmpfs mounts.
This patch hence reworks the custom mount point logic, and alwas applies
them in strict prefix-first order. This means the order of mounts
specified on the command line becomes irrelevant, the right operation
will always be executed.
While we are at it this commit also adds native support for overlayfs
mounts, as supported by recent kernels.
Some systems abusively restrict mknod, even when the device node already
exists in /dev. This is unfortunate because it prevents systemd-nspawn
from creating the basic devices in /dev in the container.
This patch implements a workaround: when mknod fails, fallback on bind
mounts.
Additionally, /dev/console was created with a mknod with the same
major/minor as /dev/null before bind mounting a pts on it. This patch
removes the mknod and creates an empty regular file instead.
In order to test this patch, I used the following configuration, which I
think should replicate the system with the abusive restriction on mknod:
# grep devices /proc/self/cgroup
4:devices:/user.slice/restrict
# cat /sys/fs/cgroup/devices/user.slice/restrict/devices.list
c 1:9 r
c 5:2 rw
c 136:* rw
# systemd-nspawn --register=false -D .
v2:
- remove "bind", it is not needed since there is already MS_BIND
v3:
- fix error management when calling touch()
- fix lowercase in error message
We have no such check in any of the other tools, hence don't have one in
nspawn either.
(This should make things nicer for Rocket, among other things)
Note: removing this check does not mean that we support running nspawn
on non-systemd. We explicitly don't. It just means that we remove the
check for running it like that. You are still on your own if you do...
This change makes it so all seccomp filters are mapped
to the appropriate capability and are only added if that
capability was not requested when running the container.
This unbreaks the remaining use cases broken by the
addition of seccomp filters without respecting requested
capabilities.
Co-Authored-By: Clif Houck <me@clifhouck.com>
[zj: - adapt to our coding style, make struct anonymous]
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
Previously we always invoked the container PID 1 on /dev/console of the
container. With this change we do so only if nspawn was invoked
interactively (i.e. its stdin/stdout was connected to a TTY). In all other
cases we directly pass through the fds unmodified.
This has the benefit that nspawn can be added into shell pipelines.
https://bugs.freedesktop.org/show_bug.cgi?id=87732
nspawn containers currently block module loading in all cases, with
no option to disable it. This allows an admin, specifically setting
capability=CAP_SYS_MODULE or capability=all to load modules.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
We really want /tmp to be properly mounted, especially in containers
that lack CAP_SYS_ADMIN or that are not fully booted up and only get a
shell, hence let's do so in nspawn already.
When we set up a loopback device with partition probing, the udev
"change" event about the configured device is first passed on to
userspace, only the the in-kernel partition prober is started. Since
partition probing fails with EBUSY when somebody has the device open,
the probing frequently fails since udev starts probing/opening the
device as soon as it gets the notification about it, and it might do so
earlier than the kernel probing.
This patch adds a (hopefully temporary) work-around for this, that
compares the number of probed partitions of the kernel with those of
blkid and synchronously asks for reprobing until the numebrs are in
sync.
This really deserves a proper kernel fix.
After all, nspawn can now dissect MBR partition levels, too, hence
".gpt" appears a misnomer. Moreover, the the .raw suffix for these files
is already pretty popular (the Fedora disk images use it for example),
hence sounds like an OK scheme to adopt.
Sometimes udev or some other background daemon might keep the loopback
devices busy while we already want to detach them. Downgrade the warning
about it.
Given that we use autodetach downgrading these messages should be with
little risk.
With this change nspawn's -i switch now can now make sense of MBR disk
images too - however only if there's only a single, bootable partition
of type 0x83 on the image. For all other cases we cannot really make
sense from the partition table alone.
The big benefit of this change is that upstream Fedora Cloud Images can
now be booted unmodified with systemd-nspawn:
# wget http://download.fedoraproject.org/pub/fedora/linux/releases/21/Cloud/Images/x86_64/Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
# unxz Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
# systemd-nspawn -i Fedora-Cloud-Base-20141203-21.x86_64.raw -b
Next stop: teach the import logic to automatically download these
images, uncompress and verify them.
This is useful for nspawn managers that want to learn when nspawn is
finished with initialiuzation, as well what the PID of the init system
in the container is.
This adds three kinds of file system locks for container images:
a) a file system lock next to the actual image, in a .lck file in the
same directory the image is located. This lock has the benefit of
usually being located on the same NFS share as the image itself, and
thus allows locking container images across NFS shares.
b) a file system lock in /run, named after st_dev and st_ino of the
root of the image. This lock has the advantage that it is unique even
if the same image is bind mounted to two different places at the same
time, as the ino/dev stays constant for them.
c) a file system lock that is only taken when a new disk image is about
to be created, that ensures that checking whether the name is already
used across the search path, and actually placing the image is not
interrupted by other code taking the name.
a + b are read-write locks. When a container is booted in read-only mode
a read lock is taken, otherwise a write lock.
Lock b is always taken after a, to avoid ABBA problems.
Lock c is mostly relevant when renaming or cloning images.
Now that networkd's IP masquerading support means that running
containers with "--network-veth" will provide network access out of the
box for the container, let's add a shortcut "-n" for it, to make it
easily accessible.
It does not use any functions from libcap directly. The CAP_* constants in use
through this file come from "missing.h" which will import <linux/capability.h>
and complement it with CAP_* constants not defined by the current kernel
headers.
Add an explicit import of our "capability.h" since it does use the function
capability_bounding_set_drop from that header file. Previously, that header was
implicitly imported through through "cap-list.h".
Tested that "systemd-nspawn" builds cleanly and works after this change.
Also, when booting up an ephemeral container of / use the system
hostname as default machine name.
This way specifiyng -M is unnecessary when booting up an ephemeral
container, while allowing any number of ephemeral containers to run from
the same tree.
This works now:
# systemd-nspawn -xb -D / -M foobar
Which boots up an ephemeral container, based on the host's root file
system. Or in other words: you can now run the very same host OS you
booted your system with also in a container, on top of it, without
having it interfere. Great for testing whether the init system you are
hacking on still boots without reboot the system!
This adds --template= to duplicate an OS tree as btrfs snpashot and run
it
This also adds --ephemeral or -x to create a snapshot of an OS tree and
boot that, removing it after exit.