1
0
mirror of https://github.com/systemd/systemd.git synced 2024-11-08 11:27:32 +03:00
Commit Graph

375 Commits

Author SHA1 Message Date
Jay Faulkner
9a71b1122c nspawn: Map all seccomp filters to capabilities
This change makes it so all seccomp filters are mapped
to the appropriate capability and are only added if that
capability was not requested when running the container.

This unbreaks the remaining use cases broken by the
addition of seccomp filters without respecting requested
capabilities.

Co-Authored-By: Clif Houck <me@clifhouck.com>

[zj: - adapt to our coding style, make struct anonymous]
2015-03-04 23:18:09 -05:00
Lennart Poettering
c6c8f6e218 nspawn: make kill signal to use for PID 1 configurable 2015-02-25 22:06:54 +01:00
Thomas Hindoe Paaboel Andersen
2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Jan Synacek
4aab5d0cbd nspawn: fix whitespace and typo in partition table blurb 2015-02-23 15:26:58 +01:00
Lennart Poettering
6278cf6048 nspawn: chown basic device nodes to userns root 2015-02-19 12:03:39 +01:00
Lennart Poettering
d15d65a01f nspawn: fix build on non-selinux systems 2015-02-19 12:03:12 +01:00
Lennart Poettering
6dac160c0a nspawn: add basic user namespacing support
(This is incomplete, /proc and /sys are still owned by root from outside
the container, not inside)
2015-02-19 11:31:08 +01:00
Lennart Poettering
9c857b9d16 nspawn: when connected to pipes for stdin/stdout, pass them as-is to PID 1
Previously we always invoked the container PID 1 on /dev/console of the
container. With this change we do so only if nspawn was invoked
interactively (i.e. its stdin/stdout was connected to a TTY). In all other
cases we directly pass through the fds unmodified.

This has the benefit that nspawn can be added into shell pipelines.

https://bugs.freedesktop.org/show_bug.cgi?id=87732
2015-02-18 23:36:20 +01:00
Lennart Poettering
f36933fef6 nspawn: add support for --property= to set scope properties
This is similar to systemd-run's --property= setting.
2015-02-18 19:42:24 +01:00
Jay Faulkner
d0a0ccf3fe nspawn: Allow module loading if CAP_SYS_MODULE is requested
nspawn containers currently block module loading in all cases, with
no option to disable it. This allows an admin, specifically setting
capability=CAP_SYS_MODULE or capability=all to load modules.
2015-02-04 13:34:46 +01:00
Lennart Poettering
63c372cb9d util: rework strappenda(), and rename it strjoina()
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
2015-02-03 02:05:59 +01:00
Thomas Hindoe Paaboel Andersen
fed6df828d remove unused variables 2015-02-02 22:58:06 +01:00
Lennart Poettering
c0534580ac nspawn: when mounting the cgroup hierarchies, use the exact same mount options for the superblock as the host
Otherwise we'll generate kernel runtime warnings about non-matching
mount options.
2015-01-23 01:43:16 +01:00
Lennart Poettering
bbb99c30d0 nspawn: mount /tmp in the container, don't leave this to the container's init
We really want /tmp to be properly mounted, especially in containers
that lack CAP_SYS_ADMIN or that are not fully booted up and only get a
shell, hence let's do so in nspawn already.
2015-01-23 01:27:06 +01:00
Alban Crequy
05e7da5afa nspawn: allow bind-mounting char and block files 2015-01-23 01:22:55 +01:00
Lennart Poettering
c09ef2e4e8 nspawn: work around kernel bug with partition table probing on loopback devices
When we set up a loopback device with partition probing, the udev
"change" event about the configured device is first passed on to
userspace, only the the in-kernel partition prober is started. Since
partition probing fails with EBUSY when somebody has the device open,
the probing frequently fails since udev starts probing/opening the
device as soon as it gets the notification about it, and it might do so
earlier than the kernel probing.

This patch adds a (hopefully temporary) work-around for this, that
compares the number of probed partitions of the kernel with those of
blkid and synchronously asks for reprobing until the numebrs are in
sync.

This really deserves a proper kernel fix.
2015-01-20 20:40:45 +01:00
Tom Gundersen
4bbfe7ad22 nspawn: add ipvlan support 2015-01-20 00:46:13 +01:00
Lennart Poettering
f6c51a8136 nspawn: support dissecting GPT images that contain only a single generic linux partition
This should allow running Ubuntu UEFI GPT Images with nspawn,
unmodified.
2015-01-19 20:24:10 +01:00
Lennart Poettering
2fbe4296c5 inspawn: wait until udev has probed a loopback device before making us of it 2015-01-19 20:24:10 +01:00
Jonathan Boulle
835214146b nspawn: fix log typos 2015-01-15 08:19:30 +01:00
Lennart Poettering
aceac2f0b6 import: rename "gpt" disk image type to "raw"
After all, nspawn can now dissect MBR partition levels, too, hence
".gpt" appears a misnomer. Moreover, the the .raw suffix for these files
is already pretty popular (the Fedora disk images use it for example),
hence sounds like an OK scheme to adopt.
2015-01-15 01:47:21 +01:00
Lennart Poettering
5e4074aa31 spawn: downgrade loopback detach errors to debug
Sometimes udev or some other background daemon might keep the loopback
devices busy while we already want to detach them. Downgrade the warning
about it.

Given that we use autodetach downgrading these messages should be with
little risk.
2015-01-15 00:51:56 +01:00
Lennart Poettering
ada4799ac5 nspawn: add support for limited dissecting of MBR disk images with nspawn
With this change nspawn's -i switch now can now make sense of MBR disk
images too - however only if there's only a single, bootable partition
of type 0x83 on the image. For all other cases we cannot really make
sense from the partition table alone.

The big benefit of this change is that upstream Fedora Cloud Images can
now be booted unmodified with systemd-nspawn:

 # wget http://download.fedoraproject.org/pub/fedora/linux/releases/21/Cloud/Images/x86_64/Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
 # unxz Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
 # systemd-nspawn -i Fedora-Cloud-Base-20141203-21.x86_64.raw -b

Next stop: teach the import logic to automatically download these
images, uncompress and verify them.
2015-01-15 00:47:10 +01:00
Lennart Poettering
733d15ac7a nspawn: pass the container's init PID out via sd_notify()
This is useful for nspawn managers that want to learn when nspawn is
finished with initialiuzation, as well what the PID of the init system
in the container is.
2015-01-14 23:29:01 +01:00
Lennart Poettering
657bdca9e4 nspawn: fix an incorrect assert comparison 2015-01-14 23:18:33 +01:00
Lennart Poettering
30535c1692 nspawn: add file system locks for controlling access to container images
This adds three kinds of file system locks for container images:

a) a file system lock next to the actual image, in a .lck file in the
   same directory the image is located. This lock has the benefit of
   usually being located on the same NFS share as the image itself, and
   thus allows locking container images across NFS shares.

b) a file system lock in /run, named after st_dev and st_ino of the
   root of the image. This lock has the advantage that it is unique even
   if the same image is bind mounted to two different places at the same
   time, as the ino/dev stays constant for them.

c) a file system lock that is only taken when a new disk image is about
   to be created, that ensures that checking whether the name is already
   used across the search path, and actually placing the image is not
   interrupted by other code taking the name.

a + b are read-write locks. When a container is booted in read-only mode
a read lock is taken, otherwise a write lock.

Lock b is always taken after a, to avoid ABBA problems.

Lock c is mostly relevant when renaming or cloning images.
2015-01-14 23:18:33 +01:00
Lennart Poettering
8937422f3b nspawn: remove the right propagation directory 2015-01-14 23:18:33 +01:00
Lennart Poettering
ab5e3a1bcc nspawn: --help typo fix 2015-01-13 20:59:07 +01:00
Lennart Poettering
0dfaa00607 nspawn: add "-n" shortcut for "--network-veth"
Now that networkd's IP masquerading support means that running
containers with "--network-veth" will provide network access out of the
box for the container, let's add a shortcut "-n" for it, to make it
easily accessible.
2015-01-13 20:17:06 +01:00
Lennart Poettering
6d0b55c272 nspawn: add new option "--port=" for exposing container ports on the local host
This exposes an IP port on the container as local port using DNAT.
2015-01-13 13:55:15 +01:00
Lennart Poettering
f2068bcce0 machined: when cloning a raw disk image, also set the NOCOW flag 2015-01-08 23:13:45 +01:00
Tom Gundersen
080e78329a nspawn: fix error message when mknod fails 2015-01-08 17:09:45 +01:00
Lennart Poettering
0ec5543c4c machinectl: make sure that "machinectl login" exits immediately when the machine it is connected to dies 2015-01-07 03:08:00 +01:00
Lennart Poettering
b12afc8c5c nspawn: mount most of the cgroup tree read-only in nspawn containers except for the container's own subtree in the name=systemd hierarchy
More specifically mount all other hierarchies in their entirety and the
name=systemd above the container's subtree read-only.
2015-01-05 01:40:51 +01:00
Lennart Poettering
814a3fdfdc nspawn: report back to systemd only very late whether we are OK
That way, systemd can actually figure out if everything is OK with
nspawn.
2014-12-29 17:54:33 +01:00
Lennart Poettering
1b9cebf638 nspawn: use the same image discovery logic in nspawn as in machined 2014-12-28 02:08:40 +01:00
Filipe Brandenburger
f01ae8260d nspawn: remove spurious include of <sys/capability.h>
It does not use any functions from libcap directly. The CAP_* constants in use
through this file come from "missing.h" which will import <linux/capability.h>
and complement it with CAP_* constants not defined by the current kernel
headers.

Add an explicit import of our "capability.h" since it does use the function
capability_bounding_set_drop from that header file. Previously, that header was
implicitly imported through through "cap-list.h".

Tested that "systemd-nspawn" builds cleanly and works after this change.
2014-12-25 10:55:42 -05:00
Lennart Poettering
611b312b7d nspawn,pty: port over to new ptsname_malloc() helper 2014-12-23 03:26:24 +01:00
Lennart Poettering
c7b7d4493a machinectl,nspawn: don't print extra final newline if pty terminal output was newline-terinated anyway 2014-12-23 03:26:24 +01:00
Lennart Poettering
9b15b7846d run: add a new "-t" mode for invoking a binary on an allocated TTY 2014-12-23 03:26:24 +01:00
Lennart Poettering
785890acf6 machinectl: implement "bind" command to create additional bind mounts from host to container during runtime 2014-12-18 01:36:28 +01:00
Ken Werner
60e1651a31 nspawn: fix invocation of the raw clone() system call on s390 and cris
Since the order of the first and second arguments of the raw clone() system
call is reversed on s390 and cris it needs to be invoked differently.
2014-12-17 00:20:56 -05:00
Lennart Poettering
b9ba4dabba nspawn: when booting in ephemeral mode, append random token to machine name
Also, when booting up an ephemeral container of / use the system
hostname as default machine name.

This way specifiyng -M is unnecessary when booting up an ephemeral
container, while allowing any number of ephemeral containers to run from
the same tree.
2014-12-12 17:30:25 +01:00
Lennart Poettering
c4e34a612c nspawn: allow spawning ephemeral nspawn containers based on the root file system of the OS
This works now:

        # systemd-nspawn -xb -D / -M foobar

Which boots up an ephemeral container, based on the host's root file
system. Or in other words: you can now run the very same host OS you
booted your system with also in a container, on top of it, without
having it interfere. Great for testing whether the init system you are
hacking on still boots without reboot the system!
2014-12-12 17:30:25 +01:00
Lennart Poettering
df9a75e480 nspawn: don't link journals in ephemeral mode 2014-12-12 17:30:25 +01:00
Lennart Poettering
53e438e301 nspawn: properly unset arg_link_journal_try, when --link-journal= is specified 2014-12-12 17:30:25 +01:00
Lennart Poettering
ec16945ebf nspawn: beef up nspawn with some btrfs magic
This adds --template= to duplicate an OS tree as btrfs snpashot and run
it

This also adds --ephemeral or -x to create a snapshot of an OS tree and
boot that, removing it after exit.
2014-12-12 13:35:32 +01:00
Lennart Poettering
0c3c42847d nspawn: properly validate machine names 2014-12-12 13:35:32 +01:00
Lennart Poettering
2822da4fb7 util: introduce our own gperf based capability list
This way, we can ensure we have a more complete, up-to-date list of
capabilities around, always.
2014-12-10 03:21:07 +01:00
Lennart Poettering
a90e23051b nspawn: create the macvlan MAC addresses in an arch independent stable way 2014-12-10 00:26:16 +01:00
Lennart Poettering
e867ceb6b9 nspawn: make sure macvlan MAC addresses are stable
https://bugs.freedesktop.org/show_bug.cgi?id=85527
2014-12-09 01:20:09 +01:00
Lennart Poettering
04a9193940 nspawn: correct EEXIST check when creating directory to mount /tmp in
https://bugs.freedesktop.org/show_bug.cgi?id=86309
2014-12-03 17:53:33 +01:00
Zbigniew Jędrzejewski-Szmek
01dc33ce28 nspawn: fix unused variable warning 2014-11-29 11:11:10 -05:00
Zbigniew Jędrzejewski-Szmek
820d3acfe9 delta: diff returns 1 when files differ, ignore this
https://bugs.debian/org/771397
2014-11-29 11:10:51 -05:00
Michal Schmidt
4a62c710b6 treewide: another round of simplifications
Using the same scripts as in f647962d64 "treewide: yet more log_*_errno
+ return simplifications".
2014-11-28 19:57:32 +01:00
Michal Schmidt
56f64d9576 treewide: use log_*_errno whenever %m is in the format string
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.

Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'

Plus some whitespace, linewrap, and indent adjustments.
2014-11-28 19:49:27 +01:00
Michal Schmidt
f647962d64 treewide: yet more log_*_errno + return simplifications
Using:
find . -name '*.[ch]' | while read f; do perl -i.mmm -e \
 'local $/;
  local $_=<>;
  s/(if\s*\([^\n]+\))\s*{\n(\s*)(log_[a-z_]*_errno\(\s*([->a-zA-Z_]+)\s*,[^;]+);\s*return\s+\g4;\s+}/\1\n\2return \3;/msg;
  print;'
 $f
done

And a couple of manual whitespace fixups.
2014-11-28 18:56:16 +01:00
Michal Schmidt
da927ba997 treewide: no need to negate errno for log_*_errno()
It corrrectly handles both positive and negative errno values.
2014-11-28 13:29:21 +01:00
Michal Schmidt
0a1beeb642 treewide: auto-convert the simple cases to log_*_errno()
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:

find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'

Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
2014-11-28 12:04:41 +01:00
Richard Schütz
6c2d07020f nspawn: ignore EEXIST when mounting tmpfs
commit 79d80fc146 introduced a regression that
prevents mounting a tmpfs if the mount point already exits in the container's
root file system. This commit fixes the problem by ignoring EEXIST.
2014-11-22 20:05:19 -05:00
Martin Pitt
574edc9006 nspawn: Add try-{host,guest} journal link modes
--link-journal={host,guest} fail if the host does not have persistent
journalling enabled and /var/log/journal/ does not exist. Even worse, as there
is no stdout/err any more, there is no error message to point that out.

Introduce two new modes "try-host" and "try-guest" which don't fail in this
case, and instead just silently skip the guest journal setup.

Change -j to mean "try-guest" instead of "guest", and fix the wrong --help
output for it (it said "host" before).

Change systemd-nspawn@.service.in to use "try-guest" so that this unit works
with both persistent and non-persistent journals on the host without failing.

https://bugs.debian.org/770275
2014-11-21 14:27:26 +01:00
Daniel Mack
63cc4c3138 sd-bus: sync with kdbus upstream (ABI break)
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:

 * Each time a file system of this type is mounted, a new kdbus
   domain is created.

 * The layout inside each mount point is the same as before, except
   that domains are not hierarchically nested anymore.

 * Domains are therefore also unnamed now.

 * Unmounting a kdbusfs will automatically also detroy the
   associated domain.

 * Hence, the action of creating a kdbus domain is now as
   privileged as mounting a filesystem.

 * This way, we can get around creating dev nodes for everything,
   which is last but not least something that is not limited by
   20-bit minor numbers.

The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.

A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
2014-11-13 20:41:52 +01:00
David Herrmann
dfb05a1cf5 barrier: explicitly ignore return values of barrier_place()
The barrier implementation tracks remote states internally. There is no
need to check the return value of any barrier_*() function if the caller
is not interested in the result. The barrier helpers only return the state
of the remote side, which is usually not interesting as later calls to
barrier_sync() will catch this, anyway.

Shut up coverity by explicitly ignoring return values of barrier_place()
if we're not interested in it.
2014-11-04 09:49:43 +01:00
Lennart Poettering
023fb90b83 ptyforward: rework PTY forwarder logic used by nspawn to utilize the normal event loop
We really should not run manual event loops anymore, but standardize on
sd_event, so that we can run sd_bus connections from it eventually.
2014-10-31 16:55:04 +01:00
Lennart Poettering
919699ec30 units: don't order journal flushing afte remote-fs.target
Instead, only depend on the actual file systems we need.

This should solve dep loops on setups where remote-fs.target is moved
into late boot.
2014-10-31 16:23:39 +01:00
Lennart Poettering
fddbb89c46 nspawn: don't make up -1 as error code 2014-10-31 16:23:39 +01:00
Dave Reisner
1ab19cb167 nspawn: ignore EEXIST when creating mount point
A combination of commits f3c80515c and 79d80fc14 cause nspawn to
silently fail with a commandline such as:

  # systemd-nspawn -D /build/extra-x86_64 --bind=/usr

strace shows the culprit:

  [pid 27868] writev(2, [{"Failed to create mount point /build/extra-x86_64/usr: File exists", 82}, {"\n", 1}], 2) = 83
2014-10-29 13:42:51 -04:00
Michal Sekletar
605f81a896 util: introduce sethostname_idempotent
Function queries system hostname and applies changes only when necessary. Also,
migrate all client of sethostname to sethostname_idempotent while at it.
2014-10-27 10:37:46 +01:00
Daniel Mack
317cde8b80 nspawn: fix DeviceAllow list
Commit 864e17068 ("nspawn: actually allow access to /dev/net/tun in the
container") added "/dev/net/tun" to the list of allowed devices but forgot
to tweak the array length, which caused "/dev/kdbus/*" to be missed.
2014-10-17 16:07:12 +02:00
Lennart Poettering
864e17068c nspawn: actually allow access to /dev/net/tun in the container
It's not sufficient to just copy the device node over, we need to update
the policy for it too.
2014-10-10 11:11:25 +02:00
Tom Gundersen
85614d663e nspawn: copy /dev/net/tun from host
This enables tuntap support in the container (assumning the necessary capabilities are in place).
2014-10-08 15:52:07 +02:00
Tom Gundersen
e8c8ddccfc nspawn: log when tearing down of loop device fails 2014-09-29 20:52:10 +02:00
Tom Gundersen
79d80fc146 nspawn: check some more return values
Most of these failures would anyway get caught later on, but now the error messages are a bit more
specific.
2014-09-25 19:10:11 +02:00
Tom Gundersen
c00524c9cc nspawn: don't try to create veth link with too long ifname
Reported by: James Lott <james@lottspot.com>
2014-09-19 23:02:00 +02:00
Tom Gundersen
3125b3ef5d nspawn: fix --network-interface
Use SETLINK when modifying an existing link.
2014-08-28 12:16:07 +02:00
Lennart Poettering
1b6d7fa742 util: make use of newly added reset_signal_mask() call wherever appropriate 2014-08-26 21:12:54 +02:00
Lennart Poettering
af4ec4309e notify: send STOPPING=1 from our daemons 2014-08-21 17:24:21 +02:00
Lennart Poettering
4f758c2398 nspawn: make sure that when --network-veth is used both the host and the container side get fixed MAC addresses 2014-08-04 19:15:07 +02:00
Lennart Poettering
249968612f bus: always explicitly close bus from main programs
Since b5eca3a205 we don't attempt to GC
busses anymore when unsent messages remain that keep their reference,
when they otherwise are not referenced anymore. This means that if we
explicitly want connections to go away, we need to close them.

With this change we will no do so explicitly wherver we connect to the
bus from a main program (and thus know when the bus connection should go
away), or when we create a private bus connection, that really should go
away after our use.

This fixes connection leaks in the NSS and PAM modules.
2014-08-04 16:25:24 +02:00
Zbigniew Jędrzejewski-Szmek
601185b43d Unify parse_argv style
getopt is usually good at printing out a nice error message when
commandline options are invalid. It distinguishes between an unknown
option and a known option with a missing arg. It is better to let it
do its job and not use opterr=0 unless we actually want to suppress
messages. So remove opterr=0 in the few places where it wasn't really
useful.

When an error in options is encountered, we should not print a lengthy
help() and overwhelm the user, when we know precisely what is wrong
with the commandline. In addition, since help() prints to stdout, it
should not be used except when requested with -h or --help.

Also, simplify things here and there.
2014-08-03 21:46:07 -04:00
Zbigniew Jędrzejewski-Szmek
4212a3375e nspawn: fix truncation of machine names in interface names
Based on patch by Michael Marineau <michael.marineau@coreos.com>:

When deriving the network interface name from machine name strncpy was
not properly null terminating the string and the maximum string size as
returned by strlen() is actually IFNAMSIZ-1, not IFNAMSIZ.
2014-08-03 01:29:51 -04:00
Zbigniew Jędrzejewski-Szmek
a2a5291b3f Reject invalid quoted strings
String which ended in an unfinished quote were accepted, potentially
with bad memory accesses.

Reject anything which ends in a unfished quote, or contains
non-whitespace characters right after the closing quote.

_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.

Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.

mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS  $RootDir $MountPoint
mbiebl_> which is pretty weird
2014-07-31 04:00:31 -04:00
Zbigniew Jędrzejewski-Szmek
7566e26721 barrier: initalize file descriptors with -1
Explicitly initalize descriptors using explicit assignment like
bus_error. This makes barriers follow the same conventions as
everything else and makes things a bit simpler too.

Rename barier_init to barier_create so it is obvious that it is
not about initialization.

Remove some parens, etc.
2014-07-18 20:12:44 -04:00
David Herrmann
3496b9eeaf nspawn: fix barrier-destroy call
I dropped the cleanup-helper before pushing so use _cleanup_() directly.
2014-07-17 11:48:39 +02:00
David Herrmann
a2da110b78 nspawn: use Barrier API instead of eventfd-util
The Barrier-API simplifies cross-fork() synchronization a lot. Replace the
hard-coded eventfd-util implementation and drop it.

Compared to the old API, Barriers also handle exit() of the remote side as
abortion. This way, segfaults will not cause the parent to deadlock.

EINTR handling is currently ignored for any barrier-waits. This can easily
be added, but it isn't needed so far so I dropped it. EINTR handling in
general is ugly, anyway. You need to deal with pselect/ppoll/... variants
and make sure not to unblock signals at the wrong times. So genrally,
there's little use in adding it.
2014-07-17 11:34:25 +02:00
Lennart Poettering
5aa4bb6b5b nspawn: register external network interface with machined 2014-07-10 22:48:30 +02:00
Lennart Poettering
4d9f07b492 nspawn: add new --volatile switch for booting containers in volatile (ephemeral) mode
Two modes are supported: --volatile=yes mounts only /usr into the
container, and a tmpfs as root directory. --volatile=state mounts the
full OS tree in, but overmounts /var with a tmpfs.

--volatile=yes hence boots with an unpopulated /etc and /var, starting
with pristine configuration and state.

--volatile=state hence boots with an unpopulated /var, only starting
with pristine state.
2014-07-04 03:24:42 +02:00
Lennart Poettering
ce38dbc84b nspawn: when running in a service unit, use systemd for restarts
THis way we can remove cgroup priviliges after setup, but get them back
for the next restart, as we need it.
2014-07-03 12:51:07 +02:00
Lennart Poettering
28650077f3 nspawn: block open_by_handle_at() and others via seccomp
Let's protect ourselves against the recently reported docker security
issue. Our man page makes clear that we do not make any security
promises anyway, but well, this one is easy to mitigate, so let's do it.
While we are at it block a couple of more syscalls that are no good in
containers, too.
2014-06-30 16:22:12 +02:00
Lennart Poettering
840295fc1e nspawn: let's avoid using goto to wildly for non-cleanup purposes 2014-06-30 15:20:59 +02:00
Lennart Poettering
ce9f1527b6 nspawn: simplify exit condition check 2014-06-30 15:19:00 +02:00
Luke Shumaker
8baaf7a3d8 nspawn: log a warning on failure from wait_for_terminate()
This is at the suggestion of Djalal Harouni on the mailing list, and
reflects the behavior of shared/util.c:wait_for_terminate_and_warn().
2014-06-30 15:13:53 +02:00
Luke Shumaker
6d416b9cc8 nspawn: Fix regression with exit status
Commit 113cea8 introduced a bug that caused the exit code of systemd-nspawn
to not reflect the exit code of the program executed in the container.
2014-06-30 15:13:47 +02:00
Kay Sievers
971ff8c78b switch-root: create essential base directories at system bootup
This allows us to bootup a rootfs with a /usr directory only.
2014-06-24 18:12:31 +02:00
Kay Sievers
3577de7ac3 nspawn: create essential base directories at system bootup
This allows us to bootup a rootfs with a /usr directory only.
2014-06-24 15:41:03 +02:00
Thomas Hindoe Paaboel Andersen
c8b32e11ee consistently order cleanup attribute before type 2014-06-22 00:45:15 +02:00
Lennart Poettering
5ae4d543cb os-release: define /usr/lib/os-release as fallback for /etc/os-release
The file should have been in /usr/lib/ in the first place, since it
describes the OS container in /usr (and not the configuration in /etc),
hence, let's support os-release files in /usr/lib as fallback if no
version in /etc exists, following the usual override logic.

A prior commit already enabled tmpfiles to create /etc/os-release as a
symlink to /usr/lib/os-release should it be missing, thus providing nice
compatibility with applications only checking in /etc.

While it's probably a good idea if all apps check both locations via a
fallback logic, it is only necessary in the early boot process, as long
as the /etc/os-release symlink has not been restored, in case we boot
with an empty /etc.
2014-06-13 20:11:59 +02:00
Lennart Poettering
06c17c39a8 nspawn: add new --tmpfs= option to mount a tmpfs on specific directories, such as /var 2014-06-11 00:44:30 +02:00
Lennart Poettering
849958d1ba tmpfiles: add new "C" line for copying files or directories 2014-06-10 23:02:40 +02:00
Zbigniew Jędrzejewski-Szmek
45f1386c9a nspawn: split long message into two lines
For names like /var/lib/container/something, the message
becomes quite long. Better to split it.

Also reword the message not to suggest that ^]^]^] only works
in the beginning.
2014-06-07 16:30:51 -04:00
Lennart Poettering
d6797c920e namespace: beef up read-only bind mount logic
Instead of blindly creating another bind mount for read-only mounts,
check if there's already one we can use, and if so, use it. Also,
recursively mark all submounts read-only too. Also, ignore autofs mounts
when remounting read-only unless they are already triggered.
2014-06-06 14:37:40 +02:00
Djalal Harouni
e866af3acc nspawn: make nspawn robust to container failure
nspawn and the container child use eventfd to wait and notify each other
that they are ready so the container setup can be completed.

However in its current form the wait/notify event ignore errors that
may especially affect the child (container).

On errors the child will jump to the "child_fail" label and terminate
with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
is created without the "EFD_NONBLOCK" flag, this leaves the parent
blocking on the eventfd_read() call. The container can also be killed
at any moment before execv() and the parent will not receive
notifications.

We can fix this by using cheap mechanisms, the new high level eventfd
API and handle SIGCHLD signals:

* Keep the cheap eventfd and EFD_NONBLOCK flag.

* Introduce eventfd states for parent and child to sync.
Child notifies parent with EVENTFD_CHILD_SUCCEEDED on success or
EVENTFD_CHILD_FAILED on failure and before _exit(). This prevents the
parent from waiting on an event that will never come.

* If the child is killed before execv() or before notifying the parent,
we install a NOP handler for SIGCHLD which will interrupt blocking calls
with EINTR. This gives a chance to the parent to call wait() and
terminate in main().

* If there are no errors, parent will block SIGCHLD, restore default
handler and notify child which will do execv(), then parent will pass
control to process_pty() to do its magic.

This was exposed in part by:
https://bugs.freedesktop.org/show_bug.cgi?id=76193

Reported-by: Tobias Hunger tobias.hunger@gmail.com
2014-05-25 11:23:35 +08:00
Djalal Harouni
113cea802d nspawn: move container wait logic into wait_for_container()
Move the container wait logic into its own wait_for_container() function
and add two status codes: CONTAINER_TERMINATED or CONTAINER_REBOOTED.
The status will be stored in its argument, this way we handle:
a) Return negative on failures.
b) Return zero on success and set the status to either
   CONTAINER_REBOOTED or CONTAINER_TERMINATED.

These status codes are used to terminate nspawn or loop again in case of
CONTAINER_REBOOTED.
2014-05-25 11:23:30 +08:00
Cristian Rodríguez
590b6b9188 Use %m instead of strerror(errno) where appropiate 2014-05-25 11:18:28 +08:00
Lennart Poettering
cdb2b9d05a nspawn: restore journal directory is empty check
This undoes part of commit e6a4a517be.

Instead of removing the error message about non-empty journal bind mount
directories, simply downgrade the message to a warning and proceed.
2014-05-22 15:21:01 +09:00
Djalal Harouni
e6a4a517be nspawn: allow to bind mount journal on top of a non empty container journal dentry
Currently if nspawn was called with --link-journal=host or
--link-journal=auto and the right /var/log/journal/machine-id/ exists
then the bind mount the subdirectory into the container might fail due
to the ~/mycontainer/var/log/journal/machine-id/ of the container not
being empty.

There is no reason to check if the container journal subdir is empty
since there will be a bind mount on top of it. The user asked for a bind
mount so give it.

Note: a next call with --link-journal=guest may fail due to the
/var/log/journal/machine-id/ on the host not being empty.

https://bugs.freedesktop.org/show_bug.cgi?id=76193

Reported-by: Tobias Hunger <tobias.hunger@gmail.com>
2014-05-22 09:55:23 +09:00
Nis Martensen
f1721625e7 fix spelling of privilege 2014-05-19 00:40:44 +09:00
Lennart Poettering
9f24adc288 nspawn: properly format container_uuid in UUID format
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018971.html
2014-05-16 19:37:19 +02:00
Philip Lorenz
70f539ca14 nspawn: Fix erroneous OOM when building group list
change_uid_gid() never initialises sz which may cause greedy_realloc to
skip the initial buffer allocation.
2014-04-10 09:50:39 -04:00
Tom Gundersen
d8e538ecd9 sd-rtnl: rework rtnl type system
Use a static table with all the typing information, rather than repeated
switch statements. This should make it a lot simpler to add new types.

We need to keep all the type info to be able to create containers
without exposing their implementation details to the users of the library.

As a freebee we verify the types of appended/read attributes.

The API is extended to nicely deal with unions of container types.
2014-03-28 19:11:59 +01:00
Lennart Poettering
3d94f76c99 util: replace close_pipe() with new safe_close_pair()
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make and misleading allusion, as it works
similarly well for socketpairs() as for pipe()s...
2014-03-24 03:22:44 +01:00
Lennart Poettering
03e334a1c7 util: replace close_nointr_nofail() by a more useful safe_close()
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:

        fd = safe_close(fd);

Which will close an fd if it is open, and reset the fd variable
correctly.

By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
2014-03-18 19:31:34 +01:00
Tom Gundersen
039dd4afd6 nspawn: UP the host side of the veth pair after adding it to a bridge 2014-03-16 13:55:41 +01:00
Dave Reisner
7947952ede nspawn: remove unused variable 2014-03-13 21:56:07 -04:00
Brandon Philips
f418f31d50 nspawn: allow -EEXIST on mkdir_safe /home/${uid}
With systemd 211 nspawn attempts to create the home directory for the
given uid. However, if the home directory already exists then it will
fail. Don't error out on -EEXIST.
2014-03-14 02:25:56 +01:00
Tom Gundersen
01dde0611b nspawn: make host0's MAC address persistent
We still need to make sure that no two MAC addresses are the same, so we use
a logic similar to what is used in udev to generate MAC addresses, and base
it on a hash of the host's machine ID and thecontainer's name.
2014-03-13 17:47:33 +01:00
Lennart Poettering
727fd4fda5 nspawn: honour GPT partition flags when mounting file systems following the discoverable partitions spec 2014-03-13 01:33:33 +01:00
Mantas Mikulėnas
4de8292689 nspawn: fix argv[0] for getent 2014-03-11 17:45:20 +01:00
Lennart Poettering
a07f961e98 nspawn: allow using kdbus from nspawn containers 2014-03-11 17:43:41 +01:00
Lennart Poettering
8c4e25b73c nspawn: fix getent fallback 2014-03-11 03:08:54 +01:00
Lennart Poettering
0cb9fbcd44 nspawn: when resoliving UIDs/GIDs for "-u", do so in forked off /usr/bin/getent instead of in-process
When the container runs a different native architecture than the host we
shouldn't attempt to load the container's NSS modules with the host's
libc. Instead, resolve UID/GID by invoking /usr/bin/getent in the
container. The tool should be fairly universally available and allows us
to do resolving of the UID/GID with the container's libc in a parsable
format.

https://bugs.freedesktop.org/show_bug.cgi?id=75733
2014-03-11 02:41:13 +01:00
Lennart Poettering
d96c1ecf7b nspawn: make sure we don't try to mount the container block device in the child after the parent added us to the device cgroup 2014-03-11 01:01:38 +01:00
Lennart Poettering
eb0f0863f5 nspawn: don't try mknod() of /dev/console with the correct major/minor
We overmount /dev/console with an external pty anyway, hence there's no
point in using the real major/minor when we create the node to
overmount. Instead, use the one of /dev/null now.

This fixes a race against the cgroup device controller setup we are
using. In case /dev/console was create before the cgroup policy was
applied all was good, but if created in the opposite order the mknod()
would fail, since creating /dev/console is not allowed by it. Creating
/dev/null instances is however permitted, and hence use it.
2014-03-10 21:36:01 +01:00
Lennart Poettering
1b9e5b1263 nspawn: add --image= switch to boot GPT disk images that follow the Discoverable Partitions Specification 2014-03-10 20:35:52 +01:00
Tero Roponen
13e8ceb84e nspawn: fix detection of missing /proc/self/loginuid
Running 'systemd-nspawn -D /srv/Fedora/' gave me this error:
 Failed to read /proc/self/loginuid: No such file or directory

 Container Fedora failed with error code 1.

This patch fixes the problem.
2014-02-28 12:58:02 +01:00
Lennart Poettering
9875fd7875 nspawn: no need for duplicate checks against EEXIST 2014-02-26 02:19:28 +01:00
Lennart Poettering
c74e630d0c nspawn: add new switch --network-macvlan= to add a macvlan device to the container 2014-02-25 02:37:59 +01:00
Lennart Poettering
9457ac5b4e nspawn: make use of the devices cgroup controller by default 2014-02-24 03:38:58 +01:00
Lennart Poettering
08af0da269 nspawn: when adding a veth interface to a bridge, use the "vb-" rather than "ve-" interface name prefix
This way we can recognize the interfaces later on to apply different
host-side configuration to them.
2014-02-21 04:02:12 +01:00
Lennart Poettering
151b9b9662 api: in constructor function calls, always put the returned object pointer first (or second)
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:

1. The object the new object is derived from is put first, if there is any

2. The object we are creating will be returned in the next arguments

3. This is followed by any additional arguments

Rationale:

For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.

Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.

Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
2014-02-20 00:03:10 +01:00
Lennart Poettering
39883f622f make gcc shut up
If -flto is used then gcc will generate a lot more warnings than before,
among them a number of use-without-initialization warnings. Most of them
without are false positives, but let's make them go away, because it
doesn't really matter.
2014-02-19 17:53:50 +01:00
Lennart Poettering
ac45f971a1 core: add Personality= option for units to set the personality for spawned processes 2014-02-19 03:27:03 +01:00
Lennart Poettering
6afc95b736 nspawn: add new --personality= switch to make it easier to run 32bit containers on a 64bit host 2014-02-18 23:37:27 +01:00
Lennart Poettering
3302da4667 nspawn: x86 is special with its socketcall() semantics, be permissive in the seccomp setup 2014-02-18 22:27:46 +01:00
Lennart Poettering
e9642be2cc seccomp: add helper call to add all secondary archs to a seccomp filter
And make use of it where appropriate for executing services and for
nspawn.
2014-02-18 22:14:00 +01:00
Dave Reisner
f3d5485b80 nspawn: allow 32-bit chroots from 64-bit hosts
Arch Linux uses nspawn as a container for building packages and needs
to be able to start a 32bit chroot from a 64bit host. 24fb111207
disrupted this feature when seccomp handling was added.
2014-02-18 21:26:24 +01:00
Tom Gundersen
4fb7242cbb sd-rtnl-message: store reference to the bus in the message
This mimics the sd-bus api, as we may need it in the future.
2014-02-18 11:21:22 +01:00
Lennart Poettering
37c47eb709 nspawn: netns_fd can be removed now 2014-02-17 15:49:21 +01:00
Thomas Hindoe Paaboel Andersen
32457153f4 nspawn: typo fix in help 2014-02-16 22:15:24 +01:00
Tom Gundersen
ab046dde6f nspawn: add new --network-bridge= switch
This adds the host side of the veth link to the given bridge.

Also refactor the creation of the veth interfaces a bit to set it up
from the host rather than the container. This simplifies the addition
to the bridge, but otherwise the behavior is unchanged.
2014-02-16 21:40:28 +01:00
Tom Gundersen
818dc5e72a sd-rtnl: always include linux/rtnetlink.h 2014-02-15 12:14:45 +01:00
Tom Gundersen
ee3a6a51e5 sd-rtnl: message_open_container - don't take a 'size' argument
We can always know the size based on the type, so let's do this inside the library.
2014-02-15 12:14:45 +01:00
Lennart Poettering
262d10e6bd nspawn: if we don't find bash, try sh 2014-02-14 16:41:03 +01:00
Lennart Poettering
6b9132a9c4 nspawn: don't accept just any tree to execute
When invoked without -D in an arbitrary directory we should not try to
execute anything, make some validity checks first.
2014-02-14 16:35:18 +01:00
Lennart Poettering
24fb111207 nspawn: make socket(AF_NETLINK, *, NETLINK_AUDIT) fail with EAFNOTSUPPORT in containers
The kernel still doesn't support audit in containers, so let's make use
of seccomp and simply turn it off entirely. We can get rid of this big
as soon as the kernel is fixed again.
2014-02-13 20:30:02 +01:00
Lennart Poettering
69c79d3c32 nspawn: add new --network-veth switch to add a virtual ethernet link to the host 2014-02-13 18:47:53 +01:00
Lennart Poettering
7e2270246b nspawn: check with udev before we take possession of an interface 2014-02-13 14:38:02 +01:00
Lennart Poettering
b88eb17a7a nspawn: no need to subscribe to netlink messages if we just want to execute one operation 2014-02-13 14:08:16 +01:00
Lennart Poettering
a42c8b54b1 nspawn: --private-network should imply CAP_NET_ADMIN 2014-02-13 14:07:59 +01:00
Lennart Poettering
d595c5cc9e rtnl: rename constructors from the form sd_rtnl_xxx_yyy_new() to sd_rtnl_xxx_new_yyy()
So far we followed the rule to always indicate the "flavour" of
constructors after the "_new_" or "_open_" in the function name, so
let's keep things in sync here for rtnl and do the same.
2014-02-13 13:53:25 +01:00