mirror of https://github.com/systemd/systemd-stable.git synced 2025-01-26 10:03:40 +03:00

3029 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek
ee43050b40 Merge pull request #4692 from poettering/networkd-dhcp
Various networkd/DHCP fixes.
2016-11-22 23:22:04 -05:00
Janne Heß
6d9e45e97f Document an edge-case with resume and mounting (#4581)
When trying to read keyfiles from an encrypted partition to unlock the swap,
a cyclic dependency is generated because systemd cannot mount the
filesystem before it has checked if there is a swap to resume from.

Closes #3940
2016-11-22 23:19:56 -05:00
Lennart Poettering
17cbb288fa nspawn: add fallback to normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plain copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools; they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering
0f3be6ca4d nspawn: support ephemeral boots from images
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.

As a side effect, this fixes --ephemeral being accepted but ignored when using
-M on a container that turned out to be an image.

Fixes: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering
1a1b13c957 seccomp: add @filesystem syscall group (#4537)
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
2016-11-21 19:29:12 -05:00
Lennart Poettering
640be8806e man: make /etc/nsswitch.conf documentation for nss-resolve match example
Fixes: #4683
2016-11-21 22:58:27 +01:00
Lennart Poettering
2e6dbc0fcd Merge pull request #4538 from fbuihuu/confirm-spawn-fixes
Confirm spawn fixes/enhancements
2016-11-18 11:08:06 +01:00
Franck Bui
7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch lets the user redirect the confirmation messages to a different
virtual console, specified either by name or by path, so that they are
separated from the other messages and easier to read.
2016-11-17 18:16:16 +01:00
Lennart Poettering
5327c910d2 namespace: simplify, optimize and extend handling of mounts for namespace
This changes a couple of things in the namespace handling:

It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structure, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.

This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.

While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified, the root directory of the unit is
implicitly prefixed.

This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.

This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.

Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
2016-11-17 18:08:32 +01:00
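The explicit prefixing step described in 5327c910d2 can be modelled roughly as follows. This is a minimal Python sketch, not the C implementation: the names BindMount and prefix_where_needed are taken from the commit message, while the field names (path, has_prefix) and the path-joining details are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BindMount:
    # Hypothetical stand-in for the C structure named in the commit message.
    path: str
    # Models the "extra bit": True once the root directory has been prepended.
    has_prefix: bool = False


def prefix_where_needed(mounts: List[BindMount], root: Optional[str]) -> None:
    """Prepend `root` to every entry that does not carry the prefix yet."""
    if not root:
        return  # no root directory configured, nothing to do
    for m in mounts:
        if m.has_prefix:
            continue  # already prefixed, leave untouched
        m.path = root.rstrip("/") + "/" + m.path.lstrip("/")
        m.has_prefix = True
```

Because the flag is set after prefixing, calling the function a second time leaves every entry unchanged.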
Lennart Poettering
82948f6c8e systemctl: show waiting jobs when "systemctl list-jobs --after/--before" is called
Let's expose the new bus functions we added in the previous commit in
systemctl.
2016-11-16 17:01:46 +01:00
Djalal Harouni
8526555680 doc: move ProtectKernelModules= documentation near ProtectKernelTunables= 2016-11-15 15:04:41 +01:00
Djalal Harouni
a7db8614f3 doc: note when no new privileges is implied 2016-11-15 15:04:35 +01:00
Lucas Werkmeister
b793ddfa6c man: add Itanium root GUID to table (#4656)
This GUID was added in #2263, but the manpage was not updated.
2016-11-11 22:25:32 -05:00
Christian Hesse
110773f6c9 fstab-generator: add x-systemd.mount-timeout (#4603)
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. This allows mounting huge btrfs volumes without issues.

This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.

fixes #4055
2016-11-11 09:08:57 -05:00
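To illustrate the mapping described in 110773f6c9, the sketch below pulls the timeout value out of an fstab options string, which is the value that would correspond to TimeoutSec= in a [Mount] section. It is a Python model written for this listing, not the generator's C code; the function name split_mount_timeout is hypothetical, and separating the option from the remaining ones is an assumption about how one might process it.

```python
from typing import List, Optional, Tuple

OPTION = "x-systemd.mount-timeout"


def split_mount_timeout(options: str) -> Tuple[Optional[str], List[str]]:
    """Return (timeout value or None, remaining options) for an fstab options field."""
    timeout = None
    remaining = []
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if key == OPTION and sep:
            timeout = value  # would correspond to TimeoutSec=<value>
        elif opt:
            remaining.append(opt)  # keep every other non-empty option as-is
    return timeout, remaining


if __name__ == "__main__":
    timeout, rest = split_mount_timeout("defaults,x-systemd.mount-timeout=10min")
    print(f"TimeoutSec={timeout}", rest)
```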
Zbigniew Jędrzejewski-Szmek
d48bb46b5a man: update machine-id(5) with a note about privacy (#4645) 2016-11-11 13:31:52 +01:00
Susant Sahani
9faed222fc networkd: support setting dhcp client listen port (#4631)
Allow setting custom port for the DHCP client to listen on in networkd.

[DHCP]
ListenPort=6677
2016-11-10 18:34:19 -05:00
Lucas Werkmeister
6d24947638 man: mention start rate limiting in Restart= doc (#4637) 2016-11-10 18:20:44 -05:00
Susant Sahani
a39f92d391 Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings.

This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).

commit 0527f1c

3f1ac7a700ommit
35afb33
2016-11-10 15:12:56 +05:30
Jonathan Boulle
fa000db391 man/sd_watchdog_enabled: correct minor typos (#4632) 2016-11-09 17:30:10 +01:00
Zbigniew Jędrzejewski-Szmek
d85a0f8028 Merge pull request #4536 from poettering/seccomp-namespaces
core: add new RestrictNamespaces= unit file setting

Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the meantime.
2016-11-08 19:54:21 -05:00
Yu Watanabe
b719b26cb3 man: fix typo (#4615) 2016-11-08 10:51:35 +01:00
Zbigniew Jędrzejewski-Szmek
ed7fd549d0 man: add an example how to unconditionally empty a directory (#4570)
It was logical, but not entirely obvious, that 'e' with no arguments does
nothing. Expand the explanation a bit and add an example.

Fixes #4564.
2016-11-08 09:39:10 +01:00
Lennart Poettering
add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namespace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
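The three forms of RestrictNamespaces= described in add005357d can be summarised with a small model of which namespace types remain usable. This is a hedged Python sketch of the semantics stated in the commit message only; it is not systemd's parser, the function name allowed_namespaces is hypothetical, and the full list of type identifiers (beyond the "mnt" and "ipc" shown in the commit) is an assumption.

```python
from typing import FrozenSet

# Assumed set of namespace type identifiers; the commit message itself only
# shows "mnt" and "ipc" and mentions user namespaces.
NAMESPACE_TYPES = frozenset({"cgroup", "ipc", "mnt", "net", "pid", "user", "uts"})


def allowed_namespaces(setting: str) -> FrozenSet[str]:
    """Return the namespace types a unit may still create or manage."""
    value = setting.strip()
    if value == "no":
        return NAMESPACE_TYPES  # default: no restriction at all
    if value == "yes":
        return frozenset()  # every namespace type is blocked
    chosen = frozenset(value.split())
    unknown = chosen - NAMESPACE_TYPES
    if unknown:
        raise ValueError(f"unknown namespace type(s): {sorted(unknown)}")
    return chosen  # only the listed types remain available


if __name__ == "__main__":
    print(sorted(allowed_namespaces("mnt ipc")))
```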
Zbigniew Jędrzejewski-Szmek
c4c50112ec man: update kernel-install(8) to match reality (#4563) 2016-11-04 06:40:58 -06:00
Zbigniew Jędrzejewski-Szmek
cf88547034 Merge pull request #4548 from keszybz/seccomp-help
systemd-analyze syscall-filter
2016-11-03 20:27:45 -04:00
Kees Cook
d974f949f1 doc: clarify NoNewPrivileges (#4562)
Setting no_new_privs does not stop UID changes, but rather blocks
gaining privileges through execve(). Also fixes a small typo.
2016-11-03 20:26:59 -04:00
Zbigniew Jędrzejewski-Szmek
d5efc18b60 seccomp-util, analyze: export comments as a help string
Just to make the whole thing easier for users.
2016-11-03 09:35:36 -04:00
Zbigniew Jędrzejewski-Szmek
869feb3388 analyze: add syscall-filter verb
This should make it easier for users to understand what each filter
means as the list of syscalls is updated in subsequent systemd versions.
2016-11-03 09:35:35 -04:00
Lucas Werkmeister
0cc6064c3c man: fix two typos (is → are) (#4544) 2016-11-02 18:10:29 -06:00
Lennart Poettering
31887c73b9 Merge pull request #4456 from keszybz/stored-fds
Preserve stored fds over service restart
2016-11-02 16:29:04 -06:00
Lennart Poettering
2ca8dc15f9 man: document that too strict system call filters may affect the service manager
If execve() or socket() is filtered the service manager might get into trouble
executing the service binary, or handling any failures when this fails. Mention
this in the documentation.

The other option would be to implicitly whitelist all system calls that are
required for these codepaths. However, that appears less than desirable as this
would mean socket() and many related calls have to be whitelisted
unconditionally. As writing system call filters requires a certain level of
expertise anyway it sounds like the better option to simply document these
issues and suggest that the user disables system call filters in the service
temporarily in order to debug any such failures.

See: #3993.
2016-11-02 08:55:24 -06:00
Lennart Poettering
133ddbbeae seccomp: add two new syscall groups
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.

@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes.
2016-11-02 08:50:00 -06:00
Lennart Poettering
aa6b9cec88 man: two minor fixes 2016-11-02 08:50:00 -06:00
Lennart Poettering
cd5bfd7e60 seccomp: include pipes and memfd in @ipc
These system calls clearly fall in the @ipc category, hence should be listed
there, simply to avoid confusion and surprise by the user.
2016-11-02 08:50:00 -06:00
Lennart Poettering
a8c157ff30 seccomp: drop execve() from @process list
The system call is already part of @default, hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.
2016-11-02 08:49:59 -06:00
Lennart Poettering
c79aff9a82 seccomp: add clock query and sleeping syscalls to "@default" group
Timing and sleeping are such basic operations that it makes very little sense
to ever block them, hence don't.
2016-11-02 08:49:59 -06:00
Zbigniew Jędrzejewski-Szmek
aa34055ffb seccomp: allow specifying arm64, mips, ppc (#4491)
"Secondary arch" table for mips is entirely speculative…
2016-11-01 09:33:18 -06:00
Jakub Wilk
b17649ee5e man: fix typos (#4527) 2016-10-31 08:08:08 -04:00
George Hilliard
52028838a1 Implement VeraCrypt volume handling in crypttab (#4501)
This introduces a new option, `tcrypt-veracrypt`, that sets the
corresponding VeraCrypt flag in the flags passed to cryptsetup.
2016-10-30 10:25:31 -04:00
Lucas Werkmeister
8bb36a1122 man: make systemd-escape examples more consistent
The first example wasn't phrased with "To ..." as the other three are,
and the last example was lacking the colon.
2016-10-30 02:44:07 +02:00
Lucas Werkmeister
918737f365 man: add missing period 2016-10-30 02:43:17 +02:00
Lucas Werkmeister
c7a7f78bb0 man: improve systemd-escape --path description
The option does more than the documentation gave it credit for.
2016-10-30 02:42:22 +02:00
Zbigniew Jędrzejewski-Szmek
99bdcdc7fc man: add a note that FDSTORE=1 requires epoll-compatible fds
Let's say that this was not obvious from our man page.
2016-10-28 22:45:05 -04:00
Djalal Harouni
fa1f250d6f Merge pull request #4495 from topimiettinen/block-shmat-exec
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
2016-10-28 15:41:07 +02:00
Martin Pitt
1740c5a807 Merge pull request #4458 from keszybz/man-nonewprivileges
Document NoNewPrivileges default value
2016-10-28 15:35:29 +02:00
Michal Sekletar
4f985bd802 udev: allow substitutions for SECLABEL key (#4505) 2016-10-28 12:09:14 +02:00
Evgeny Vereshchagin
492466c1b5 Merge pull request #4442 from keszybz/detect-virt-userns
detect-virt: add --private-users switch to check if a userns is active; add Condition=private-users
2016-10-27 13:16:16 +03:00
Zbigniew Jędrzejewski-Szmek
299a34c11a detect-virt: add --private-users switch to check if a userns is active
Various things don't work when we're running in a user namespace, but it's
pretty hard to reliably detect if that is true.

A function is added which looks at /proc/self/uid_map and returns false
if the default "0 0 UINT32_MAX" is found, and true if it finds anything else.
This misses the case where a 1:1 mapping with the full range was used, but
I don't know how to distinguish this case.

'systemd-detect-virt --private-users' is very similar to
'systemd-detect-virt --chroot', but we check for a user namespace instead.
2016-10-26 20:12:51 -04:00
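The check described in 299a34c11a can be sketched as below. This is a Python approximation of the logic stated in the commit message, not the C function added to systemd; the function names are hypothetical, and treating an empty map as "anything else" is an assumption.

```python
UINT32_MAX = 2**32 - 1


def has_private_users(uid_map_text: str) -> bool:
    """Return False for the default "0 0 UINT32_MAX" map, True for anything else."""
    entries = [tuple(line.split()) for line in uid_map_text.splitlines() if line.strip()]
    return entries != [("0", "0", str(UINT32_MAX))]


def running_in_userns(path: str = "/proc/self/uid_map") -> bool:
    """Apply the check to the calling process (Linux only)."""
    with open(path, encoding="ascii") as f:
        return has_private_users(f.read())
```

As the commit message notes, a 1:1 mapping covering the full range is indistinguishable from the default and is reported as "no user namespace" by this approach.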
Michal Soltys
808b95ef82 vconsole: manual update (#4021)
To more correctly reflect current behaviour as well as to provide
a few more details.
2016-10-26 19:21:02 -04:00
Topi Miettinen
d2ffa389b8 seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
shmat(..., SHM_EXEC) can be used to create writable and executable
memory, so let's block it when MemoryDenyWriteExecute is set.
2016-10-26 18:59:14 +03:00