1
0
mirror of https://github.com/systemd/systemd.git synced 2024-11-06 16:59:03 +03:00
Commit Graph

2745 Commits

Author SHA1 Message Date
Djalal Harouni
4084e8fc89 core: check protect_kernel_modules and private_devices in order to setup NNP 2016-10-12 14:12:07 +02:00
Djalal Harouni
c575770b75 core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.

This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
2016-10-12 14:11:16 +02:00
Djalal Harouni
2cd0a73547 core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yes
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw
data through /proc, ioctl and some other exotic system calls...
2016-10-12 13:39:49 +02:00
Djalal Harouni
502d704e5e core:sandbox: Add ProtectKernelModules= option
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.

This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
2016-10-12 13:31:21 +02:00
Zbigniew Jędrzejewski-Szmek
3ccb886283 Allow block and char classes in DeviceAllow bus properties (#4353)
Allowed paths are unified betwen the configuration file parses and the bus
property checker. The biggest change is that the bus code now allows "block-"
and "char-" classes. In addition, path_startswith("/dev") was used in the bus
code, and startswith("/dev") was used in the config file code. It seems
reasonable to use path_startswith() which allows a slightly broader class of
strings.

Fixes #3935.
2016-10-12 11:12:11 +02:00
0xAX
74e7579c17 core/main: get rid from excess check of ACTION_TEST (#4350)
If `--test` command line option was passed, the systemd set skip_setup
to true during bootup. But after this we check again that arg_action is
test or help and opens pager depends on result.

We should skip setup in a case when `--test` is passed, but it is also
safe to set skip_setup in a case of `--help`. So let's remove first
check and move skip_setup = true to the second check.
2016-10-11 17:30:04 -04:00
Lennart Poettering
e0d2adfde6 core: chown() any TTY used for stdin, not just when StandardInput=tty is used (#4347)
If stdin is supplied as an fd for transient units (using the
StandardInputFileDescriptor pseudo-property for transient units), then we
should also fix up the TTY ownership, not just when we opened the TTY
ourselves.

This simply drops the explicit is_terminal_input()-based check. Note that
chown_terminal() internally does a much more appropriate isatty()-based check
anyway, hence we can drop this without replacement.

Fixes: #4260
2016-10-11 14:07:22 -04:00
Zbigniew Jędrzejewski-Szmek
b744e8937c Merge pull request #4067 from poettering/invocation-id
Add an "invocation ID" concept to the service manager
2016-10-11 13:40:50 -04:00
Zbigniew Jędrzejewski-Szmek
ec72b96366 Merge pull request #4337 from poettering/exit-code
Fix for #4275 and more
2016-10-10 21:24:57 -04:00
Lennart Poettering
052364d41f core: simplify if branches a bit
We do the same thing in two branches, let's merge them. Let's also add an
explanatory comment, while we are at it.
2016-10-10 22:57:02 +02:00
Lennart Poettering
f2aed3070d core: make use of IN_SET() in various places in mount.c 2016-10-10 22:57:02 +02:00
Lennart Poettering
1f0958f640 core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).

Let's add two different clean checking modes for this, and use the right one at
the appropriate places.

Fixes: #4275
2016-10-10 22:57:01 +02:00
Lennart Poettering
38107f5a4a core: lower exit status "level" at one place
When we print information about PID 1's crashdump subprocess failing. In this
case we *know* that we do not generate LSB exit codes, as it's basically PID 1
itself that exited there.
2016-10-10 22:56:55 +02:00
0xAX
f6dd106c73 main: use strdup instead of free_and_strdup to initialize default unit (#4335)
Previously we've used free_and_strdup() to fill arg_default_unit with unit
name, If we didn't pass default unit name through a kernel command line or
command line arguments. But we can use just strdup() instead of
free_and_strdup() for this, because we will start fill arg_default_unit
only if it wasn't set before.
2016-10-10 22:11:36 +02:00
Lennart Poettering
41e2036eb8 exit-status: kill is_clean_exit_lsb(), move logic to sysv-generator
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special
handling of the two LSB exit codes into the sysv-generator by writing out
appropriate SuccessExitStatus= lines if the LSB header exists. This is not only
semantically more correct, bug also fixes a bug as the code in service.c that
chose between is_clean_exit_lsb() and is_clean_exit() based this check on
whether a native unit files was available for the unit. However, that check was
bogus since a long time, since the SysV generator was introduced and native
SysV script support was removed from PID 1, as in that case a unit file always
existed.
2016-10-10 21:48:08 +02:00
0xAX
c76cf844d6 tree-wide: pass return value of make_null_stdio() to warning instead of errno (#4328)
as @poettering suggested in the #4320
2016-10-10 19:51:33 +02:00
0xAX
10c961b9c9 main: initialize default unit little later (#4321)
systemd fills arg_default_unit during startup with default.target
value. But arg_default_unit may be overwritten in parse_argv() or
parse_proc_cmdline_item().

Let's check value of arg_default_unit after calls of parse_argv()
and parse_proc_cmdline_item() and fill it with default.target if
it wasn't filled before. In this way we will not spend unnecessary
time to for filling arg_default_unit with default.target.
2016-10-09 22:57:03 -04:00
0xAX
9fc932bff1 tree-wide: print warning in a failure case of make_null_stdio() (#4320)
The make_null_stdio() may fail. Let's check its result and print
warning message instead of keeping silence.
2016-10-09 22:55:24 -04:00
Lennart Poettering
4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.

The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
2016-10-07 20:14:38 +02:00
Zbigniew Jędrzejewski-Szmek
8f4d640135 core: only warn on short reads on signal fd 2016-10-07 10:05:04 -04:00
Lennart Poettering
875ca88da5 manager: tighten incoming notification message checks
Let's not accept datagrams with embedded NUL bytes. Previously we'd simply
ignore everything after the first NUL byte. But given that sending us that is
pretty ugly let's instead complain and refuse.

With this change we'll only accept messages that have exactly zero or one NUL
bytes at the very end of the datagram.
2016-10-07 12:14:33 +02:00
Lennart Poettering
045a3d5989 manager: be stricter with incomining notifications, warn properly about too large ones
Let's make the kernel let us know the full, original datagram size of the
incoming message. If it's larger than the buffer space provided by us, drop the
whole message with a warning.

Before this change the kernel would truncate the message for us to the buffer
space provided, and we'd not complain about this, and simply process the
incomplete message as far as it made sense.
2016-10-07 12:12:10 +02:00
Lennart Poettering
c55ae51e77 manager: don't ever busy loop when we get a notification message we can't process
If the kernel doesn't permit us to dequeue/process an incoming notification
datagram message it's still better to stop processing the notification messages
altogether than to enter a busy loop where we keep getting notified but can't
do a thing about it.

With this change, manager_dispatch_notify_fd() behaviour is changed like this:

- if an error indicating a spurious wake-up is seen on recvmsg(), ignore it
  (EAGAIN/EINTR)

- if any other error is seen on recvmsg() propagate it, thus disabling
  processing of further wakeups

- if any error is seen on later code in the function, warn about it but do not
  propagate it, as in this cas we're not going to busy loop as the offending
  message is already dequeued.
2016-10-07 12:08:51 +02:00
Lukáš Nykrýn
24dd31c19e core: add possibility to set action for ctrl-alt-del burst (#4105)
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.

Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
2016-10-06 21:08:21 -04:00
Lennart Poettering
97f0e76f18 user-util: rework maybe_setgroups() a bit
Let's drop the caching of the setgroups /proc field for now. While there's a
strict regime in place when it changes states, let's better not cache it since
we cannot really be sure we follow that regime correctly.

More importantly however, this is not in performance sensitive code, and
there's no indication the cache is really beneficial, hence let's drop the
caching and make things a bit simpler.

Also, while we are at it, rework the error handling a bit, and always return
negative errno-style error codes, following our usual coding style. This has
the benefit that we can sensible hanld read_one_line_file() errors, without
having to updat errno explicitly.
2016-10-06 19:04:10 +02:00
Lennart Poettering
2d6fce8d7c core: leave PAM stub process around with GIDs updated
In the process execution code of PID 1, before
096424d123 the GID settings where changed before
invoking PAM, and the UID settings after. After the change both changes are
made after the PAM session hooks are run. When invoking PAM we fork once, and
leave a stub process around which will invoke the PAM session end hooks when
the session goes away. This code previously was dropping the remaining privs
(which were precisely the UID). Fix this code to do this correctly again, by
really dropping them else (i.e. the GID as well).

While we are at it, also fix error logging of this code.

Fixes: #4238
2016-10-06 19:04:10 +02:00
Giuseppe Scrivano
36d854780c core: do not fail in a container if we can't use setgroups
It might be blocked through /proc/PID/setgroups
2016-10-06 11:49:00 +02:00
Giuseppe Scrivano
77531863ca Fix typo 2016-10-05 18:36:48 +02:00
Stefan Schweter
629ff674ac tree-wide: remove consecutive duplicate words in comments 2016-10-04 17:06:25 +02:00
Michael Olbrich
c080fbce9c automount: make sure the expire event is restarted after a daemon-reload (#4265)
If the corresponding mount unit is deserialized after the automount unit
then the expire event is set up in automount_trigger_notify(). However, if
the mount unit is deserialized first then the automount unit is still in
state AUTOMOUNT_DEAD and automount_trigger_notify() aborts without setting
up the expire event.
Explicitly call automount_start_expire() during coldplug to make sure that
the expire event is set up as necessary.

Fixes #4249.
2016-10-04 16:13:27 +02:00
Zbigniew Jędrzejewski-Szmek
a63ee40751 core: do not try to create /run/systemd/transient in test mode
This prevented systemd-analyze from unprivileged operation on older systemd
installations, which should be possible.
Also, we shouldn't touch the file system in test mode even if we can.
2016-10-01 22:53:17 +02:00
Zbigniew Jędrzejewski-Szmek
dd5e7000cb core: complain if Before= dep on .device is declared
[Unit]
Before=foobar.device

[Service]
ExecStart=/bin/true
Type=oneshot

$ systemd-analyze verify before-device.service
before-device.service: Dependency Before=foobar.device ignored (.device units cannot be delayed)
2016-10-01 22:53:17 +02:00
Zbigniew Jędrzejewski-Szmek
5fd2c135f1 core: update warning message
"closing all" might suggest that _all_ fds received with the notification message
will be closed. Reword the message to clarify that only the "unused" ones will be
closed.
2016-10-01 11:01:31 +02:00
Zbigniew Jędrzejewski-Szmek
c4bee3c40e core: get rid of unneeded state variable
No functional change.
2016-10-01 11:01:31 +02:00
Zbigniew Jędrzejewski-Szmek
a86b76753d pid1: more informative error message for ignored notifications
It's probably easier to diagnose a bad notification message if the
contents are printed. But still, do anything only if debugging is on.
2016-09-29 22:57:57 +02:00
Zbigniew Jędrzejewski-Szmek
8523bf7dd5 pid1: process zero-length notification messages again
This undoes 531ac2b234. I acked that patch without looking at the code
carefully enough. There are two problems:
- we want to process the fds anyway
- in principle empty notification messages are valid, and we should
  process them as usual, including logging using log_unit_debug().
2016-09-29 22:57:57 +02:00
Franck Bui
9987750e7a pid1: don't return any error in manager_dispatch_notify_fd() (#4240)
If manager_dispatch_notify_fd() fails and returns an error then the handling of
service notifications will be disabled entirely leading to a compromised system.

For example pid1 won't be able to receive the WATCHDOG messages anymore and
will kill all services supposed to send such messages.
2016-09-29 19:44:34 +02:00
Jorge Niedbalski
531ac2b234 If the notification message length is 0, ignore the message (#4237)
Fixes #4234.

Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>
2016-09-29 05:26:16 -04:00
Evgeny Vereshchagin
cc238590e4 Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-28 04:50:30 +03:00
Paweł Szewczyk
00bb64ecfa core: Fix USB functionfs activation and clarify its documentation (#4188)
There was no certainty about how the path in service file should look
like for usb functionfs activation. Because of this it was treated
differently in different places, which made this feature unusable.

This patch fixes the path to be the *mount directory* of functionfs, not
ep0 file path and clarifies in the documentation that ListenUSBFunction should be
the location of functionfs mount point, not ep0 file itself.
2016-09-26 18:45:47 +02:00
Djalal Harouni
8f81a5f61b core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= is set
Instead of having a local syscall list, use the @raw-io group which
contains the same set of syscalls to filter.
2016-09-25 12:52:27 +02:00
Djalal Harouni
b6c432ca7e core:namespace: simplify ProtectHome= implementation
As with previous patch simplify ProtectHome and don't care about
duplicates, they will be sorted by most restrictive mode and cleaned.
2016-09-25 12:41:16 +02:00
Djalal Harouni
f471b2afa1 core: simplify ProtectSystem= implementation
ProtectSystem= with all its different modes and other options like
PrivateDevices= + ProtectKernelTunables= + ProtectHome= are orthogonal,
however currently it's a bit hard to parse that from the implementation
view. Simplify it by giving each mode its own table with all paths and
references to other Protect options.

With this change some entries are duplicated, but we do not care since
duplicate mounts are first sorted by the most restrictive mode then
cleaned.
2016-09-25 12:21:25 +02:00
Djalal Harouni
49accde7bd core:sandbox: add more /proc/* entries to ProtectKernelTunables=
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.

Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
2016-09-25 11:30:11 +02:00
Djalal Harouni
2652c6c103 core:namespace: simplify mount calculation
Move out mount calculation on its own function. Actually the logic is
smart enough to later drop nop and duplicates mounts, this change
improves code readability.
---
 src/core/namespace.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)
2016-09-25 11:25:00 +02:00
Djalal Harouni
11a30cec2a core:namespace: put paths protected by ProtectKernelTunables= in
Instead of having all these paths everywhere, put the ones that are
protected by ProtectKernelTunables= into their own table. This way it
is easy to add paths and track which ones are protected.
2016-09-25 11:16:44 +02:00
Djalal Harouni
9c94d52e09 core:namespace: minor improvements to append_mounts() 2016-09-25 11:03:21 +02:00
Lennart Poettering
cefc33aee2 execute: move SMACK setup code into its own function
While we are at it, move PAM code #ifdeffery into setup_pam() to simplify the
main execution logic a bit.
2016-09-25 10:52:57 +02:00
Lennart Poettering
cd2902c954 namespace: drop all mounts outside of the new root directory
There's no point in mounting these, if they are outside of the root directory
we'll move to.
2016-09-25 10:52:57 +02:00
Lennart Poettering
54500613a4 main: minor simplification 2016-09-25 10:52:57 +02:00