mirror of https://github.com/systemd/systemd.git synced 2024-11-06 16:59:03 +03:00
2579 Commits

Author SHA1 Message Date
Lennart Poettering
3862e809d0 core: when a scope was abandoned, always log about processes we kill
After all, if a unit is abandoned, all processes inside of it may be considered
"left over" and are something we had better log about.
2016-07-20 14:35:15 +02:00
Lennart Poettering
f4b0fb236b core: make sure RequestStop signal is sent directed
This was accidentally left commented out for debugging purposes, let's fix that
and make the signal directed again.
2016-07-20 14:35:15 +02:00
Lennart Poettering
1d98fef17d core: when forcibly killing/aborting left-over unit processes log about it
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.

This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.

Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
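
A minimal sketch of the idea in Python, not the actual C code: several boolean
parameters collapse into one flags value, and logging happens either for
SIGKILL/SIGABRT or on explicit request. The names KillFlags and should_log are
hypothetical and chosen for illustration only.

```python
from enum import IntFlag


class KillFlags(IntFlag):
    """Hypothetical flag set replacing a row of boolean parameters."""
    NONE = 0
    SIGCONT = 1 << 0      # also send SIGCONT after the signal
    IGNORE_SELF = 1 << 1  # never signal the calling process
    LOG = 1 << 2          # log about every process that is signalled


def should_log(signal_name, flags):
    """Log for forcible signals, or when the caller asked for it."""
    if signal_name in ("SIGKILL", "SIGABRT"):
        return True
    return bool(flags & KillFlags.LOG)


# One argument carries any combination of options.
options = KillFlags.SIGCONT | KillFlags.IGNORE_SELF
print(should_log("SIGTERM", options))
print(should_log("SIGTERM", options | KillFlags.LOG))
```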
2016-07-20 14:35:15 +02:00
Alessandro Puccetti
2a624c36e6 doc,core: Read{Write,Only}Paths= and InaccessiblePaths=
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept
as aliases but they are not advertised in the documentation.

Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
2016-07-19 17:22:02 +02:00
Alessandro Puccetti
c4b4170746 namespace: unify limit behavior on non-directory paths
Despite the name, `Read{Write,Only}Directories=` already allows for
regular file paths to be masked. This commit adds the same behavior
to `InaccessibleDirectories=` and makes it explicit in the doc.
This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}`
{file,device}nodes and mounts on the appropriate one the paths specified
in `InaccessibleDirectories=`.

Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
2016-07-19 17:22:02 +02:00
Lukáš Nykrýn
ccc2c98e1b manager: don't skip sigchld handler for main and control pid for services (#3738)
During stop, when a service has one "regular" pid, one main pid and one
control pid, and the sigchld for the regular one is processed first,
unit_tidy_watch_pids() will skip the main and control pid and will not
remove them from u->pids. But then we skip the sigchld event because we
already did one in the iteration and there are two pids in u->pids.

v2: Use general unit_main_pid() and unit_control_pid() instead of
reaching directly to service structure.
2016-07-16 15:04:13 -04:00
Zbigniew Jędrzejewski-Szmek
2ed968802c tree-wide: get rid of selinux_context_t (#3732)
9eb9c93275
deprecated selinux_context_t. Replace with a simple char* everywhere.

Alternative fix for #3719.
2016-07-15 18:44:02 +02:00
Zbigniew Jędrzejewski-Szmek
1071fd0823 macros: provide %_systemdgeneratordir and %_systemdusergeneratordir (#3672)
... as requested in
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DJ7HDNRM5JGBSA4HL3UWW5ZGLQDJ6Y7M/.
Adding the macro makes it marginally easier to create generators
for outside projects.

I opted for "generatordir" and "usergeneratordir" to match
%unitdir and %userunitdir. OTOH, "_systemd" prefix makes it obvious
that this is related to systemd. "%_generatordir" would be too generic
a name.
2016-07-15 09:35:49 +02:00
Lennart Poettering
2e79d1828a shutdown: already sync IO before we enter the final killing spree
This way, slow IO that journald has to wait for can't cause it to reach the
killing spree timeout and be hit by SIGKILL in addition to SIGTERM.
2016-07-12 17:38:19 +02:00
Lennart Poettering
d450612953 shutdown: use 90s SIGKILL timeout
There's really no reason to use 10s here, let's instead default to 90s like we
do for everything else.

The SIGKILL during the final killing spree is in most regards the fourth level
of a safety net, after all: any normal service should have already been stopped
during the normal service shutdown logic, first via SIGTERM and then SIGKILL,
and then also via SIGTERM during the final killing spree before we send
SIGKILL. And as a fourth level safety net it should only be required in
exceptional cases, which means it's safe to raise the default timeout, as normal
shutdowns should never be delayed by it.

Note that journald excludes itself from the normal service shutdown, and relies
on the final killing spree to terminate it (this is because it wants to cover
the normal shutdown phase's complete logging). If the system's IO is
excessively slow, then the 10s might not be enough for journald to sync
everything to disk and logs might get lost during shutdown.
2016-07-12 17:32:30 +02:00
Michael Biebl
595bfe7df2 Various fixes for typos found by lintian (#3705) 2016-07-12 12:52:11 +02:00
Luca Bruno
391b81cd03 seccomp: only abort on syscall name resolution failures (#3701)
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall numbers on its own
and allowing proper multiplexed syscalls filtering.
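
A minimal sketch of the distinction in Python, not the actual parser. It
assumes __NR_SCMP_ERROR has the value -1 and uses a made-up lookup table in
place of seccomp_syscall_resolve_name(); the table contents and the function
names are hypothetical.

```python
SCMP_ERROR = -1  # assumed value of __NR_SCMP_ERROR

# Hypothetical stand-in for the real resolver: pseudo-syscalls resolve to
# negative numbers that are still valid results, not errors.
FAKE_TABLE = {"read": 0, "write": 1, "socket": -101}


def resolve_name(name):
    return FAKE_TABLE.get(name, SCMP_ERROR)


def parse_filter(names):
    """Reject only real resolution failures, keep pseudo-syscall numbers."""
    resolved = []
    for name in names:
        number = resolve_name(name)
        if number == SCMP_ERROR:  # not "number < 0"
            raise ValueError(f"unknown system call: {name}")
        resolved.append(number)
    return resolved


print(parse_filter(["read", "socket"]))
```

Testing for `number < 0` instead would wrongly reject the negative
pseudo-syscall number in this sketch.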
2016-07-12 11:55:26 +02:00
Torstein Husebø
61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Evgeny Vereshchagin
224d3d8266 Merge pull request #3680 from joukewitteveen/pam-env
Follow up on #3503 (pass service env vars to PAM sessions)
2016-07-08 17:33:12 +03:00
Jouke Witteveen
84eada2f7f execute: Do not alter call-by-ref parameter on failure
Prevent free from being called on (a part of) the call-by-reference
variable env when setup_pam fails.
2016-07-08 09:42:48 +02:00
David Michael
4f952a3f07 core: queue loading transient units after setting their properties (#3676)
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
2016-07-08 05:43:01 +02:00
Daniel Mack
78a4ee591a cgroup: fix memory cgroup limit regression on kernel 3.10 (#3673)
Commit da4d897e ("core: add cgroup memory controller support on the unified
hierarchy (#3315)") changed the code in src/core/cgroup.c to always write
the real numeric value from the cgroup parameters to the
"memory.limit_in_bytes" attribute file.

For parameters set to CGROUP_LIMIT_MAX, this results in the string
"18446744073709551615" being written into that file, which is UINT64_MAX.
Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".

This causes a regression on CentOS 7, which is based on kernel 3.10, as the
value is interpreted as *signed* 64 bit, and clamped to 0:

[root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
0

[root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
9223372036854775807

Hence, all units that are subject to the limits enforced by the memory
controller will crash immediately, even though they have no actual limit
set. This happens for the user.slice, for instance:

[  453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[  453.587024]  ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc
[  453.594544]  ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28
[  453.602120]  ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03
[  453.609680] Call Trace:
[  453.612156]  [<ffffffff816360fc>] dump_stack+0x19/0x1b
[  453.617324]  [<ffffffff8163109c>] dump_header+0x8e/0x214
[  453.622671]  [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0
[  453.628559]  [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[  453.634969]  [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0
[  453.641721]  [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0
[  453.648299]  [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90
[  453.654621]  [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b
[  453.660233]  [<ffffffff81642012>] __do_page_fault+0x3e2/0x450
[  453.666017]  [<ffffffff816420a3>] do_page_fault+0x23/0x80
[  453.671467]  [<ffffffff8163e308>] page_fault+0x28/0x30
[  453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service
[  453.688477] memory: usage 0kB, limit 0kB, failcnt 7
[  453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[  453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[  453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[  453.725702] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[  453.733614] [ 2837]     0  2837    11950      899      23        0             0 (systemd)
[  453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child
[  453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB

Fix this issue by special-casing the UINT64_MAX case again.
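
A minimal sketch of the special case in Python, not the actual code in
src/core/cgroup.c. The function name format_memory_limit is hypothetical; the
constant mirrors the CGROUP_LIMIT_MAX described above.

```python
UINT64_MAX = 2**64 - 1
CGROUP_LIMIT_MAX = UINT64_MAX  # "no limit", as described above


def format_memory_limit(limit):
    """Return the string to write to memory.limit_in_bytes."""
    if limit == CGROUP_LIMIT_MAX:
        # Writing 18446744073709551615 ends up as 0 on kernel 3.10,
        # so "no limit" is spelled "-1" instead.
        return "-1"
    return str(limit)


print(format_memory_limit(CGROUP_LIMIT_MAX))
print(format_memory_limit(512 * 1024 * 1024))
```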
2016-07-07 19:29:35 -07:00
Jouke Witteveen
1280503b7e execute: Cleanup the environment early
By cleaning up before setting up PAM we maintain control of overriding
behavior in setting variables. Otherwise, pam_putenv is in control.
This also makes sure we use a cleaned up environment in replacing
variables in argv.
2016-07-07 14:15:50 +02:00
Kyle Walker
1e706c8dff manager: Fixing a debug printf formatting mistake (#3640)
A 'llu' formatting statement was used in a debugging printf statement
instead of a 'PRIu64'. Correcting that mistake here.
2016-07-01 20:03:35 +03:00
Lennart Poettering
b12cc5b0f8 Merge pull request #3634 from disneyworldguy/v2sigchld
manager: Only invoke a single sigchld per unit within a cleanup cycle
2016-06-30 15:57:39 -07:00
Martin Pitt
f15461b2b2 Merge pull request #3596 from poettering/machine-clean
make "machinectl clean" asynchronous, and open it up via PolicyKit
2016-06-30 21:30:35 +02:00
Kyle Walker
36f20ae3b2 manager: Only invoke a single sigchld per unit within a cleanup cycle
By default, each iteration of manager_dispatch_sigchld() results in a unit level
sigchld event being invoked. For scope units, this results in a scope_sigchld_event()
which can seemingly stall for workloads that have a large number of PIDs within the
scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry
as a result of pid_is_unwaited().

v2:
This patch resolves this condition by only paying the cost of a sigchld in the underlying
scope unit once per sigchld iteration. A new "sigchldgen" member resides within the
Unit struct. The Manager is incremented via the sd event loop, accessed via
sd_event_get_iteration, and the Unit member is set to the same value as the manager each
time that a sigchld event is invoked. If the Manager iteration value and Unit member
match, the sigchld event is not invoked for that iteration.
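
A minimal sketch of the generation check in Python, not the actual manager
code. The Unit class and dispatch_sigchld() are simplified stand-ins, and the
iteration counter is assumed to start at 1 so that it never equals the
initial sigchldgen of 0.

```python
class Unit:
    def __init__(self, name):
        self.name = name
        self.sigchldgen = 0  # iteration in which sigchld_event() last ran
        self.events = 0

    def sigchld_event(self):
        # Stand-in for the expensive per-unit work, e.g. checking every pid.
        self.events += 1


def dispatch_sigchld(unit, iteration):
    """Invoke the unit's sigchld event at most once per loop iteration."""
    if unit.sigchldgen == iteration:
        return False  # already handled in this iteration, skip
    unit.sigchldgen = iteration
    unit.sigchld_event()
    return True


scope = Unit("example.scope")
for _ in range(1000):           # many child exits in one iteration
    dispatch_sigchld(scope, 1)
print(scope.events)             # the expensive work ran once
```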
2016-06-30 15:16:47 -04:00
Franck Bui
6edefe0b06 pid1: restore console color support for containers (#3595)
Commit 3a18b60489 introduced a regression that
disabled the color mode for containers.

This patch fixes this.
2016-06-24 16:08:43 +02:00
Lennart Poettering
2b40998d3c cgroup: minor coding style fix 2016-06-24 15:59:24 +02:00
Lennart Poettering
f4170c671b execute: add a new easy-to-use RestrictRealtime= option to units
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and
SCHED_DEADLINE is blocked, since these may be used to lock up the system.
2016-06-23 01:45:45 +02:00
Lennart Poettering
abd84d4d83 execute: be a little less drastic when MemoryDenyWriteExecute= hits
Let's politely refuse with EPERM rather than kill the whole thing right away.
2016-06-23 01:35:04 +02:00
Lennart Poettering
686d9ba614 execute: set PR_SET_NO_NEW_PRIVS also in case the exec memory protection is used
This was forgotten when MemoryDenyWriteExecute= was added: we should set NNP in
all cases when we set seccomp filters.
2016-06-23 01:33:07 +02:00
Lennart Poettering
03857c43ce execute: use the return value of setrlimit_closest() properly
It's a function defined by us, hence we should look for the error in its return
value, not in "errno".
2016-06-23 01:31:24 +02:00
Lennart Poettering
fc40065bcd core: when writing transient unit files, make sure all lines end with a newline
This is a fix-up for 2a9a6f8ac0 which covered
non-transient units, but missed the case for transient units.
2016-06-23 01:29:33 +02:00
Minkyung
2787d83c28 watchdog: Support changing watchdog_usec during runtime (#3492)
Add sd_notify() parameter to change watchdog_usec during runtime.

An application can change the watchdog_usec value via sd_notify(), for
example: sd_notify(0, "WATCHDOG_USEC=20000000").

To reset watchdog_usec to the value configured in the service file,
restart the service.

Note:
sd_event is not currently supported. If the application uses
sd_event_set_watchdog() or sd_watchdog_enabled(), do not use the
"WATCHDOG_USEC" option through sd_notify().
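
A minimal sketch of sending the same message from Python, assuming the usual
notification protocol in which the service manager passes a datagram socket
address in $NOTIFY_SOCKET. The helper names notify() and set_watchdog_usec()
are hypothetical; this is not the sd_notify() implementation.

```python
import os
import socket


def notify(message):
    """Send one notification datagram; return False if not supervised."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = os.fsencode(path)
    if address.startswith(b"@"):
        # A leading "@" denotes a Linux abstract socket address.
        address = b"\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def set_watchdog_usec(usec):
    """Ask the manager to change the watchdog timeout at runtime."""
    return notify(f"WATCHDOG_USEC={usec}")


# Outside a supervised service this simply reports that nothing was sent.
print(set_watchdog_usec(20000000))
```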
2016-06-22 13:26:05 +02:00
Lennart Poettering
98471bf0fa Merge pull request #3526 from fbuihuu/fix-console-log-color
Fix console log color
2016-06-22 12:34:25 +02:00
Franck Bui
3a18b60489 pid1: initialize status color mode after setting up TERM
Also we had to connect PID 1's stdio to null later since colors_enabled()
assumes that stdout is connected to the console.
2016-06-22 08:29:02 +02:00
Franck Bui
32391275c0 pid1: initialize TERM environment variable correctly
When systemd is started by the kernel, the kernel sets the TERM
environment variable unconditionally to "linux" no matter the console
device used. This might be an issue for dumb devices with no color
support.

This patch uses default_term_for_tty() for getting a more accurate
value. But it makes sure to keep the user preferences (if any) which
might be passed via the kernel command line. For that purpose /proc
should be mounted.
2016-06-22 08:28:55 +02:00
Evgeny Vereshchagin
eee0a1e48e core: log the right set of the supported controllers (#3558)
Jun 16 05:12:08 systemd[1]: Controller 'io' supported: yes
Jun 16 05:12:08 systemd[1]: Controller 'memory' supported: yes
Jun 16 05:12:08 systemd[1]: Controller 'pids' supported: yes

instead of

Jun 16 04:06:50 systemd[1]: Controller 'memory' supported: yes
Jun 16 04:06:50 systemd[1]: Controller 'devices' supported: yes
Jun 16 04:06:50 systemd[1]: Controller 'pids' supported: yes
2016-06-20 20:40:46 +02:00
Franck Bui
8ce0611e42 Revert "do not pass-along the environment from the kernel or initrd"
This reverts commit ce8aba5681.

We should pass an environment as close as possible to what we originally
got.
2016-06-20 18:55:09 +02:00
Franck Bui
affd7ed1a9 pid1: reconnect to the console before being re-executed
When re-executed, reconnect the console to PID1's stdio, as was the case
when PID1 was initially started by the kernel.
2016-06-20 18:40:51 +02:00
Dave Reisner
222953e87f Ensure kdbus isn't used (#3501)
Delete the dbus1 generator and some critical wiring. This prevents
kdbus from being loaded or detected. As such, it will never be used,
even if the user still has a useful kdbus module loaded on their system.

Sort of fixes #3480. Not really, but it's better than the current state.
2016-06-18 17:24:23 -04:00
Lennart Poettering
616aab6085 Merge pull request #3481 from poettering/relative-memcg
various changes, most importantly regarding memory metrics
2016-06-16 13:56:23 +02:00
Zbigniew Jędrzejewski-Szmek
732cd53eeb Merge pull request #3537 from poettering/journal-stream-env
Permit services to detect whether their stdout/stderr is connected to the journal.
2016-06-15 21:30:59 -04:00
Zbigniew Jędrzejewski-Szmek
a1feacf77f load-fragment: ignore ENOTDIR/EACCES errors (#3510)
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.

Arch bug https://bugs.archlinux.org/task/49547.

A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.

We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
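
A minimal sketch of the lookup behaviour in Python, not the actual
load-fragment code. The function find_unit() and the choice to treat ENOENT,
ENOTDIR and EACCES alike are illustrative assumptions based on the
description above.

```python
import errno
import os

# Errors that mean "nothing usable here, try the next directory".
IGNORED_ERRORS = {errno.ENOENT, errno.ENOTDIR, errno.EACCES}


def find_unit(name, search_path):
    """Return (path, contents) of the first loadable unit file, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        try:
            with open(candidate, encoding="utf-8") as f:
                return candidate, f.read()
        except OSError as e:
            if e.errno in IGNORED_ERRORS:
                continue  # broken or unreadable location, keep searching
            raise         # anything else is still a real error
    return None
```

With this, a search path entry that is a file rather than a directory no
longer stops the search; later entries are still consulted.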
2016-06-15 23:02:27 +02:00
Lennart Poettering
7bce046bcf core: set $JOURNAL_STREAM to the dev_t/ino_t of the journal stream of executed services
This permits services to detect whether their stdout/stderr is connected to the
journal, and if so talk to the journal directly, thus permitting carrying of
metadata.

As requested by the gtk folks: #2473
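
A minimal sketch of the detection from a service's point of view, in Python.
It assumes the variable holds the device and inode numbers as two decimal
integers separated by a colon; the helper name connected_to_journal() is
hypothetical.

```python
import os
import sys


def connected_to_journal(fd, environ=None):
    """Check whether fd is the stream named by $JOURNAL_STREAM."""
    env = os.environ if environ is None else environ
    value = env.get("JOURNAL_STREAM")
    if not value:
        return False
    try:
        dev_text, ino_text = value.split(":", 1)
        dev, ino = int(dev_text), int(ino_text)
    except ValueError:
        return False  # malformed value, assume no journal
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino) == (dev, ino)


if connected_to_journal(sys.stderr.fileno()):
    print("stderr goes to the journal", file=sys.stderr)
```

Comparing against fstat() of the descriptor matters because stdout/stderr may
have been redirected after the variable was set.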
2016-06-15 23:00:27 +02:00
Lennart Poettering
fd1f9c89f7 execute: minor coding style improvements 2016-06-15 22:51:01 +02:00
Lennart Poettering
8e38570ebe tree-wide: htonl() is weird, let's use htobe32() instead (#3538)
Super-important change, yeah!
2016-06-15 01:26:01 +02:00
Lennart Poettering
3f71dec5d7 unit: properly comment generated comments in unit files
Fix-up for 2a9a6f8ac0
2016-06-14 20:01:45 +02:00
Lennart Poettering
d58d600efd systemctl: allow percent-based MemoryLimit= settings via systemctl set-property
The unit files already accept relative, percent-based memory limit
specification, let's make sure "systemctl set-property" supports this too.

Since we want the physical memory size of the destination machine to apply we
pass the percentage in a new set of properties that only exist for this
purpose, and can only be set.
2016-06-14 20:01:45 +02:00
Lennart Poettering
d8cf2ac79b util: introduce physical_memory_scale() to unify how we scale by physical memory
The various bits of code all did the scaling differently, let's unify this,
given that the code is not trivial.
2016-06-14 20:01:45 +02:00
Lennart Poettering
799ec13412 core: make sure to use "infinity" in unit files, not "max"
The latter is a kernelism, we only understand "infinity".
2016-06-14 19:50:38 +02:00
Lennart Poettering
cd0a7a8e58 core: when receiving a memory limit via the bus, refuse 0
When parsing unit files we already refuse unit memory limits of zero, let's
also refuse it when the value is set via the bus.
2016-06-14 19:50:38 +02:00
Lennart Poettering
875ae5661a core: optionally, accept a percentage value for MemoryLimit= and related settings
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
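
A minimal sketch of the calculation in Python, not the actual C helpers. The
function names echo parse_percent() and physical_memory_scale() from the
neighbouring commits, but their behaviour here (accepted range, rounding) is
an assumption made for illustration.

```python
import os


def parse_percent(text):
    """Parse a string such as "25%" into the integer 25."""
    if not text.endswith("%"):
        raise ValueError(f"not a percentage: {text!r}")
    digits = text[:-1]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a percentage: {text!r}")
    return int(digits)


def physical_memory():
    """Installed RAM in bytes, as reported by sysconf()."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def memory_limit_from_percent(text, total=None):
    """Turn "N%" into a byte limit relative to the installed RAM."""
    percent = parse_percent(text)
    if percent <= 0 or percent > 100:
        raise ValueError("percentage must be between 1% and 100%")
    if total is None:
        total = physical_memory()
    return total * percent // 100


# 50% of a hypothetical 8 GiB machine.
print(memory_limit_from_percent("50%", total=8 * 1024**3))
```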
2016-06-14 19:50:38 +02:00
Lennart Poettering
9184ca48ea util-lib: introduce parse_percent() for parsing percent specifications
And port a couple of users over to it.
2016-06-14 19:50:38 +02:00