mirror of https://github.com/systemd/systemd.git synced 2024-11-06 16:59:03 +03:00
Commit Graph

2611 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek
dadd6ecfa5 Merge pull request #3728 from poettering/dynamic-users 2016-07-25 16:40:26 -04:00
Michael Olbrich
87d41d6244 automount: don't cancel mount/umount request on reload/reexec (#3670)
All pending tokens are already serialized correctly and will be handled
when the mount unit is done.

Without this a 'daemon-reload' cancels all pending tokens. Any process
waiting for the mount will continue with EHOSTDOWN.
This can happen when the mount unit waits for its dependencies, e.g.
network, devices, fsck, etc.
2016-07-25 20:04:02 +02:00
Michael Olbrich
2de0b9e913 transaction: don't cancel jobs for units with IgnoreOnIsolate=true (#3671)
This is important if a job was queued for a unit but not yet started.
Without this, the job will be canceled and is never executed even though
IgnoreOnIsolate is set to 'true'.
2016-07-25 20:02:55 +02:00
Lennart Poettering
43eb109aa9 core: change ExecStart=! syntax to ExecStart=+ (#3797)
As suggested by @mbiebl: we already use the "!" special character in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
2016-07-25 16:53:33 +02:00
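A minimal C sketch of how a parser might strip prefix characters from an ExecStart= value, assuming the "-" (ignore failure) and "@" (custom argv[0]) prefixes coexist with the new "+" prefix. The struct and function names are hypothetical, not systemd's actual parser; note that "!" is simply no longer recognized.

```c
#include <stdbool.h>

/* Hypothetical flags recovered from the prefix of an ExecStart= value. */
struct exec_prefixes {
    bool ignore_failure; /* "-" */
    bool override_argv0; /* "@" */
    bool privileged;     /* "+" (previously "!") */
};

/* Consume each known prefix at most once; return a pointer to the command. */
static const char *parse_exec_prefixes(const char *s, struct exec_prefixes *p) {
    p->ignore_failure = p->override_argv0 = p->privileged = false;

    for (;; s++) {
        if (*s == '-' && !p->ignore_failure)
            p->ignore_failure = true;
        else if (*s == '@' && !p->override_argv0)
            p->override_argv0 = true;
        else if (*s == '+' && !p->privileged)
            p->privileged = true;
        else
            return s; /* first character that is not a (new) prefix */
    }
}
```

With this sketch, "+/usr/bin/foo" yields the command "/usr/bin/foo" with privileged set, while "!/usr/bin/foo" is left untouched.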
Zbigniew Jędrzejewski-Szmek
31b14fdb6f Merge pull request #3777 from poettering/id128-rework
uuid/id128 code rework
2016-07-22 21:18:41 -04:00
Lennart Poettering
5052c4eadd Merge pull request #3753 from poettering/tasks-max-scale
Add support for relative TasksMax= specifications, and bump default for services
2016-07-22 17:40:12 +02:00
Alessandro Puccetti
0d9e799102 cgroup: whitelist inaccessible devices for "auto" and "closed" DevicePolicy.
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inaccessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
2016-07-22 16:08:31 +02:00
Lennart Poettering
409093fe10 nss: add new "nss-systemd" NSS module for mapping dynamic users
With this NSS module all dynamic service users will be resolvable via NSS like
any real user.
2016-07-22 15:53:45 +02:00
Lennart Poettering
6f3e79859d core: enforce user/group name validity also when creating transient units 2016-07-22 15:53:45 +02:00
Lennart Poettering
29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
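The range 61184..65519 covers 4336 IDs. Below is a minimal C sketch of a lowest-free allocator over that range; the table, the function names and the lowest-free strategy are assumptions for illustration only. A real allocator would also have to make sure a candidate UID isn't already in use elsewhere on the system.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#define DYNAMIC_UID_MIN ((uid_t) 61184)
#define DYNAMIC_UID_MAX ((uid_t) 65519)
#define DYNAMIC_UID_COUNT (DYNAMIC_UID_MAX - DYNAMIC_UID_MIN + 1)

/* Hypothetical in-memory allocation table: true means "in use". */
static bool dynamic_uid_used[DYNAMIC_UID_COUNT];

static bool uid_is_dynamic(uid_t uid) {
    return uid >= DYNAMIC_UID_MIN && uid <= DYNAMIC_UID_MAX;
}

/* Allocate the lowest free UID; returns 0 on success, -1 if the range is exhausted. */
static int dynamic_uid_allocate(uid_t *ret) {
    for (uid_t i = 0; i < DYNAMIC_UID_COUNT; i++)
        if (!dynamic_uid_used[i]) {
            dynamic_uid_used[i] = true;
            *ret = DYNAMIC_UID_MIN + i;
            return 0;
        }
    return -1;
}

/* Release a UID when its unit stops, making it available for reuse. */
static void dynamic_uid_release(uid_t uid) {
    if (uid_is_dynamic(uid))
        dynamic_uid_used[uid - DYNAMIC_UID_MIN] = false;
}
```

Reuse after release is exactly why the commit warns against writing files to disk: a later service may receive the same UID.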
Lennart Poettering
66dccd8d85 core: be stricter when parsing User=/Group= fields
Let's verify the validity of the syntax of the user/group names set.
2016-07-22 15:53:45 +02:00
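A simplified C sketch of such a syntax check, assuming a name must start with a letter or underscore, may continue with letters, digits, underscores or dashes, and is at most 31 characters long. These rules and USER_NAME_MAX are assumptions modeled on common conventions, not necessarily the exact rules systemd enforces.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical length limit; real limits depend on the system. */
#define USER_NAME_MAX 31

static bool is_name_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static bool is_name_char(char c) {
    return is_name_start(c) || (c >= '0' && c <= '9') || c == '-';
}

/* Returns true if s is a syntactically acceptable user/group name under the rules above. */
static bool valid_user_group_name_sketch(const char *s) {
    size_t n = strlen(s);

    if (n == 0 || n > USER_NAME_MAX || !is_name_start(s[0]))
        return false;
    for (size_t i = 1; i < n; i++)
        if (!is_name_char(s[i]))
            return false;
    return true;
}
```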
Lennart Poettering
b3785cd5e6 core: check for overflow when handling scaled MemoryLimit= settings
Just in case...
2016-07-22 15:33:13 +02:00
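The commit does not show its arithmetic, but the general hazard is that computing value * num / den in 64-bit integers silently wraps once value * num exceeds UINT64_MAX. A generic C sketch of the guard, with the hypothetical helper name scale_checked():

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute value * num / den, failing instead of silently wrapping on overflow. */
static bool scale_checked(uint64_t value, uint64_t num, uint64_t den, uint64_t *ret) {
    if (den == 0)
        return false;
    if (num != 0 && value > UINT64_MAX / num)
        return false; /* value * num would not fit into 64 bits */
    *ret = value * num / den;
    return true;
}
```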
Harald Hoyer
2424b6bd71 macros.systemd.in: add %systemd_ordering (#3776)
To remove the hard dependency on systemd for packages which function
without a running systemd, the %systemd_ordering macro can be used to
ensure ordering in the rpm transaction. %systemd_ordering makes sure
the systemd rpm is installed prior to the package, so the %pre/%post
scripts can execute the systemd parts.

Installing systemd afterwards, though, does not result in the same outcome.
2016-07-22 09:33:13 -04:00
Lennart Poettering
79baeeb96d core: change TasksMax= default for system services to 15%
As it turns out, 512 as the maximum number of tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.

Why 15%? Because it sounds like a round number and is close enough to 4096,
which I was going for, i.e. an eight-fold increase over the old 512.

Summary:

            | on the host | in a container
old default |         512 |           512
new default |        4915 |          2457
2016-07-22 15:33:13 +02:00
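The arithmetic behind the summary table, as a small C sketch using truncating integer division (the function name is hypothetical):

```c
#include <stdint.h>

/* New default share of the maximum PID count, in percent. */
#define DEFAULT_TASKS_MAX_PERCENT 15

/* Integer division truncates, which matches the figures quoted above. */
static uint64_t default_tasks_max(uint64_t pids_max) {
    return pids_max * DEFAULT_TASKS_MAX_PERCENT / 100;
}
```

For example, 32768 * 15 / 100 = 4915 on the host and 16384 * 15 / 100 = 2457 in a container.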
Lennart Poettering
84af7821b6 main: simplify things a bit by moving container check into fixup_environment() 2016-07-22 15:33:12 +02:00
Lennart Poettering
f7903e8db6 core: rename MemoryLimitByPhysicalMemory transient property to MemoryLimitScale
That way, we can neatly keep this in line with the new TasksMaxScale= option.

Note that we didn't release a version with MemoryLimitByPhysicalMemory= yet,
hence this change is unproblematic and doesn't break API.
2016-07-22 15:33:12 +02:00
Lennart Poettering
83f8e80857 core: support percentage specifications on TasksMax=
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
2016-07-22 15:33:12 +02:00
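A minimal C sketch of parsing such a value, assuming "N" is an absolute count, "N%" is relative to the system's maximum PID count, and percentages above 100 are rejected. The function name and error conventions are illustrative assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "N" (absolute) or "N%" (relative to pids_max).
 * Returns 0 on success, -EINVAL on bad syntax, -ERANGE on out-of-range values. */
static int parse_tasks_max_sketch(const char *s, uint64_t pids_max, uint64_t *ret) {
    char *end;
    unsigned long long v;

    if (*s < '0' || *s > '9')
        return -EINVAL; /* rejects "", "-5", " 5" and the like */

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE)
        return -ERANGE;

    if (end[0] == '%' && end[1] == '\0') {
        if (v > 100)
            return -ERANGE; /* assumption: more than 100% is rejected */
        *ret = pids_max * v / 100;
        return 0;
    }

    if (*end != '\0')
        return -EINVAL; /* trailing garbage */

    *ret = v;
    return 0;
}
```

With a pids_max of 32768, TasksMax=40% would come out as 13107 under this sketch.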
Lennart Poettering
4b1afed01f core: rework machine-id-setup.c to use the calls from id128-util.[ch]
This allows us to delete quite a bit of code and make the whole thing a lot
shorter.
2016-07-22 12:59:36 +02:00
Lennart Poettering
e042eab720 main: make sure set_machine_id() doesn't clobber arg_machine_id on failure 2016-07-22 12:59:36 +02:00
Lennart Poettering
15b1248a6b machine-id-setup: port machine_id_commit() to new id128-util.c APIs 2016-07-22 12:59:36 +02:00
Lennart Poettering
910fd145f4 sd-id128: split UUID file read/write code into new id128-util.[ch]
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.

The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().

In follow-up patches we can use this to reduce the code in nspawn and
machine-id-setup by adopting the common implementation.
2016-07-22 12:59:36 +02:00
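For reference, /etc/machine-id holds 32 hexadecimal digits followed by a newline. A minimal C sketch of parsing that representation into 16 bytes follows; the function name is hypothetical, and the sketch deliberately ignores other textual forms such as the dashed UUID notation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse 32 hex digits with an optional trailing newline into a 128-bit ID. */
static bool id128_parse_sketch(const char *buf, uint8_t out[16]) {
    size_t n = strlen(buf);

    if (n == 33 && buf[32] == '\n')
        n = 32;
    if (n != 32)
        return false;

    for (size_t i = 0; i < 16; i++) {
        int hi = hexval(buf[2 * i]), lo = hexval(buf[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        out[i] = (uint8_t) (hi << 4 | lo);
    }
    return true;
}
```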
Martin Pitt
bf3dd08a81 Merge pull request #3762 from poettering/sigkill-log
log about all processes we forcibly kill
2016-07-22 09:18:30 +02:00
Martin Pitt
5c3c778014 Merge pull request #3764 from poettering/assorted-stuff-2
Assorted fixes
2016-07-22 09:10:04 +02:00
Alessandro Puccetti
31d28eabc1 nspawn: enable major=0/minor=0 devices inside the container (#3773)
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inaccessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
2016-07-21 17:39:38 +02:00
Thomas H. P. Andersen
f8298f7be3 core: remove duplicate includes (#3771) 2016-07-21 10:52:07 +02:00
Topi Miettinen
176e51b710 namespace: fix wrong return value from mount(2) (#3758)
Fix bug introduced by #3263: mount(2) return value is 0 or -1, not errno.

Thanks to Evgeny Vereshchagin (@evverx) for reporting.
2016-07-20 17:43:21 +03:00
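The convention at issue, sketched in C: calls like mount(2) return 0 or -1 and leave the actual error code in errno, so returning the raw result yields -1 (which happens to equal -EPERM on Linux) instead of the real error. The sketch uses access(2) in place of mount(2) only because it needs no privileges; check_access() is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 0 on success or a negative errno value on failure. */
static int check_access(const char *path) {
    if (access(path, F_OK) < 0)
        return -errno; /* the real error code lives in errno, not in the return value */
    return 0;
}
```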
Lennart Poettering
33df919d5c execute: make sure JoinsNamespaceOf= doesn't leak ns fds to executed processes 2016-07-20 14:53:15 +02:00
Lennart Poettering
fe048ce56a namespace: add a (void) cast 2016-07-20 14:53:15 +02:00
Lennart Poettering
9ce9347880 core: normalize header inclusion in execute.h a bit
We don't actually need any functionality from cgroup.h in execute.h, hence
don't include that. However, we do need the Unit structure from unit.h, hence
include that, and move it as late as possible, since it needs the definitions
from execute.h.
2016-07-20 14:53:15 +02:00
Lennart Poettering
7a1ab780c4 execute: normalize connect_logger_as() parameters slightly
All other functions in execute.c that need the unit id take a Unit* parameter
as first argument. Let's change connect_logger_as() to follow a similar logic.
2016-07-20 14:53:15 +02:00
Lennart Poettering
3862e809d0 core: when a scope was abandoned, always log about processes we kill
After all, if a unit is abandoned, all processes inside of it may be considered
"left over" and are something we should better log about.
2016-07-20 14:35:15 +02:00
Lennart Poettering
f4b0fb236b core: make sure RequestStop signal is sent directed
This was accidentally left commented out for debugging purposes, let's fix that
and make the signal directed again.
2016-07-20 14:35:15 +02:00
Lennart Poettering
1d98fef17d core: when forcibly killing/aborting left-over unit processes log about it
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.

This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.

Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
2016-07-20 14:35:15 +02:00
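A C sketch of the flags-parameter pattern and the logging rule described above. The flag names and should_log_kill() are hypothetical stand-ins, not the identifiers used in systemd.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flags, combined with bitwise OR instead of several bool parameters. */
typedef enum {
    KILL_FLAG_SIGCONT     = 1 << 0, /* also send SIGCONT */
    KILL_FLAG_IGNORE_SELF = 1 << 1, /* skip our own PID */
    KILL_FLAG_REMOVE      = 1 << 2, /* remove the cgroup afterwards */
    KILL_FLAG_LOG         = 1 << 3, /* log even for a "clean" termination signal */
} KillFlags;

/* Always log SIGKILL/SIGABRT; log other signals only on explicit request. */
static bool should_log_kill(int sig, KillFlags flags) {
    return sig == SIGKILL || sig == SIGABRT || (flags & KILL_FLAG_LOG);
}
```

A call site then reads like should_log_kill(SIGTERM, KILL_FLAG_SIGCONT | KILL_FLAG_REMOVE), which stays legible as more options are added.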
Lennart Poettering
5fd7cf6fe2 namespace: minor improvements
We generally try to avoid strerror() due to its thread-unsafety; let's do
so here, too.

Also, let's be a tiny bit more explanatory with the log messages, and let's
shorten a few things.
2016-07-20 08:57:25 +02:00
Lennart Poettering
d724118e20 core: hide legacy bus properties
We usually hide legacy bus properties from introspection. Let's do that for the
InaccessibleDirectories= properties too.

The properties stay accessible if requested, but they won't be listed anymore
if people introspect the unit.
2016-07-20 08:55:50 +02:00
Alessandro Puccetti
2a624c36e6 doc,core: Read{Write,Only}Paths= and InaccessiblePaths=
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=; the previous names are kept
as aliases but are not advertised in the documentation.

Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
2016-07-19 17:22:02 +02:00
Alessandro Puccetti
c4b4170746 namespace: unify limit behavior on non-directory paths
Despite the name, `Read{Write,Only}Directories=` already allows for
regular file paths to be masked. This commit adds the same behavior
to `InaccessibleDirectories=` and makes it explicit in the doc.
This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}`
{file,device} nodes and mounts the appropriate one over each path specified
in `InaccessibleDirectories=`.

Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
2016-07-19 17:22:02 +02:00
Lukáš Nykrýn
ccc2c98e1b manager: don't skip sigchld handler for main and control pid for services (#3738)
During stop, when a service has one "regular" PID, one main PID and one
control PID, and the SIGCHLD for the regular one is processed first,
unit_tidy_watch_pids() will skip the main and control PIDs and does not
remove them from u->pids. But then we skip the SIGCHLD event because we
already processed one in this iteration and there are two PIDs in u->pids.

v2: Use general unit_main_pid() and unit_control_pid() instead of
reaching directly to service structure.
2016-07-16 15:04:13 -04:00
Zbigniew Jędrzejewski-Szmek
2ed968802c tree-wide: get rid of selinux_context_t (#3732)
Commit 9eb9c93275 deprecated selinux_context_t. Replace it with a simple
char* everywhere.

Alternative fix for #3719.
2016-07-15 18:44:02 +02:00
Zbigniew Jędrzejewski-Szmek
1071fd0823 macros: provide %_systemdgeneratordir and %_systemdusergeneratordir (#3672)
... as requested in
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DJ7HDNRM5JGBSA4HL3UWW5ZGLQDJ6Y7M/.
Adding the macro makes it marginally easier to create generators
for outside projects.

I opted for "generatordir" and "usergeneratordir" to match
%unitdir and %userunitdir. OTOH, "_systemd" prefix makes it obvious
that this is related to systemd. "%_generatordir" would be too generic
a name.
2016-07-15 09:35:49 +02:00
Lennart Poettering
2e79d1828a shutdown: already sync IO before we enter the final killing spree
This way, slow IO that journald has to wait for can't cause it to reach the
killing spree timeout and get hit by SIGKILL in addition to SIGTERM.
2016-07-12 17:38:19 +02:00
Lennart Poettering
d450612953 shutdown: use 90s SIGKILL timeout
There's really no reason to use 10s here, let's instead default to 90s like we
do for everything else.

The SIGKILL during the final killing spree is in most regards the fourth level
of a safety net, after all: any normal service should have already been stopped
during the normal service shutdown logic, first via SIGTERM and then SIGKILL,
and then also via SIGTERM during the final killing spree before we send
SIGKILL. And as a fourth level safety net it should only be required in
exceptional cases, which means it's safe to raise the default timeout, as normal
shutdowns should never be delayed by it.

Note that journald excludes itself from the normal service shutdown, and relies
on the final killing spree to terminate it (this is because it wants to cover
the normal shutdown phase's complete logging). If the system's IO is
excessively slow, then the 10s might not be enough for journald to sync
everything to disk and logs might get lost during shutdown.
2016-07-12 17:32:30 +02:00
Michael Biebl
595bfe7df2 Various fixes for typos found by lintian (#3705) 2016-07-12 12:52:11 +02:00
Luca Bruno
391b81cd03 seccomp: only abort on syscall name resolution failures (#3701)
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall numbers on its own
and allowing proper filtering of multiplexed syscalls.
2016-07-12 11:55:26 +02:00
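A C sketch contrasting the old and the corrected check: only the dedicated error value means "unknown syscall", while other negative results are valid pseudo-syscall numbers. To stay self-contained it stubs out the libseccomp resolver; the stub, its return values and SCMP_ERROR_SKETCH (standing in for __NR_SCMP_ERROR, assumed to be -1 here) are all hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for libseccomp's __NR_SCMP_ERROR (assumed to be -1 in this sketch). */
#define SCMP_ERROR_SKETCH (-1)

/* Hypothetical resolver stub; real code would call seccomp_syscall_resolve_name(). */
static int resolve_name_stub(const char *name) {
    if (strcmp(name, "read") == 0)
        return 0;     /* an ordinary, non-negative syscall number */
    if (strcmp(name, "multiplexed_example") == 0)
        return -101;  /* illustrative negative pseudo-syscall number */
    return SCMP_ERROR_SKETCH;
}

/* Old behavior: treats every negative number as a parse failure. */
static bool name_ok_buggy(int nr) { return nr >= 0; }

/* Fixed behavior: only the dedicated error value is a failure. */
static bool name_ok_fixed(int nr) { return nr != SCMP_ERROR_SKETCH; }
```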
Torstein Husebø
61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Evgeny Vereshchagin
224d3d8266 Merge pull request #3680 from joukewitteveen/pam-env
Follow up on #3503 (pass service env vars to PAM sessions)
2016-07-08 17:33:12 +03:00
Jouke Witteveen
84eada2f7f execute: Do not alter call-by-ref parameter on failure
Prevent free from being called on (a part of) the call-by-reference
variable env when setup_pam fails.
2016-07-08 09:42:48 +02:00
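The general pattern, as a hypothetical C sketch: do all fallible work on temporaries and only write back through the by-reference parameter once everything has succeeded, so a failure leaves the caller's array intact and safe to free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append a copy of `entry` to the NULL-terminated array *env.
 * Returns 0 on success, -1 on allocation failure; on failure *env is untouched. */
static int env_append_sketch(char ***env, const char *entry) {
    size_t n = 0, len = strlen(entry) + 1;
    char *copy, **grown;

    while ((*env)[n])
        n++;

    copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, entry, len);

    grown = realloc(*env, (n + 2) * sizeof(char *));
    if (!grown) {
        free(copy);
        return -1; /* realloc() failed, so the old array is still valid */
    }

    grown[n] = copy;
    grown[n + 1] = NULL;
    *env = grown; /* publish the result only after everything succeeded */
    return 0;
}
```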
David Michael
4f952a3f07 core: queue loading transient units after setting their properties (#3676)
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
2016-07-08 05:43:01 +02:00
Daniel Mack
78a4ee591a cgroup: fix memory cgroup limit regression on kernel 3.10 (#3673)
Commit da4d897e ("core: add cgroup memory controller support on the unified
hierarchy (#3315)") changed the code in src/core/cgroup.c to always write
the real numeric value from the cgroup parameters to the
"memory.limit_in_bytes" attribute file.

For parameters set to CGROUP_LIMIT_MAX, this results in the string
"18446744073709551615" being written into that file, which is UINT64_MAX.
Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".

This causes a regression on CentOS 7, which is based on kernel 3.10, as the
value is interpreted as *signed* 64 bit, and clamped to 0:

[root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
0

[root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
9223372036854775807

Hence, all units that are subject to the limits enforced by the memory
controller will crash immediately, even though they have no actual limit
set. This happens, for instance, for user.slice:

[  453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[  453.587024]  ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc
[  453.594544]  ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28
[  453.602120]  ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03
[  453.609680] Call Trace:
[  453.612156]  [<ffffffff816360fc>] dump_stack+0x19/0x1b
[  453.617324]  [<ffffffff8163109c>] dump_header+0x8e/0x214
[  453.622671]  [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0
[  453.628559]  [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[  453.634969]  [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0
[  453.641721]  [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0
[  453.648299]  [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90
[  453.654621]  [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b
[  453.660233]  [<ffffffff81642012>] __do_page_fault+0x3e2/0x450
[  453.666017]  [<ffffffff816420a3>] do_page_fault+0x23/0x80
[  453.671467]  [<ffffffff8163e308>] page_fault+0x28/0x30
[  453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service
[  453.688477] memory: usage 0kB, limit 0kB, failcnt 7
[  453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[  453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[  453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[  453.725702] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[  453.733614] [ 2837]     0  2837    11950      899      23        0             0 (systemd)
[  453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child
[  453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB

Fix this issue by special-casing the UINT64_MAX case again.
2016-07-07 19:29:35 -07:00
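A minimal C sketch of the special case, assuming "no limit" is represented as UINT64_MAX as the commit message describes; format_memory_limit() and CGROUP_LIMIT_MAX_SKETCH are hypothetical names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* "No limit", represented as UINT64_MAX per the commit message. */
#define CGROUP_LIMIT_MAX_SKETCH UINT64_MAX

/* Format a value for memory.limit_in_bytes on the legacy hierarchy. */
static void format_memory_limit(uint64_t v, char *buf, size_t size) {
    if (v == CGROUP_LIMIT_MAX_SKETCH)
        snprintf(buf, size, "-1"); /* accepted as the maximum in the 3.10 example above */
    else
        snprintf(buf, size, "%" PRIu64, v);
}
```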
Jouke Witteveen
1280503b7e execute: Cleanup the environment early
By cleaning up before setting up PAM we maintain control of overriding
behavior in setting variables. Otherwise, pam_putenv is in control.
This also makes sure we use a cleaned up environment in replacing
variables in argv.
2016-07-07 14:15:50 +02:00