1
0
mirror of https://github.com/systemd/systemd.git synced 2025-01-09 01:18:19 +03:00
Commit Graph

636 Commits

Author SHA1 Message Date
Luca Boccassi
427353f668 core: add mount options support for MountImages
Follow the same model established for RootImage and RootImageOptions,
and allow to either append a single list of options or tuples of
partition_number:options.
2020-08-20 14:45:40 +01:00
Luca Boccassi
9ece644435 core: change RootImageOptions to use names instead of partition numbers
Follow the designations from the Discoverable Partitions Specification
2020-08-20 13:58:02 +01:00
Lennart Poettering
a6991726f8 man: clarify that LogNamespace= is for system services only
Fixes: #16638
2020-08-06 18:24:11 +02:00
Luca Boccassi
b3d133148e core: new feature MountImages
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.

Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
2020-08-05 21:34:55 +01:00
Luca Boccassi
18d7370587 service: add new RootImageOptions feature
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:

RootImageOptions=1:ro,dev 2:nosuid nodev

In absence of a partition number, 0 is assumed.
2020-07-29 17:17:32 +01:00
Lennart Poettering
6b222c4b02 man: fix link markup 2020-07-30 00:51:41 +09:00
Zbigniew Jędrzejewski-Szmek
8fa2cd83c6 Revert "man: add note about systemd-vconsole-setup.service and tty as input/output"
This reverts commit 0b57803630.

From https://github.com/systemd/systemd/pull/16503#issuecomment-660212813:
systemd-vconsole-setup (the binary) is supposed to run asynchronously by udev
therefore ordering early interactive services after systemd-vconsole-setup.service
has basically no effect.

Let's remove this paragraph. It's better to say nothing than to give pointless
advice.
2020-07-22 10:43:52 +02:00
Zbigniew Jędrzejewski-Szmek
cd990847b9 tree-wide: more repeated words 2020-07-07 12:08:22 +02:00
Zbigniew Jędrzejewski-Szmek
e9dd698407 tree-wide: fixes for assorted grammar and spelling issues
Fixes #16363. Also includes some changes where I generalized the pattern.
2020-07-06 11:29:05 +02:00
Zbigniew Jędrzejewski-Szmek
0986bb9b95
Merge pull request #16278 from keszybz/fix-man-links
Fix html links
2020-06-25 18:38:37 +02:00
Zbigniew Jędrzejewski-Szmek
b7a4734551 man: fix links to various external man pages
In cases where we used both die-net and man-pages for the same reference,
I switched to use man-pages everywhere.
2020-06-25 14:41:44 +02:00
Luca Boccassi
d4d55b0d13 core: add RootHashSignature service parameter
Allow to explicitly pass root hash signature as a unit option. Takes precedence
over implicit checks.
2020-06-25 08:45:21 +01:00
Lennart Poettering
6b000af4f2 tree-wide: avoid some loaded terms
https://tools.ietf.org/html/draft-knodel-terminology-02
https://lwn.net/Articles/823224/

This gets rid of most but not occasions of these loaded terms:

1. scsi_id and friends are something that is supposed to be removed from
   our tree (see #7594)

2. The test suite defines an API used by the ubuntu CI. We can remove
   this too later, but this needs to be done in sync with the ubuntu CI.

3. In some cases the terms are part of APIs we call or where we expose
   concepts the kernel names the way it names them. (In particular all
   remaining uses of the word "slave" in our codebase are like this,
   it's used by the POSIX PTY layer, by the network subsystem, the mount
   API and the block device subsystem). Getting rid of the term in these
   contexts would mean doing some major fixes of the kernel ABI first.

Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
2020-06-25 09:00:19 +02:00
Luca Boccassi
0389f4fa81 core: add RootHash and RootVerity service parameters
Allow to explicitly pass root hash (explicitly or as a file) and verity
device/file as unit options. Take precedence over implicit checks.
2020-06-23 10:50:09 +02:00
Zbigniew Jędrzejewski-Szmek
201632e314 tree-wide: s/time-out/timeout/g
See 3f9a0a522f for justification.
2020-05-26 10:28:59 +02:00
Lennart Poettering
d2b843554a man: drop some left-over mentions of StandardOutput=syslog
We dropped them from the StandardOuput= documentation long ago, but
elswhere some references where lurking.
2020-05-15 00:05:46 +02:00
Lennart Poettering
6551cf2d61 man: document $LOG_NAMESPACE 2020-05-14 17:29:28 +02:00
Zbigniew Jędrzejewski-Szmek
26b8190841 man: mention that ProtectSystem= also takes care of /efi 2020-04-30 09:54:00 +02:00
Frantisek Sumsal
86b52a3958 tree-wide: fix spelling errors
Based on a report from Fossies.org using Codespell.

Followup to #15436
2020-04-21 23:21:08 +02:00
Lennart Poettering
33b58dfb41 core: automatically add udev dependency for units using RootImage=
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.

Fixes: #14972
2020-04-21 16:31:06 +02:00
Lennart Poettering
9b3c65ed36
Merge pull request #15352 from poettering/user-group-name-valdity-rework
user/group name validity rework
2020-04-09 18:49:22 +02:00
Lennart Poettering
611cb82612
Merge pull request #15318 from fbuihuu/inherit-umask-for-user-units
pid1: by default make user units inherit their umask from the user ma…
2020-04-09 17:15:55 +02:00
Franck Bui
5e37d1930b pid1: by default make user units inherit their umask from the user manager
This patch changes the way user managers set the default umask for the units it
manages.

Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.

Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.

Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.

Fixes #6077.
2020-04-09 14:17:07 +02:00
Zbigniew Jędrzejewski-Szmek
ad21e542b2 manager: add CoredumpFilter= setting
Fixes #6685.
2020-04-09 14:08:48 +02:00
Lennart Poettering
887a8fa341 docs: hook up the new USER_NAMES document everywhere
(Also correct the set of names we accept in User=, which was forgotten
to be updated in ae480f0b09.
2020-04-08 17:30:04 +02:00
Michal Sekletár
e2b2fb7f56 core: add support for setting CPUAffinity= to special "numa" value
systemd will automatically derive CPU affinity mask from NUMA node
mask.

Fixes #13248
2020-03-16 08:57:28 +01:00
Lennart Poettering
5b0a76d107 man: document LogNamespace= unit setting 2020-01-31 15:10:40 +01:00
Kevin Kuehler
022d334561 man: doc: Document ProtectClock= 2020-01-27 11:21:36 -08:00
Lennart Poettering
54ed193f8d man: clarify that user rlimits cannot go beyond limits set for service mgr
Fixes: #10758
2020-01-17 10:09:50 +01:00
Lennart Poettering
ba96a8a277 man: document that program invocation will fail if the User= does not exist
Fixes: #14565
2020-01-17 10:08:13 +01:00
Zbigniew Jędrzejewski-Szmek
ea7fe1d1c2
Merge pull request #14390 from poettering/gpt-var-tmp
introduce GPT partition types for /var and /var/tmp and support them for auto-discovery
2020-01-14 15:37:53 +01:00
Topi Miettinen
412a6c646c systemd.exec: document the file system for EnvironmentFile paths
Files specified with EnvironmentFile are read from PID1 mount namespace, before
any file system operations like RootImage or BindPaths are in effect.
2020-01-02 14:21:16 +01:00
Lennart Poettering
19ac32cdd6 docs: import discoverable partitions spec
This was previously available here:

https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/

Let's pull it into our repository.
2019-12-23 14:44:33 +01:00
Anita Zhang
5749f855a7 core: PrivateUsers=true for (unprivileged) user managers
Let per-user service managers have user namespaces too.

For unprivileged users, user namespaces are set up much earlier
(before the mount, network, and UTS namespaces vs after) in
order to obtain capbilities in the new user namespace and enable use of
the other listed namespaces. However for privileged users (root), the
set up for the user namspace is still done at the end to avoid any
restrictions with combining namespaces inside a user namespace (see
inline comments).

Closes #10576
2019-12-18 11:09:30 -08:00
Lennart Poettering
d58b613bbb man: apparently we lowercased STDOUT/STDERR everywhere else in docs, do so here too 2019-11-28 14:25:38 +01:00
Zbigniew Jędrzejewski-Szmek
f8b68539d0 man: fix a few bogus entries in directives index
When wrong element types are used, directives are sometimes placed in the wrong
section. Also, strip part of text starting with "'", which is used in a few
places and which is displayed improperly in the index.
2019-11-21 22:06:30 +01:00
Lennart Poettering
8af381679d
Merge pull request #13940 from keur/protect_kernel_logs
Add ProtectKernelLogs to systemd.exec
2019-11-15 16:26:10 +01:00
Kevin Kuehler
d916e35b9f man: Add description for ProtectKernelLogs= 2019-11-14 13:31:06 -08:00
Zbigniew Jędrzejewski-Szmek
67f5b9e06e
Merge pull request #14003 from keszybz/user-path-configurable
meson: make user $PATH configurable
2019-11-14 10:08:40 +01:00
Zbigniew Jędrzejewski-Szmek
3602ca6f0c meson: make user $PATH configurable
This partially reverts db11487d10 (the logic to
calculate the correct value is removed, we always use the same setting as for
the system manager). Distributions have an easy mechanism to override this if
they wish.

I think making this configurable is better, because different distros clearly
want different defaults here, and making this configurable is nice and clean.
If we don't make it configurable, distros which either have to carry patches,
or what would be worse, rely on some other configuration mechanism, like
/etc/profile. Those other solutions do not apply everywhere (they usually
require the shell to be used at some point), so it is better if we provide
a nice way to override the default.

Fixes  #13469.
2019-11-13 22:34:14 +01:00
Zbigniew Jędrzejewski-Szmek
1f6597a84c man: mention $RUNTIME_DIRECTORY & friends in environment list 2019-11-13 22:05:11 +01:00
Anita Zhang
644ee25461
Merge pull request #13676 from ClydeByrdIII/service-result-patch
Update service result table
2019-10-29 11:35:41 -07:00
Mark Stosberg
69bdb3b150 man: document updated newline support
Since v239 newlines have been allowed for PassEnvironment=
and EnvironmentFile=, due to #8471.

This PR documents the behavior change.
2019-10-04 11:54:28 +02:00
ClydeByrdIII
b122296272
Update service result table
exec-condition and oom-kill were added without updating this table

Updated success to reflect the code, which also allows kills by signal in certain situations
2019-09-28 01:43:02 -07:00
Yu Watanabe
bd9014c360 man: move TimeoutCleanSec= entry from .service to .exec
Follow-up for 12213aed12.

Closes #13546.
2019-09-13 15:06:40 +02:00
Zbigniew Jędrzejewski-Szmek
db11487d10 manager: put bin before sbin for user instances
Traditionally, user logins had a $PATH in which /bin was before /sbin, while
root logins had a $PATH with /sbin first. This allows the tricks that
consolehelper is doing to work. But even if we ignore consolehelper, having the
path in this order might have been used by admins for other purposes, and
keeping the order in user sessions will make it easier the adoption of systemd
user sessions a bit easier.

Fixes #733.
https://bugzilla.redhat.com/show_bug.cgi?id=1744059

OOM handling in manager_default_environment wasn't really correct.
Now the (theorertical) malloc failure in strv_new() is handled.

Please note that this has no effect on:
- systems with merged /bin-/sbin (e.g. arch)

- when there are no binaries that differ between the two locations.

  E.g. on my F30 laptop there is exactly one program that is affected:
  /usr/bin/setup -> consolehelper.

  There is less and less stuff that relies on consolehelper, but there's still
  some.

So for "clean" systems this makes no difference, but helps with legacy setups.

$ dnf repoquery --releasever=31 --qf %{name} --whatrequires usermode
anaconda-live
audit-viewer
beesu
chkrootkit
driftnet
drobo-utils-gui
hddtemp
mate-system-log
mock
pure-ftpd
setuptool
subscription-manager
system-config-httpd
system-config-rootpassword
system-switch-java
system-switch-mail
usermode-gtk
vpnc-consoleuser
wifi-radar
xawtv
2019-08-27 18:24:44 +02:00
Lennart Poettering
29a3d5caea man: remove trailing space in link in HTML output 2019-07-29 19:25:49 +02:00
Lennart Poettering
b042dd687c man: document that the supplementary groups list is initialized from User='s database entry
Fixes: #12936
2019-07-12 14:25:28 +02:00
Lennart Poettering
8c8208cb80 man: document new "systemctl clean…" operation 2019-07-11 12:18:51 +02:00
Philip Withnall
a9a50bd680 man: Add some notes about variable $prefix for StateDirectory=
tl;dr: It isn’t supported.

Wording by Zbigniew Jędrzejewski-Szmek.

See https://twitter.com/pid_eins/status/1102639279614906369 and
https://gitlab.freedesktop.org/libfprint/fprintd/merge_requests/5#note_125536
onwards.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
2019-07-04 18:26:03 +02:00
Lennart Poettering
330703fb22 man: beef up systemd.exec(5)
Prompted by:

https://lists.freedesktop.org/archives/systemd-devel/2019-May/042773.html
2019-06-24 18:31:36 +02:00
Michal Sekletar
b070c7c0e1 core: introduce NUMAPolicy and NUMAMask options
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.

Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
2019-06-24 16:58:54 +02:00
Lennart Poettering
eedaf7f322 man: drop references to "syslog" and "syslog+console" from man page
These options are pretty much equivalent to "journal" and
"journal+console" anyway, let's simplify things, and drop them from the
documentation hence.

For compat reasons let's keep them in the code.

(Note that they are not 100% identical to 'journal', but I doubt the
distinction in behaviour is really relevant to keep this in the docs.
And we should probably should drop 'syslog' entirely from our codebase
eventually, but it's problematic as long as we semi-support udev on
non-systemd systems still.)
2019-06-24 15:23:11 +02:00
Lennart Poettering
e0e65f7d09 man: document that DynamicUser=1 implied sandboxing cannot be turned off
Fixes: #12476
2019-06-24 14:20:36 +02:00
Zbigniew Jędrzejewski-Szmek
61fbbac1d5 pid1: parse CPUAffinity= in incremental fashion
This makes the handling of this option match what we do in unit files. I think
consistency is important here. (As it happens, it is the only option in
system.conf that is "non-atomic", i.e. where there's a list of things which can
be split over multiple assignments. All other options are single-valued, so
there's no issue of how to handle multiple assignments.)
2019-05-29 10:29:28 +02:00
Ben Boeckel
5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Zbigniew Jędrzejewski-Szmek
db8d154dc4 man: describe interaction with ProtectHome=/InaccessiblePaths= in BindMount=
https://github.com/systemd/systemd/issues/7153#issuecomment-485252308

Apparently this is still confusing for people.

Longer-term, I think we should just make BindMount= automatically "upgrade"
(or "downgrade", depending on how you look at this), any InaccessiblePath=
mountpoints to "tmpfs". I don't see much point in forcing users to remember
this interaction. But let's at least document the status quo, we can always
update the docs if the code changes.
2019-04-24 10:21:37 +02:00
Lennart Poettering
8e74bf7f9c man: document new OOMPolicy= setting 2019-04-09 11:17:58 +02:00
Lennart Poettering
bf65b7e0c9 core: imply NNP and SUID/SGID restriction for DynamicUser=yes service
Let's be safe, rather than sorry. This way DynamicUser=yes services can
neither take benefit of, nor create SUID/SGID binaries.

Given that DynamicUser= is a recent addition only we should be able to
get away with turning this on, even though this is strictly speaking a
binary compatibility breakage.
2019-04-02 16:56:48 +02:00
Lennart Poettering
7445db6eb7 man: document the new RestrictSUIDSGID= setting 2019-04-02 16:56:48 +02:00
Lennart Poettering
6d463b8aed man: refer to innermost directory as innermost, not as "lowest"
Let's avoid confusion whether the root is at the top or of the bottom of
the directory tree. Moreover we use "innermost" further down for the
same concept, so let's stick to the same terminology here.
2019-04-01 18:30:18 +02:00
Lennart Poettering
8601482cd8 man: tweak XyzDirectory= table a bit 2019-04-01 18:30:18 +02:00
Zbigniew Jędrzejewski-Szmek
de04bbdce1 tree-wide: spell "lifecycle" without hyphen everywhere
We had 10 instances of unhyphentated spelling, and 4 of the hyphenated one.
Consistency trumps ispell.
2019-03-14 22:47:44 +01:00
Lennart Poettering
b3f6c4531e
Merge pull request #12002 from keszybz/man-headers
Man headers
2019-03-14 15:55:04 +01:00
Lennart Poettering
c4d4b5a708 man: say explicitly which settings are not available in --user services
Fixes: #3944
2019-03-14 15:13:33 +01:00
Zbigniew Jędrzejewski-Szmek
3a54a15760 man: use same header for all files
The "include" files had type "book" for some raeason. I don't think this
is meaningful. Let's just use the same everywhere.

$ perl -i -0pe 's^..DOCTYPE (book|refentry) PUBLIC "-//OASIS//DTD DocBook XML V4.[25]//EN"\s+"http^<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n  "http^gms' man/*.xml
2019-03-14 14:42:05 +01:00
Zbigniew Jędrzejewski-Szmek
0307f79171 man: standarize on one-line license header
No need to waste space, and uniformity is good.

$ perl -i -0pe 's|\n+<!--\s*SPDX-License-Identifier: LGPL-2.1..\s*-->|\n<!-- SPDX-License-Identifier: LGPL-2.1+ -->|gms' man/*.xml
2019-03-14 14:29:37 +01:00
Lennart Poettering
c648d4d4c8 man: mention that DynamicUser= should not be mixed with ReadWriteDirectory= or AF_UNIX dir fd passing 2019-03-14 09:31:09 +01:00
Zbigniew Jędrzejewski-Szmek
fb6692ed33
Merge pull request #11927 from poettering/network-namespace-path
Add NetworkNamespacePath= to unit files
2019-03-12 14:29:14 +01:00
Lennart Poettering
8df87b4383 man: document that ProtectHostname= disables hostname change notifications 2019-03-08 15:49:10 +01:00
Lennart Poettering
4107452e51 man: document NetworkNamespacePath= 2019-03-07 21:27:02 +01:00
Lennart Poettering
eb5149ba74
Merge pull request #11682 from topimiettinen/private-utsname
core: ProtectHostname feature
2019-02-20 14:12:15 +01:00
Topi Miettinen
aecd5ac621 core: ProtectHostname= feature
Let services use a private UTS namespace. In addition, a seccomp filter is
installed on set{host,domain}name and a ro bind mounts on
/proc/sys/kernel/{host,domain}name.
2019-02-20 10:50:44 +02:00
Lennart Poettering
dcf3c3c3d9 core: export $PIDFILE env var for services, derived from PIDFile= 2019-02-15 11:32:19 +01:00
Zbigniew Jędrzejewski-Szmek
e0e2ecd5a8 man: move entries to the right section in systemd.directives
They were in "miscellaneuos" because of the missing class= assignment.
Probably introduced when the split into sections was done.
2019-02-13 11:17:41 +01:00
Yu Watanabe
d1698b82e6 man: add referecne to systemd-system.conf 2019-02-01 12:31:51 +01:00
Yu Watanabe
68d838f71d man: fix volume num of journalctl 2019-02-01 12:30:36 +01:00
Topi Miettinen
10d44e72ec Document weaknesses with MDWE and suggest hardening
Closes #11473
2019-01-21 11:37:46 +01:00
Philip Withnall
35f2c0ba6a man: Fix a typo in systemd.exec.xml
Signed-off-by: Philip Withnall <withnall@endlessm.com>
2019-01-16 21:33:38 +09:00
Alex Mayer
8d7fac92f0 Docs: Add Missing Space Between Words 2019-01-03 03:07:50 +09:00
Zbigniew Jędrzejewski-Szmek
0b57803630 man: add note about systemd-vconsole-setup.service and tty as input/output
Closes #10019.
2018-12-14 11:18:32 +01:00
Lennart Poettering
438311a518 man: document that env vars are not suitable for passing secrets
Prompted by the thread around:

https://lists.freedesktop.org/archives/systemd-devel/2018-November/041665.html
2018-11-14 09:12:49 +03:00
Lennart Poettering
0e18724eb1 man: emphasize the ReadOnlyPaths= mount propagation "hole"
This changes the ProtectSystem= documentation to refer in more explicit
words to the restrictions of ReadOnlyPath=, as sugegsted in #9857.

THis also extends the paragraph in ReadOnlyPath= that explains the hole.

Fixes: #9857
2018-10-30 15:30:18 +01:00
Lennart Poettering
d287820dec man: document that various sandboxing settings are not available in --user services
This is brief and doesn't go into detail, but should at least indicate
to those searching for it that some stuff is not available.

Fixes: #9870
2018-10-30 15:30:18 +01:00
Anita Zhang
90fc172e19 core: implement per unit journal rate limiting
Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.

Part of #10230
2018-10-18 09:56:20 +02:00
Alan Jenkins
923f910115 man/systemd.exec: MountFlags=shared behaviour was changed (fixed?)
The behaviour described *was* observed on Fedora 28
(systemd-238-9.git0e0aa59), with and without SELinux.  I don't actually
know why though!  It contradicts my understanding of the code, including an
explicit comment in the code.

Testing in a VM upgraded to v239-792-g1327f272d, this behaviour goes away.


Test case:

# /etc/systemd/system/mount-test.service
[Service]
MountFlags=shared
Type=oneshot
ExecStart=/usr/bin/ls -l /proc/1/ns/mnt /proc/self/ns/mnt
ExecStart=/usr/bin/grep ext4 /proc/self/mountinfo


Weird old behaviour: new mount namespace but / is fully shared.

lrwxrwxrwx. 1 root root 0 Sep 14 11:18 /proc/1/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Sep 14 11:48 /proc/self/ns/mnt ->
mnt:[4026532851]

968 967 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/alan_dell_2016...


Current behaviour: / is not fully shared

lrwxrwxrwx. 1 root root 0 Sep 14 11:39 /proc/1/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Sep 14 11:41 /proc/self/ns/mnt ->
mnt:[4026532329]

591 558 8:3 / / rw,relatime shared:313 master:1 - ext4 /dev/sda3 rw,secl...
2018-10-05 17:38:38 +02:00
Yu Watanabe
d491e65e74 man: document RUNTIME_DIRECTORY= or friends 2018-09-13 17:02:58 +09:00
Lennart Poettering
2d2224e407 man: document that most sandboxing options are best effort only 2018-08-21 20:00:33 +02:00
Yu Watanabe
fe65e88ba6 namespace: implicitly adds DeviceAllow= when RootImage= is set
RootImage= may require the following settings
```
DeviceAllow=/dev/loop-control rw
DeviceAllow=block-loop rwm
DeviceAllow=block-blkext rwm
```
This adds the following settings implicitly when RootImage= is
specified.

Fixes #9737.
2018-08-06 14:02:31 +09:00
Zsolt Dollenstein
566b7d23eb Add support for opening files for appending
Addresses part of #8983
2018-07-20 03:54:22 -07:00
Lennart Poettering
9236cabf78 man: elaborate a bit on the effect of PrivateNetwork=
Triggered by this thread:

https://lists.freedesktop.org/archives/systemd-devel/2018-July/040992.html
2018-07-17 21:41:23 +02:00
Alexander Kurtz
1448dfa6bf man: Mention that paths in unit files must be fully normalized.
Related to issues #9107 and #9498 and PRs #9149 and #9157.
2018-07-05 22:55:26 +02:00
Zbigniew Jędrzejewski-Szmek
514094f933 man: drop mode line in file headers
This is already included in .dir-locals, so we don't need it
in the files themselves.
2018-07-03 01:32:25 +02:00
Lennart Poettering
705268414f seccomp: add new system call filter, suitable as default whitelist for system services
Currently we employ mostly system call blacklisting for our system
services. Let's add a new system call filter group @system-service that
helps turning this around into a whitelist by default.

The new group is very similar to nspawn's default filter list, but in
some ways more restricted (as sethostname() and suchlike shouldn't be
available to most system services just like that) and in others more
relaxed (for example @keyring is blocked in nspawn since it's not
properly virtualized yet in the kernel, but is fine for regular system
services).
2018-06-14 17:44:20 +02:00
Zbigniew Jędrzejewski-Szmek
fdbbee37d5 man: drop unused <authorgroup> tags from man sources
Docbook styles required those to be present, even though the templates that we
use did not show those names anywhere. But something changed semi-recently (I
would suspect docbook templates, but there was only a minor version bump in
recent years, and the changelog does not suggest anything related), and builds
now work without those entries. Let's drop this dead weight.

Tested with F26-F29, debian unstable.

$ perl -i -0pe 's/\s*<authorgroup>.*<.authorgroup>//gms' man/*xml
2018-06-14 12:22:18 +02:00
Lennart Poettering
0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering
818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Zbigniew Jędrzejewski-Szmek
70127be805
Merge pull request #9153 from poettering/private-mounts
introduce PrivateMounts= setting and clean up documentation for MountFlags=
2018-06-13 08:20:18 +02:00
Michael Biebl
1b2ad5d9a5 doc: more spelling fixes 2018-06-12 16:31:30 +02:00
Lennart Poettering
2f2e14b251 man: document the new PrivateMounts= setting
Also, extend the documentation on MountFlags= substantially, hopefully
addressing all the questions of #4393

Fixes: #4393
2018-06-12 16:27:37 +02:00
Lennart Poettering
f86fae61ec tree-wide: drop trailing whitespace 2018-06-12 13:05:38 +02:00
Bruno Vernay
8d00da49fb Table is easier to grasp
State goes in CONFIG for users

3rd review
2018-06-11 13:52:55 +02:00
Yu Watanabe
d3c8afd092 man: RuntimeDirectory= or friends accept dot contained paths 2018-06-04 01:44:04 +09:00
Yu Watanabe
617d253afa load-fragment: make IOScheduling{Class,Priority}= accept the empty string 2018-05-31 11:09:41 +09:00
Lennart Poettering
cdc0f9be92
Merge pull request #8817 from yuwata/cleanup-nsflags
core: allow to specify RestrictNamespaces= multiple times
2018-05-24 16:49:13 +02:00
Lucas Werkmeister
8d29bef6b5 man: fix reference in StandardOutput=
Since StandardOutput=file:path is more similar to StandardInput= than
StandardInputText=, and only StandardInput= is actually documented above
StandardOutput= whereas StandardInputText= is documented below it, I
assume the intention was to refer to the former.
2018-05-14 08:11:37 +02:00
Yu Watanabe
b086654c6a man: fix merging rule for CapabilityBoundingSet= 2018-05-05 11:07:37 +09:00
Yu Watanabe
53255e53ce man: mention that RestrictNamespaces= can be specified multiple times 2018-05-05 11:07:37 +09:00
Lennart Poettering
46b073298f man: don't claim we'd set XDG_SEAT and XDG_VTNR as part of service management
Previously, reading through systemd.exec(5) one might get the idea that
XDG_SEAT and XDG_VTNR are part of the service management logic, but they
are not, they are only set if pam_systemd is part of a PAM stack an
pam_systemd is used.

Hence, let's drop these env vars from the list of env vars, and instead
add a paragraph after the list mentioning that pam_systemd might add
more systemd-specific env vars if included in the PAM stack for a
service that uses PAMName=.
2018-04-27 17:32:01 +02:00
Lennart Poettering
3e0bff7d0b man: document BSD exit codes in systemd.exec(5) too
Our own tools use them now, and we probably should encourage that, hence
let's document them along with the other exit codes we use.
2018-04-27 17:32:01 +02:00
Lennart Poettering
5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek
11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe
e568a92d99 man: suggests TemporaryFileSystem= when people want to nest bind mounts inside InaccessiblePaths= (#8288)
Suggested by @sourcejedi in #8242.
Closes #7895, #7153, and #2780.
2018-02-27 08:59:03 +01:00
Alan Jenkins
59e00b2a16
Merge pull request #7908 from yuwata/rfe-7895
core: add TemporaryFileSystem= setting and 'tmpfs' option to ProtectHome=
2018-02-21 08:57:11 +00:00
Yu Watanabe
e4da7d8c79 core: add new option 'tmpfs' to ProtectHome=
This make ProtectHome= setting can take 'tmpfs'. This is mostly
equivalent to `TemporaryFileSystem=/home /run/user /root`.
2018-02-21 09:18:17 +09:00
Yu Watanabe
c10b460b5a man: add documents for TemporaryFileSystem= 2018-02-21 09:18:11 +09:00
Yu Watanabe
4ca763a902 core/namespace: make '-' prefix in Bind{,ReadOnly}Paths= work
Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However,
the prefix is completely ignored.
This makes it work as expected.
2018-02-21 09:07:56 +09:00
Lennart Poettering
00f5ad93b5 core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172)
Before this change all unit types would default to "private" in the
system service manager and "inherit" to in the user service manager.

With this change this is slightly altered: non-service units of the
system service manager are now run with KeyringMode=shared. This appears
to be the more appropriate choice as isolation is not as desirable for
mount tools, which regularly consume key material. After all mounts are
a shared resource themselves as they appear system-wide hence it makes a
lot of sense to share their key material too.

Fixes: #8159
2018-02-20 08:53:34 +01:00
Alan Jenkins
2428aaf8a2 seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060)
The VDSO provided by the kernel for x32, uses x86-64 syscalls instead of
x32 ones.

I think we can safely allow this; the set of x86-64 syscalls should be
very similar to the x32 ones.  The real point is not to allow *x86*
syscalls, because some of those are inconveniently multiplexed and we're
apparently not able to block the specific actions we want to.
2018-02-02 18:12:34 +00:00
Alan Jenkins
62a0680bf2 man: systemd.exec: cleanup "only X will be permitted" ... "but X=X+1"
> Only system calls of the *specified* architectures will be permitted to
> processes of this unit.

(my emphasis)

> Note that setting this option to a non-empty list implies that
> native is included too.

Attempting to use "implies" in the later sentence, in a way that
contradicts the very clear meaning of the earlier sentence... it's too
much.
2018-01-31 15:39:13 +00:00
Yu Watanabe
5af1644314 man: note that systemctl show does not overridden value
Fixes #7694.
2017-12-19 16:07:04 +09:00
Yu Watanabe
69b528832a man: LockPersonality= implies NoNewPrivileges= 2017-12-19 12:48:54 +09:00
Lennart Poettering
f95b0be742 man: "systemd" is to be written in all lower-case, even at beginnings of sentences
This very important commit is very important.
2017-12-13 17:42:04 +01:00
Yu Watanabe
bf2d3d7cae man: fix typo 2017-12-05 23:30:47 +09:00
Yu Watanabe
606df9a5a5 man: fix typo (#7511) 2017-11-30 12:02:20 +01:00
Lennart Poettering
b8afec2107 man: reorder/add sections to systemd.exec(5) (#7412)
The long long list of settings is getting too confusing, let's add some
sections and reorder things in them.

This makes no changes regarding contents, it only reorders things,
sometimes reindents them, and adds sections that made sense to me to
some degree.

Within each sections the settings are ordered by relevance (at least
according to how relevant I personally find them), and not
alphabetically.
2017-11-23 21:20:48 +01:00
Lennart Poettering
0133d5553a
Merge pull request #7198 from poettering/stdin-stdout
Add StandardInput=data, StandardInput=file:... and more
2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek
572eb058cf Add SPDX license identifiers to man pages 2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek
a6fabe384d man: add link to kernel docs about no_new_privs 2017-11-19 11:58:45 +01:00
Lennart Poettering
fc8d038130 man: document all the new options we acquired 2017-11-17 11:13:44 +01:00
Lennart Poettering
8b8de13d54 man: document LogFieldMax= and LogExtraFields= 2017-11-16 12:40:17 +01:00
Lennart Poettering
4d14b2bd35 man: update SyslogXYZ= documentation a bit
Let's clarify that these settings only apply to stdout/stderr logging.
Always mention the journal before syslog (as the latter is in most ways
just a legacy alias these days). Always mention the +console cases too.
2017-11-16 12:40:17 +01:00
Yu Watanabe
798499278a man: fix wrong tag (#7358) 2017-11-16 11:35:30 +01:00
Lennart Poettering
b0e8cec2dd man: document > /dev/stderr pitfalls (#7317)
Fixes: #7254
See: #2473
2017-11-14 10:51:09 +01:00
Zbigniew Jędrzejewski-Szmek
b835eeb4ec
shared/seccomp: disallow pkey_mprotect the same as mprotect for W^X mappings (#7295)
MemoryDenyWriteExecution policy could be be bypassed by using pkey_mprotect
instead of mprotect to create an executable writable mapping.

The impact is mitigated by the fact that the man page says "Note that this
feature is fully available on x86-64, and partially on x86", so hopefully
people do not rely on it as a sole security measure.

Found by Karin Hossen and Thomas Imbert from Sogeti ESEC R&D.

https://bugs.launchpad.net/bugs/1725348
2017-11-12 17:28:48 +01:00
Yu Watanabe
3df90f24cc core: allow to specify errno number in SystemCallErrorNumber= 2017-11-11 21:54:24 +09:00
Yu Watanabe
8cfa775f4f core: add support to specify errno in SystemCallFilter=
This makes each system call in SystemCallFilter= blacklist optionally
takes errno name or number after a colon. The errno takes precedence
over the one given by SystemCallErrorNumber=.

C.f. #7173.
Closes #7169.
2017-11-11 21:54:12 +09:00
Yu Watanabe
fdfcb94631 man: update documents for RuntimeDirectory= and friends 2017-11-08 15:52:08 +09:00
Zbigniew Jędrzejewski-Szmek
895265ad7d Merge pull request #7059 from yuwata/dynamic-user-7013
dynamic-user: permit the case static uid and gid are different
2017-10-18 08:37:12 +02:00
Yu Watanabe
3bd493dc93 man: comment a requirement about the static user or group when DynamicUser=yes 2017-10-18 15:30:00 +09:00
Jakub Wilk
dcfaecc70a man: fix typos (#7029) 2017-10-10 21:59:03 +02:00
Lennart Poettering
44898c5358 seccomp: add three more seccomp groups
@aio → asynchronous IO calls
@sync → msync/fsync/... and friends
@chown → changing file ownership

(Also, change @privileged to reference @chown now, instead of the
individual syscalls it contains)
2017-10-05 15:42:48 +02:00
Djalal Harouni
09d3020b0a seccomp: remove '@credentials' syscall set (#6958)
This removes the '@credentials' syscall set that was added in commit
v234-468-gcd0ddf6f75.

Most of these syscalls are so simple that we do not want to filter them.
They work on the current calling process, doing only read operations,
they do not have a deep kernel path.

The problem may only be in 'capget' syscall since it can query arbitrary
processes, and used to discover processes, however sending signal 0 to
arbitrary processes can be used to discover if a process exists or not.
It is unfortunate that Linux allows to query processes of different
users. Lets put it now in '@process' syscall set, and later we may add
it to a new '@basic-process' set that allows most basic process
operations.
2017-10-03 07:20:05 +02:00
Lennart Poettering
4a62836033 man: document the new logic 2017-10-02 17:41:44 +02:00
Lennart Poettering
5aaeeffb5f man: document that PAMName= and NotifyAccess=all don't mix well.
See: #6045
2017-10-02 12:58:42 +02:00
Zbigniew Jędrzejewski-Szmek
3d7d3cbbda Merge pull request #6832 from poettering/keyring-mode
Add KeyringMode unit property to fix cryptsetup key caching
2017-09-15 21:24:48 +02:00
Lennart Poettering
b1edf4456e core: add new per-unit setting KeyringMode= for controlling kernel keyring setup
Usually, it's a good thing that we isolate the kernel session keyring
for the various services and disconnect them from the user keyring.
However, in case of the cryptsetup key caching we actually want that
multiple instances of the cryptsetup service can share the keys in the
root user's user keyring, hence we need to be able to disable this logic
for them.

This adds KeyringMode=inherit|private|shared:

    inherit: don't do any keyring magic (this is the default in systemd --user)
    private: a private keyring as before (default in systemd --system)
    shared: the new setting
2017-09-15 16:53:35 +02:00
Jan Synacek
91a8f867b6 doc: document service exit codes
(Heavily reworked by Lennart while rebasing)

Fixes: #3545
Replaces: #5159
2017-09-15 16:44:06 +02:00
Lennart Poettering
ab2116b140 core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824)
If two separate log streams are connected to stdout and stderr, let's
make sure $JOURNAL_STREAM points to the latter, as that's the preferred
log destination, and the environment variable has been created in order
to permit services to automatically upgrade from stderr based logging to
native journal logging.

Also, document this behaviour.

Fixes: #6800
2017-09-15 08:26:38 +02:00
Lennart Poettering
21f0669163 Merge pull request #6801 from johnlinp/master
man: explicitly distinguish "implicit dependencies" and "default dependencies"
2017-09-14 21:41:13 +02:00
Zbigniew Jędrzejewski-Szmek
8b5c528ce8 Merge pull request #6818 from poettering/nspawn-whitelist
convert nspawn syscall blacklist into a whitelist (and related stuff)
2017-09-14 19:47:59 +02:00
Lennart Poettering
cd0ddf6f75 seccomp: add four new syscall groups
These groups should be useful shortcuts for sets of closely related
syscalls where it usually makes more sense to allow them altogether or
not at all.
2017-09-14 15:45:21 +02:00
Lennart Poettering
00819cc151 core: add new UnsetEnvironment= setting for unit files
With this setting we can explicitly unset specific variables for
processes of a unit, as last step of assembling the environment block
for them. This is useful to fix #6407.

While we are at it, greatly expand the documentation on how the
environment block for forked off processes is assembled.
2017-09-14 15:17:40 +02:00
Zbigniew Jędrzejewski-Szmek
e124ccdf5b man: rework grammatical form of sentences in a table in systemd.exec(5)
"Currently, the following values are defined: xxx: in case <condition>" is
awkward because "xxx" is always defined unconditionally. It is _used_ in case
<condition> is true. Correct this and a bunch of other places where the
sentence structure makes it unclear what is the subject of the sentence.
2017-09-13 23:06:20 +02:00
John Lin
45f09f939b man: explicitly distinguish "implicit dependencies" and "default dependencies"
Fixes: #6793
2017-09-13 11:39:09 +08:00
Lennart Poettering
38a7c3c0bd man: complete and rework $SERVICE_RESULT documentation
This reworks the paragraph describing $SERVICE_RESULT into a table, and
adds two missing entries: "success" and "start-limit-hit".

These two entries are then also added to the table explaining the
$EXIT_CODE + $EXIT_STATUS variables.

Fixes: #6597
2017-09-12 18:04:26 +02:00
Yu Watanabe
de7070b49a man: add examples for CapabilityBoundingSet=
Follow-up for c792ec2e35.
2017-09-04 16:20:55 +09:00
Yu Watanabe
e8d85bc062 man: LockPersonality= takes a boolean argument (#6718)
Follow-up for 78e864e5b3.
2017-09-01 09:38:41 +02:00
Yu Watanabe
ada5e27657 core: StateDirectory= and friends imply RequiresMountsFor= 2017-08-31 18:19:35 +09:00
Topi Miettinen
78e864e5b3 seccomp: LockPersonality boolean (#6193)
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.
2017-08-29 15:54:50 +02:00
Diogo Pereira
c29ebc1a10 Fix typo in man/systemd.exec.xml (#6683) 2017-08-28 18:38:29 +02:00
Lennart Poettering
6eaaeee93a seccomp: add new @setuid seccomp group
This new group lists all UID/GID credential changing syscalls (which are
quite a number these days). This will become particularly useful in a
later commit, which uses this group to optionally permit user credential
changing to daemons in case ambient capabilities are not available.
2017-08-10 15:02:50 +02:00
Yu Watanabe
2d35b79cdc man: DynamicUser= does not imply PrivateDevices= (#6510)
Follow-up for effbd6d2ea.
2017-08-07 11:02:47 +02:00
Yu Watanabe
3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Lennart Poettering
7398320f9a Merge pull request #6328 from yuwata/runtime-preserve
core: Allow preserving contents of RuntimeDirectory over process restart
2017-07-17 10:02:19 +02:00
Yu Watanabe
23a7448efa core: support subdirectories in RuntimeDirectory= option 2017-07-17 16:30:53 +09:00
Yu Watanabe
53f47dfc7b core: allow preserving contents of RuntimeDirectory= over process restart
This introduces RuntimeDirectoryPreserve= option which takes a boolean
argument or 'restart'.

Closes #6087.
2017-07-17 16:22:25 +09:00
Lucas Werkmeister
ceabfb889d Fix spelling (#6378) 2017-07-15 12:29:09 -04:00
Lennart Poettering
6297d07b82 Merge pull request #6300 from keszybz/refuse-to-load-some-units
Refuse to load some units
2017-07-12 09:28:20 +02:00
Zbigniew Jędrzejewski-Szmek
b023856884 man: add warnings that Private*= settings are not always applied 2017-07-11 13:38:13 -04:00
Lennart Poettering
565dab8ef4 man: briefly document permitted user/group name syntax for User=/Group= and syusers.d (#6321)
As discussed here:

https://lists.freedesktop.org/archives/systemd-devel/2017-July/039237.html
2017-07-10 13:44:06 -04:00
Zbigniew Jędrzejewski-Szmek
189cd8c2ab man: describe RuntimeDirectoryMode=
Fixes #5509.
2017-06-17 15:23:02 -04:00
Zbigniew Jędrzejewski-Szmek
03c3c52040 man: update MemoryDenyWriteExecute description for executable stacks
Without going into details, mention that libraries are also covered by the
filters, and that executable stacks are a no no.

Closes #5970.
2017-05-30 16:44:48 -04:00
Zbigniew Jędrzejewski-Szmek
98e9d71022 man: fix links to external man pages
linkchecker ftw!
2017-05-07 11:29:40 -04:00
James Cowgill
a3645cc6dd seccomp: add clone syscall definitions for mips (#5880)
Also updates the documentation and adds a mention of ppc64 support
which was enabled by #5325.

Tested on Debian mipsel and mips64el. The other 4 mips architectures
should have an identical user <-> kernel ABI to one of the 2 tested
systems.
2017-05-03 18:35:45 +02:00
Mark Stosberg
b8e485faf1 man: document how to include an equals sign in a value provided to Environment= (#5710)
It wasn't clear before how an equals sign in an "Environment=" value might be
handled. Ref:
http://stackoverflow.com/questions/43278883/how-to-write-systemd-environment-variables-value-which-contains/43280157
2017-04-11 23:19:06 +02:00
Torstein Husebø
6cf5a96489 man: fix typo (#5556) 2017-03-08 07:54:22 -05:00
Lennart Poettering
525872bfab man: document that ProtectKernelTunables= and ProtectControlGroups= implies MountAPIVFS=
See: #5384
2017-02-21 21:55:43 +01:00
AsciiWolf
28a0ad81ee man: use https:// in URLs 2017-02-21 16:28:04 +01:00
Lennart Poettering
0b8fab97cf man: improve documentation on seccomp regarding alternative ABIs
Let's clarify that RestrictAddressFamilies= and MemoryDenyWriteExecute=
are only fully effective if non-native system call architectures are
disabled, since they otherwise may be used to circumvent the filters, as
the filters aren't equally effective on all ABIs.

Fixes: #5277
2017-02-09 18:42:17 +01:00
Lennart Poettering
23deef88b9 Revert "core/execute: set HOME, USER also for root users"
This reverts commit 8b89628a10.

This broke #5246
2017-02-09 11:43:44 +01:00
Zbigniew Jędrzejewski-Szmek
fc6149a6ce Merge pull request #4962 from poettering/root-directory-2
Add new MountAPIVFS= boolean unit file setting + RootImage=
2017-02-08 23:05:05 -05:00
Zbigniew Jędrzejewski-Szmek
ef3116b5d4 man: add more commas for clarify and reword a few sentences 2017-02-08 22:53:16 -05:00
Lennart Poettering
ae9d60ce4e seccomp: on s390 the clone() parameters are reversed
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.

This also bypasses the test on archs where we don't know the right
order.

In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.

Fixes: #5241
2017-02-08 22:21:27 +01:00
Lennart Poettering
8a50cf6957 seccomp: MemoryDenyWriteExecute= should affect both mmap() and mmap2() (#5254)
On i386 we block the old mmap() call entirely, since we cannot properly
filter it. Thankfully it hasn't been used by glibc since quite some
time.

Fixes: #5240
2017-02-08 15:14:02 +01:00
Lennart Poettering
915e6d1676 core: add RootImage= setting for using a specific image file as root directory for a service
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.

This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
2017-02-07 12:19:42 +01:00
Lennart Poettering
5d997827e2 core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory=
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.

(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
2017-02-07 11:22:05 +01:00
Lennart Poettering
142bd808a1 man: Document that RestrictAddressFamilies= doesn't work on s390/s390x/...
We already say that it doesn't work on i386, but there are more archs
like that apparently.
2017-02-06 14:17:12 +01:00
Zbigniew Jędrzejewski-Szmek
8b89628a10 core/execute: set HOME, USER also for root users
This changes the environment for services running as root from:

LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
INVOCATION_ID=ffbdec203c69499a9b83199333e31555
JOURNAL_STREAM=8:1614518

to

LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
USER=root
SHELL=/bin/sh
INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c
JOURNAL_STREAM=8:1616718

Making the environment special for the root user complicates things
unnecessarily. This change simplifies both our logic (by making the setting
of the variables unconditional), and should also simplify the logic in
services (particularly scripts).

Fixes #5124.
2017-02-03 11:49:22 -05:00
Brandon Philips
9806301614 man: fix spelling error parth -> path 2017-02-02 00:54:42 +01:00
Jakub Wilk
301a21a880 man: fix typos (#5109) 2017-01-19 16:54:22 +01:00
Zbigniew Jędrzejewski-Szmek
5b3637b44a Merge pull request #4991 from poettering/seccomp-fix 2017-01-17 23:10:46 -05:00
Zbigniew Jędrzejewski-Szmek
374e692252 Merge pull request #5009 from ian-kelling/ian-mnt-namespace-doc 2017-01-11 15:23:00 -05:00
Ian Kelling
fa2a396620 doc: MountFlags= don't reference container which may not exist (#5011) 2017-01-03 21:32:31 +01:00
Ian Kelling
7141028d30 doc: correct "or" to "and" in MountFlags= description (#5010) 2017-01-03 21:31:20 +01:00
Ian Kelling
4b957756b8 man: document mount deletion between commands 2017-01-03 02:17:50 -08:00
Martin Pitt
56a9366d7d Merge pull request #4994 from poettering/private-tmp-tmpfiles
automatically clean up PrivateTmp= left-overs in /var/tmp on next boot
2016-12-29 11:18:38 +01:00
Lennart Poettering
9eb484fa40 man: add brief documentation for the (sd-pam) processes created due to PAMName= (#4967)
A follow-up for #4942, adding a brief but more correct explanation of
the processes.
2016-12-29 10:55:27 +01:00
Lennart Poettering
d71f050599 core: implicitly order units with PrivateTmp= after systemd-tmpfiles-setup.service
Preparation for fixing #4401.
2016-12-27 23:25:24 +01:00
Lennart Poettering
bd2ab3f4f6 seccomp: add two new filter sets: @reboot and @swap
These groupe reboot()/kexec() and swapon()/swapoff() respectively
2016-12-27 18:09:37 +01:00
Lennart Poettering
d2d6c096f6 core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.

The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).

Fixes: #3439
2016-12-14 00:54:10 +01:00
Jouke Witteveen
a4e26faf33 man: fix $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS documentation
Note that any exit code is available through $EXIT_STATUS and not through
$EXIT_CODE. This mimics siginfo.
2016-12-06 13:37:14 +01:00
Jouke Witteveen
7ed0a4c537 bus-util: add protocol error type explanation 2016-11-29 23:19:52 +01:00
Jouke Witteveen
e0c7d5f7be man: document protocol error type for service failures (#4724) 2016-11-23 22:51:33 +01:00
Lennart Poettering
1a1b13c957 seccomp: add @filesystem syscall group (#4537)
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
2016-11-21 19:29:12 -05:00
Lennart Poettering
5327c910d2 namespace: simplify, optimize and extend handling of mounts for namespace
This changes a couple of things in the namespace handling:

It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.

This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.

While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.

This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.

This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.

Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
2016-11-17 18:08:32 +01:00
Djalal Harouni
8526555680 doc: move ProtectKernelModules= documentation near ProtectKernelTunalbes= 2016-11-15 15:04:41 +01:00
Djalal Harouni
a7db8614f3 doc: note when no new privileges is implied 2016-11-15 15:04:35 +01:00
Lennart Poettering
add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
Zbigniew Jędrzejewski-Szmek
cf88547034 Merge pull request #4548 from keszybz/seccomp-help
systemd-analyze syscall-filter
2016-11-03 20:27:45 -04:00
Kees Cook
d974f949f1 doc: clarify NoNewPrivileges (#4562)
Setting no_new_privs does not stop UID changes, but rather blocks
gaining privileges through execve(). Also fixes a small typo.
2016-11-03 20:26:59 -04:00
Zbigniew Jędrzejewski-Szmek
d5efc18b60 seccomp-util, analyze: export comments as a help string
Just to make the whole thing easier for users.
2016-11-03 09:35:36 -04:00
Zbigniew Jędrzejewski-Szmek
869feb3388 analyze: add syscall-filter verb
This should make it easier for users to understand what each filter
means as the list of syscalls is updated in subsequent systemd versions.
2016-11-03 09:35:35 -04:00
Lennart Poettering
2ca8dc15f9 man: document that too strict system call filters may affect the service manager
If execve() or socket() is filtered the service manager might get into trouble
executing the service binary, or handling any failures when this fails. Mention
this in the documentation.

The other option would be to implicitly whitelist all system calls that are
required for these codepaths. However, that appears less than desirable as this
would mean socket() and many related calls have to be whitelisted
unconditionally. As writing system call filters requires a certain level of
expertise anyway it sounds like the better option to simply document these
issues and suggest that the user disables system call filters in the service
temporarily in order to debug any such failures.

See: #3993.
2016-11-02 08:55:24 -06:00
Lennart Poettering
133ddbbeae seccomp: add two new syscall groups
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.

@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes
2016-11-02 08:50:00 -06:00
Lennart Poettering
aa6b9cec88 man: two minor fixes 2016-11-02 08:50:00 -06:00
Lennart Poettering
cd5bfd7e60 seccomp: include pipes and memfd in @ipc
These system calls clearly fall in the @ipc category, hence should be listed
there, simply to avoid confusion and surprise by the user.
2016-11-02 08:50:00 -06:00
Lennart Poettering
a8c157ff30 seccomp: drop execve() from @process list
The system call is already part in @default hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.
2016-11-02 08:49:59 -06:00
Lennart Poettering
c79aff9a82 seccomp: add clock query and sleeping syscalls to "@default" group
Timing and sleep are so basic operations, it makes very little sense to ever
block them, hence don't.
2016-11-02 08:49:59 -06:00
Zbigniew Jędrzejewski-Szmek
aa34055ffb seccomp: allow specifying arm64, mips, ppc (#4491)
"Secondary arch" table for mips is entirely speculative…
2016-11-01 09:33:18 -06:00
Jakub Wilk
b17649ee5e man: fix typos (#4527) 2016-10-31 08:08:08 -04:00
Djalal Harouni
fa1f250d6f Merge pull request #4495 from topimiettinen/block-shmat-exec
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
2016-10-28 15:41:07 +02:00
Topi Miettinen
d2ffa389b8 seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
shmat(..., SHM_EXEC) can be used to create writable and executable
memory, so let's block it when MemoryDenyWriteExecute is set.
2016-10-26 18:59:14 +03:00
Zbigniew Jędrzejewski-Szmek
74388c2d11 man: document the default value of NoNewPrivileges=
Fixes #4329.
2016-10-24 23:45:57 -04:00
Lennart Poettering
47da760efd man: document default for User=
Replaces: #4375
2016-10-20 13:21:25 +02:00
Luca Bruno
52c239d770 core/exec: add a named-descriptor option ("fd") for streams (#4179)
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.

This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
2016-10-17 20:05:49 -04:00
Lennart Poettering
c7458f9399 man: avoid abbreviated "cgroups" terminology (#4396)
Let's avoid the overly abbreviated "cgroups" terminology. Let's instead write:

"Linux Control Groups (cgroups)" is the long form wherever the term is
introduced in prose. Use "control groups" in the short form wherever the term
is used within brief explanations.

Follow-up to: #4381
2016-10-17 09:50:26 -04:00
Zbigniew Jędrzejewski-Szmek
74b47bbd5d man: add crosslink between systemd.resource-control(5) and systemd.exec(5)
Fixes #4379.
2016-10-15 18:38:20 -04:00
Lennart Poettering
8bfdf29b24 Merge pull request #4243 from endocode/djalal/sandbox-first-protection-kernelmodules-v1
core:sandbox: Add ProtectKernelModules= and some fixes
2016-10-13 18:36:29 +02:00
Thomas Hindoe Paaboel Andersen
2dd678171e man: typo fixes
A mix of fixes for typos and UK english
2016-10-12 23:02:44 +02:00
Djalal Harouni
c575770b75 core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.

This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
2016-10-12 14:11:16 +02:00
Djalal Harouni
ac246d9868 doc: minor hint about InaccessiblePaths= in regard of ProtectKernelTunables= 2016-10-12 13:52:40 +02:00
Djalal Harouni
2cd0a73547 core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yes
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw
data through /proc, ioctl and some other exotic system calls...
2016-10-12 13:39:49 +02:00
Djalal Harouni
502d704e5e core:sandbox: Add ProtectKernelModules= option
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.

This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
2016-10-12 13:31:21 +02:00
Zbigniew Jędrzejewski-Szmek
56b4c80b42 Merge pull request #4348 from poettering/docfixes
Various smaller documentation fixes.
2016-10-11 13:49:15 -04:00
Lennart Poettering
f4c9356d13 man: beef up documentation on per-unit resource limits a bit
Let's clarify that for user services some OS-defined limits bound the settings
in the unit files.

Fixes: #4232
2016-10-11 18:42:22 +02:00
Lennart Poettering
4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.

The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
2016-10-07 20:14:38 +02:00
hbrueckner
6abfd30372 seccomp: add support for the s390 architecture (#4287)
Add seccomp support for the s390 architecture (31-bit and 64-bit)
to systemd.

This requires libseccomp >= 2.3.1.
2016-10-05 13:58:55 +02:00
Stefan Schweter
cfaf4b75e0 man: remove consecutive duplicate words (#4268)
This PR removes consecutive duplicate words from the man pages of:

* `resolved.conf.xml`
* `systemd.exec.xml`
* `systemd.socket.xml`
2016-10-03 17:09:54 +02:00
Djalal Harouni
8f81a5f61b core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= is set
Instead of having a local syscall list, use the @raw-io group which
contains the same set of syscalls to filter.
2016-09-25 12:52:27 +02:00
Djalal Harouni
49accde7bd core:sandbox: add more /proc/* entries to ProtectKernelTunables=
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.

Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
2016-09-25 11:30:11 +02:00
Djalal Harouni
9221aec8d0 doc: explicitly document that /dev/mem and /dev/port are blocked by PrivateDevices=true 2016-09-25 11:25:44 +02:00
Djalal Harouni
e778185bb5 doc: documentation fixes for ReadWritePaths= and ProtectKernelTunables=
Documentation fixes for ReadWritePaths= and ProtectKernelTunables=
as reported by Evgeny Vereshchagin.
2016-09-25 11:25:31 +02:00
Lennart Poettering
6757c06a1a man: shorten the exit status table a bit
Let's merge a couple of columns, to make the table a bit shorter. This
effectively just drops whitespace, not contents, but makes the currently
humungous table much much more compact.
2016-09-25 10:52:57 +02:00
Lennart Poettering
81c8aceed4 man: the exit code/signal is stored in $EXIT_CODE, not $EXIT_STATUS 2016-09-25 10:52:57 +02:00
Lennart Poettering
effbd6d2ea man: rework documentation for ReadOnlyPaths= and related settings
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=,
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (Which wasn't true actually, as we didn't follow symlinks
at all in the most recent releases, and we know do follow them, but relative to
RootDirectory=).

This also replaces all references to the fact that all fs namespacing options
can be undone with enough privileges and disable propagation by a single one in
the documentation of ReadOnlyPaths= and friends, and then directs the read to
this in all other places.

Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
2016-09-25 10:42:18 +02:00
Lennart Poettering
b2656f1b1c man: in user-facing documentaiton don't reference C function names
Let's drop the reference to the cap_from_name() function in the documentation
for the capabilities setting, as it is hardly helpful. Our readers are not
necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the
strings we accept are just the plain capability names as listed in
capabilities(7) hence there's really no point in confusing the user with
anything else.
2016-09-25 10:42:18 +02:00
Lennart Poettering
63bb64a056 core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.

This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
2016-09-25 10:42:18 +02:00
Lennart Poettering
3f815163ff core: introduce ProtectSystem=strict
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.

In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.

While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad3 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
2016-09-25 10:42:18 +02:00
Lennart Poettering
59eeb84ba6 core: add two new service settings ProtectKernelTunables= and ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
2016-09-25 10:18:48 +02:00