1
0
mirror of https://github.com/systemd/systemd.git synced 2025-01-18 10:04:04 +03:00

64 Commits

Author SHA1 Message Date
Luca Boccassi
d6243ebedd journald: enable persistent FD Store to fix logging during soft-reboot
A unit with StandardOutput=journal (the default) will get its stdout/stderr sockets
disconnected when journald stops, as the file descriptors on journald's side are
not preserved (it works on restart, as the FD Store keeps them open during restarts).
Set FileDescriptorStorePreserve=yes so that the journal FD's stay open during a soft
reboot, and applications don't get broken stdout/stderr.
2024-06-03 16:30:54 +01:00
Yu Watanabe
37143fdf5a units: stop systemd-journald before systemd-soft-reboot.service
Typically, soft-reboot.target is never reached. So, without this change,
systemd-journald may be killed by PID1 on soft-reboot, and may cause
journal corruption.
2024-05-23 00:08:14 +09:00
Sam Leonard
f31cff849d
journald: implement socket forwarding
This commit adds a new way of forwarding journal messages - forwarding
over a socket.

The socket can be any of AF_INET, AF_INET6, AF_UNIUX or AF_VSOCK.

The address to connect to is retrieved from the "journald.forward_address" credential.

It can also be specified in systemd-journald's unit file with ForwardAddress=
2024-02-15 14:08:20 +00:00
Yu Watanabe
f89985ca49 unit: make journald stopped on soft-reboot before broadcasting SIGKILL
Workaround for #30195.
2023-11-28 18:28:17 +09:00
Luca Boccassi
b0d3095fd6 Drop split-usr and unmerged-usr support
As previously announced, execute order 66:

https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

The meson options split-usr, rootlibdir and rootprefix become no-ops
that print a warning if they are set to anything other than the
default values. We can remove them in a future release.
2023-07-28 19:34:03 +01:00
Daan De Meyer
a4180c0fb3 journald-console: Add colors when forwarding to console
Let's color output when we're forwarding to the console. To make this
work, we inherit TERM from pid 1 and use it to decide whether we should
output colors or not.
2023-03-16 11:22:58 +01:00
Franck Bui
2aba77057e journal: give the ability to enable/disable systemd-journald-audit.socket
Before this patch the only way to prevent journald from reading the audit
messages was to mask systemd-journald-audit.socket. However this had main
drawback that downstream couldn't ship the socket disabled by default (beside
the fact that masking units is not supposed to be the usual way to disable
them).

Fixes #15777
2023-01-11 17:18:57 +01:00
Zbigniew Jędrzejewski-Szmek
74c4bd6b1a units: add IgnoreOnIsolate=yes to systemd-journald too
We already had it on the socket units, so it's possible that
systemd-journald.service would be stopped and then restarted when trafic hits
the sockets when something logs. Let's not try to stop it. It is supposed to
run until the end and be eventually killed in the final killing spree.

This might (or not) help with #23287.
2022-07-01 14:17:33 +09:00
Zbigniew Jędrzejewski-Szmek
059cc610b7 meson: use jinja2 for unit templates
We don't need two (and half) templating systems anymore, yay!

I'm keeping the changes minimal, to make the diff manageable. Some enhancements
due to a better templating system might be possible in the future.

For handling of '## ' — see the next commit.
2021-05-19 10:24:43 +09:00
Lennart Poettering
cb42e63179 units: typo fix /proc/<pid>/exec → /proc/<pid>/exe
Fix a pretty relevant typo introduced in
c7faa23235694a1e803ba093cba6d6e0193a093e.
2020-11-25 11:23:38 +01:00
Franck Bui
c7faa23235 units: document why CAP_SYS_PTRACE is needed by journald 2020-11-25 09:54:28 +01:00
Yu Watanabe
db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Topi Miettinen
cabc1c6d7a units: add ProtectClock=yes
Add `ProtectClock=yes` to systemd units. Since it implies certain
`DeviceAllow=` rules, make sure that the units have `DeviceAllow=` rules so
they are still able to access other devices. Exclude timesyncd and timedated.
2020-04-07 15:37:14 +02:00
Lennart Poettering
340cb115b3 units: define RuntimeDirectory= in systemd-journald.service
It doesn't get us much, but makes the differences between the templated
and non-templated versions a bit smaller.
2020-01-31 15:04:24 +01:00
Lennart Poettering
5591cd4e20 units: sort settings in systemd-journald.service again 2020-01-31 15:04:15 +01:00
Zbigniew Jędrzejewski-Szmek
21d0dd5a89 meson: allow WatchdogSec= in services to be configured
As discussed on systemd-devel [1], in Fedora we get lots of abrt reports
about the watchdog firing [2], but 100% of them seem to be caused by resource
starvation in the machine, and never actual deadlocks in the services being
monitored. Killing the services not only does not improve anything, but it
makes the resource starvation worse, because the service needs cycles to restart,
and coredump processing is also fairly expensive. This adds a configuration option
to allow the value to be changed. If the setting is not set, there is no change.

My plan is to set it to some ridiculusly high value, maybe 1h, to catch cases
where a service is actually hanging.

[1] https://lists.freedesktop.org/archives/systemd-devel/2019-October/043618.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1300212
2019-10-25 17:20:24 +02:00
Lennart Poettering
2ec71e439f journald: slightly bump OOM adjust for journald (#13366)
If logging disappears issues are hard to debug, hence let's give
journald a slight edge over other services when the OOM killer hits.

Here are the special adjustments we now make:

 systemd-coredump@.service.in OOMScoreAdjust=500
 systemd-journald.service.in  OOMScoreAdjust=-250
 systemd-udevd.service.in     OOMScoreAdjust=-1000

(i.e. the coredump processing is made more likely to be killed on OOM,
and udevd and journald are less likely to be killed)
2019-08-22 10:02:28 +02:00
Topi Miettinen
9af2820694 units: deny access to block devices
While the need for access to character devices can be tricky to determine for
the general case, it's obvious that most of our services have no need to access
block devices. For logind and timedated this can be tightened further.
2019-06-20 14:03:57 +02:00
Lennart Poettering
62aa29247c units: turn on RestrictSUIDSGID= in most of our long-running daemons 2019-04-02 16:56:48 +02:00
Lennart Poettering
a18449b5bd units: turn of ProtectHostname= again for services hat need to know about system hostname changes
ProtectHostname= turns off hostname change propagation from host to
service. This means for services that care about the hostname and need
to be able to notice changes to it it's not suitable (though it is
useful for most other cases still).

Let's turn it off hence for journald (which logs the current hostname)
for networkd (which optionally sends the current hostname to dhcp
servers) and resolved (which announces the current hostname via
llmnr/mdns).
2019-03-08 15:49:10 +01:00
Topi Miettinen
99894b867f units: enable ProtectHostname=yes 2019-02-20 10:50:44 +02:00
Lennart Poettering
3ca9940cb9 units: set NoNewPrivileges= for all long-running services
Previously, setting this option by default was problematic due to
SELinux (as this would also prohibit the transition from PID1's label to
the service's label). However, this restriction has since been lifted,
hence let's start making use of this universally in our services.

On SELinux system this change should be synchronized with a policy
update that ensures that NNP-ful transitions from init_t to service
labels is permitted.

An while we are at it: sort the settings in the unit files this touches.
This might increase the size of the change in this case, but hopefully
should result in stabler patches later on.

Fixes: #1219
2018-11-12 19:02:55 +01:00
Zbigniew Jędrzejewski-Szmek
c02b6ee496 meson: define @HIGH_RLIMIT_NOFILE@ and use it everywhere 2018-10-17 14:54:48 +02:00
Lennart Poettering
c35ee02c61 units: bump the RLIMIT_NOFILE soft limit for all services that access the journal
This updates the unit files of all our serviecs that deal with journal
stuff to use a higher RLIMIT_NOFILE soft limit by default. The new value
is the same as used for the new HIGH_RLIMIT_NOFILE we just added.

With this we ensure all code that access the journal has higher
RLIMIT_NOFILE. The code that runs as daemon via the unit files, the code
that is run from the user's command line via C code internal to the
relevant tools. In some cases this means we'll redundantly bump the
limits as there are tools run both from the command line and as service.
2018-10-16 16:33:55 +02:00
Lennart Poettering
ee8f26180d units: switch from system call blacklist to whitelist
This is generally the safer approach, and is what container managers
(including nspawn) do, hence let's move to this too for our own
services. This is particularly useful as this this means the new
@system-service system call filter group will get serious real-life
testing quickly.

This also switches from firing SIGSYS on unexpected syscalls to
returning EPERM. This would have probably been a better default anyway,
but it's hard to change that these days. When whitelisting system calls
SIGSYS is highly problematic as system calls that are newly introduced
to Linux become minefields for services otherwise.

Note that this enables a system call filter for udev for the first time,
and will block @clock, @mount and @swap from it. Some downstream
distributions might want to revert this locally if they want to permit
unsafe operations on udev rules, but in general this shiuld be mostly
safe, as we already set MountFlags=shared for udevd, hence at least
@mount won't change anything.
2018-06-14 17:44:20 +02:00
Zbigniew Jędrzejewski-Szmek
a7df2d1e43 Add SPDX license headers to unit files 2017-11-19 19:08:15 +01:00
Lennart Poettering
0a9b166b43 units: prohibit all IP traffic on all our long-running services (#6921)
Let's lock things down further.
2017-10-04 14:16:28 +02:00
Lennart Poettering
bff8f2543b units: set LockPersonality= for all our long-running services (#6819)
Let's lock things down. Also, using it is the only way how to properly
test this to the fullest extent.
2017-09-14 19:45:40 +02:00
Michal Sekletar
3c978aca69 journald: make sure we retain all stream fds across restarts (#6348)
Currently we set 4096 as maximum for number of stream connections that
we accept. However maximum number of file descriptors that systemd is
willing to accept from us is just 1024. This means we can't retain all
stream connections that we accepted. Hence bump the limit of fds in a
unit file so that systemd holds open all stream fds while we are
restarted.

New limit is set to 4224 (4096 + 128).
2017-07-17 10:04:37 +02:00
Michal Sekletar
6f0e6bd253 units: drop explicit NotifyAccess setting from journald's unit file (#5749)
systemd-journald service consists of only single process and that is the
MainPID. Make unit file shorter and drop NotifyAccess=all since it is
not useful in such case.

https://lists.freedesktop.org/archives/systemd-devel/2017-April/038667.html
2017-04-19 08:52:40 +02:00
Lennart Poettering
6489ccfe48 units: make use of @reboot and @swap in our long-running service SystemCallFilter= settings
Tighten security up a bit more.
2017-02-09 16:12:03 +01:00
Lennart Poettering
3c19d0b46b units: restrict namespace for a good number of our own services
Basically, we turn it on for most long-running services, with the
exception of machined (whose child processes need to join containers
here and there), and importd (which sandboxes tar in a CLONE_NEWNET
namespace). machined is left unrestricted, and importd is restricted to
use only "net"
2017-02-09 16:12:03 +01:00
Lennart Poettering
7f396e5f66 units: set SystemCallArchitectures=native on all our long-running services 2017-02-09 16:12:03 +01:00
Lennart Poettering
0c28d51ac8 units: further lock down our long-running services
Let's make this an excercise in dogfooding: let's turn on more security
features for all our long-running services.

Specifically:

- Turn on RestrictRealtime=yes for all of them

- Turn on ProtectKernelTunables=yes and ProtectControlGroups=yes for most of
  them

- Turn on RestrictAddressFamilies= for all of them, but different sets of
  address families for each

Also, always order settings in the unit files, that the various sandboxing
features are close together.

Add a couple of missing, older settings for a numbre of unit files.

Note that this change turns off AF_INET/AF_INET6 from udevd, thus effectively
turning of networking from udev rule commands. Since this might break stuff
(that is already broken I'd argue) this is documented in NEWS.
2016-09-25 10:52:57 +02:00
Lennart Poettering
4e069746fe units: tighten system call filters a bit
Take away kernel keyring access, CPU emulation system calls and various debug
system calls from the various daemons we have.
2016-06-13 16:25:54 +02:00
Topi Miettinen
40093ce5dd units: add a basic SystemCallFilter (#3471)
Add a line
SystemCallFilter=~@clock @module @mount @obsolete @raw-io ptrace
for daemons shipped by systemd. As an exception, systemd-timesyncd
needs @clock system calls and systemd-localed is not privileged.
ptrace(2) is blocked to prevent seccomp escapes.
2016-06-09 09:32:04 +02:00
Topi Miettinen
40652ca479 units: enable MemoryDenyWriteExecute (#3459)
Secure daemons shipped by systemd by enabling MemoryDenyWriteExecute.

Closes: #3459
2016-06-08 14:23:37 +02:00
Lennart Poettering
119e9655dc journal: restore watchdog support 2015-11-03 17:45:12 +01:00
Lennart Poettering
e22aa3d328 journald: never block when sending messages on NOTIFY_SOCKET socket
Otherwise we might run into deadlocks, when journald blocks on the
notify socket on PID 1, and PID 1 blocks on IPC to dbus-daemon and
dbus-daemon blocks on logging to journald. Break this cycle by making
sure that journald never ever blocks on PID 1.

Note that this change disables support for event loop watchdog support,
as these messages are sent in blocking style by sd-event. That should
not be a big loss though, as people reported frequent problems with the
watchdog hitting journald on excessively slow IO.

Fixes: #1505.
2015-11-01 22:12:29 +01:00
Lennart Poettering
c2fc2c2560 units: increase watchdog timeout to 3min for all our services
Apparently, disk IO issues are more frequent than we hope, and 1min
waiting for disk IO happens, so let's increase the watchdog timeout a
bit, for all our services.

See #1353 for an example where this triggers.
2015-09-29 21:55:51 +02:00
Lennart Poettering
a24111cea6 Revert "units: add SecureBits"
This reverts commit 6a716208b346b742053cfd01e76f76fb27c4ea47.

Apparently this doesn't work.

http://lists.freedesktop.org/archives/systemd-devel/2015-February/028212.html
2015-02-11 18:28:06 +01:00
Topi Miettinen
6a716208b3 units: add SecureBits
No setuid programs are expected to be executed, so add
SecureBits=noroot noroot-locked
to unit files.
2015-02-11 17:33:36 +01:00
Lennart Poettering
de45d72603 journal: bump RLIMIT_NOFILE when journal files to 16K (if possible)
When there are a lot of split out journal files, we might run out of fds
quicker then we want. Hence: bump RLIMIT_NOFILE to 16K if possible.

Do these even for journalctl. On Fedora the soft RLIMIT_NOFILE is at 1K,
the hard at 4K by default for normal user processes, this code hence
bumps this up for users to 4K.

https://bugzilla.redhat.com/show_bug.cgi?id=1179980
2015-01-08 03:20:45 +01:00
Lennart Poettering
13790add4b journald: allow restarting journald without losing stream connections
Making use of the fd storage capability of the previous commit, allow
restarting journald by serilizing stream state to /run, and pushing open
fds to PID 1.
2015-01-06 03:16:39 +01:00
Michal Schmidt
a87a38c201 units: make systemd-journald.service Type=notify
It already calls sd_notify(), so it looks like an oversight.

Without it, its ordering to systemd-journal-flush.service is
non-deterministic and the SIGUSR1 from flushing may kill journald before
it has its signal handlers set up.

https://bugs.freedesktop.org/show_bug.cgi?id=85871
https://bugzilla.redhat.com/show_bug.cgi?id=1159641
2014-11-04 20:32:42 +01:00
Lennart Poettering
875c2e220e journald: if available pull audit messages from the kernel into journal logs 2014-11-03 21:51:28 +01:00
Juho Son
f2a474aea8 journald: add CAP_MAC_OVERRIDE in journald for SMACK issue
systemd-journald check the cgroup id to support rate limit option for
every messages. so journald should be available to access cgroup node in
each process send messages to journald.
In system using SMACK, cgroup node in proc is assigned execute label
as each process's execute label.
so if journald don't want to denied for every process, journald
should have all of access rule for all process's label.
It's too heavy. so we could give special smack label for journald te get
all accesses's permission.
'^' label.
When assign '^' execute smack label to systemd-journald,
systemd-journald need to add  CAP_MAC_OVERRIDE capability to get that smack privilege.

so I want to notice this information and set default capability to
journald whether system use SMACK or not.
because that capability affect to only smack enabled kernel
2014-10-22 19:12:06 +02:00
Lennart Poettering
1b8689f949 core: rename ReadOnlySystem= to ProtectSystem= and add a third value for also mounting /etc read-only
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.

With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
2014-06-04 18:12:55 +02:00
Lennart Poettering
03ee5c38cb journald: move /dev/log socket to /run
This way we can make the socket also available for sandboxed apps that
have their own private /dev. They can now simply symlink the socket from
/dev.
2014-06-04 16:53:58 +02:00
Lennart Poettering
417116f234 core: add new ReadOnlySystem= and ProtectedHome= settings for service units
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.

ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.

This patch also enables these settings for all our long-running services.

Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
2014-06-03 23:57:51 +02:00