Pretty much everything uses just the first argument, so this doesn't make the
common pattern more complicated, but it makes it simpler to pass multiple options.
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code that achieved the same; in
others we add it where it was missing.
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns two other new flags:
- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
root/nobody are only synthesized as fallback. Normally, the synthesized
records take precedence over what is in the user database. With this
flag set this is reversed, and the user database takes precedence, and
the synthesized records are only used if they are missing there. This
flag should be set in cases where doing NSS is deemed safe, and where
there's interest in knowing the correct shell, for example if the
admin changed root's shell to zsh or suchlike.
- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
numeric value, and there's no user/group record for it, accept it
anyway. This allows us to fix #9767
This then also ports all users to set the most appropriate flags.
Fixes: #9767
[zj: remove one isempty() call]
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminals supporting that, because awesome.
I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
Hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd be nice to
obtain explicit acks from all involved authors before doing that.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro).
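A minimal sketch of such a macro, assuming GCC statement expressions (the name
TAKE_PTR and the exact definition here are illustrative, not necessarily the
verbatim systemd one):

    #define TAKE_PTR(ptr)                           \
            ({                                      \
                    typeof(ptr) _ptr_ = (ptr);      \
                    (ptr) = NULL;                   \
                    _ptr_;                          \
            })

    /* Typical use: hand an allocation out of a _cleanup_free_ variable,
     * so the cleanup handler sees NULL and does not free it. */
    char *consume(char **src) {
            return TAKE_PTR(*src);   /* *src is NULL afterwards, caller owns the memory */
    }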
Even if pager_open() fails, we should in general continue the operation.
All error cases in pager_open() already log a message inside the function,
so it is not necessary to check the return value.
On a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, we hence need to explicitly reopen the logs in the child at the
right time.
This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they are generated in the child processes where the log fds are
closed.
When running journalctl --user-unit=foo as an unprivileged user we could get
the usual hint:
Hint: You are currently not seeing messages from the system and other users.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
...
But with --user-unit our filter is:
(((_UID=0 OR _UID=1000) AND OBJECT_SYSTEMD_USER_UNIT=foo.service) OR
((_UID=0 OR _UID=1000) AND COREDUMP_USER_UNIT=foo.service) OR
(_UID=1000 AND USER_UNIT=foo.service) OR
(_UID=1000 AND _SYSTEMD_USER_UNIT=foo.service))
so we would never see messages from other users.
We could still see messages from the system. In fact, on my machine the
only messages with OBJECT_SYSTEMD_USER_UNIT= are from the system:
journalctl $(journalctl -F OBJECT_SYSTEMD_USER_UNIT|sed 's/.*/OBJECT_SYSTEMD_USER_UNIT=\0/')
Thus, a more correct hint is that we cannot see messages from the system.
Make it so.
Fixes #7887.
Let's make use of this at the various places we call fsync(), to make things
fully reliable, as the kernel devs suggest to first fsync() files and
then fsync() the directories they are located in.
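As a hedged illustration of that pattern (not the actual systemd helper), the
sequence looks roughly like this:

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* fsync() the file first, then fsync() the directory containing it,
     * so that the directory entry itself is also made durable. */
    int sync_file_and_directory(int file_fd, const char *dir_path) {
            int dir_fd, r = 0;

            if (fsync(file_fd) < 0)
                    return -errno;

            dir_fd = open(dir_path, O_RDONLY|O_DIRECTORY|O_CLOEXEC);
            if (dir_fd < 0)
                    return -errno;

            if (fsync(dir_fd) < 0)
                    r = -errno;

            close(dir_fd);
            return r;
    }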
This commit adds a new command line parameter to systemd-coredump. The
parameter should be mapped to core_pattern's placeholder %h (hostname).
The field _HOSTNAME holds the name from the kernel's namespace, which might be
different from the one coming from the process's namespace.
It is true that the real hostname is usually available in the field
COREDUMP_ENVIRON (environment variables) but I believe it is more reliable to
use the value passed by kernel.
----
The length of iovec is no longer static and hence I corrected the declarations
of the functions set_iovec_field and set_iovec_field_free.
Thank you @yuwata and @poettering!
This ensures that clients can't keep all files pinned, interfering with
our vacuuming logic.
This should fix the last issue pointed out in #7998 and #8032.
Fixes: #7998
First, let's rename it to disable_coredumps(), as in the rest of our
codebase we spell it "coredump" rather than "core_dump", so let's stick
to that.
However, also log about failures to turn off core dumping at LOG_DEBUG,
because debug logging is always a good idea.
Using wait_for_terminate_and_check() instead of wait_for_terminate()
lets us simplify, shorten and unify the return value checking and
logging of waitid(). Hence, let's use it all over the place.
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets an "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets PR_SET_PDEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens is handled safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
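For illustration only, a stripped-down sketch of such a wrapper (hypothetical
name, covering only points 1, 2 and 4; this is not the actual safe_fork()
implementation):

    #include <errno.h>
    #include <signal.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    /* Returns 0 in the child, 1 in the parent, negative errno on failure. */
    int fork_named_child(const char *name, pid_t *ret_pid) {
            pid_t pid = fork();
            if (pid < 0)
                    return -errno;

            if (pid == 0) {
                    sigset_t ss;

                    /* 1. reset signal dispositions and mask */
                    for (int sig = 1; sig < _NSIG; sig++)
                            (void) signal(sig, SIG_DFL);
                    sigemptyset(&ss);
                    (void) sigprocmask(SIG_SETMASK, &ss, NULL);

                    /* 2. name the child, e.g. "(sd-foo)" */
                    (void) prctl(PR_SET_NAME, name);

                    /* 4. ask for SIGTERM when the parent dies (the race where the
                     *    parent already exited is not handled in this sketch) */
                    (void) prctl(PR_SET_PDEATHSIG, SIGTERM);
                    return 0;
            }

            if (ret_pid)
                    *ret_pid = pid;
            return 1;
    }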
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of making an individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this is in some ways a revert of 4b61c87511.
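A hedged example of the described approach, using the <stdio_ext.h> API:

    #include <stdio.h>
    #include <stdio_ext.h>

    static void dump_strings(char **l, FILE *f) {
            /* Tell stdio that the caller takes care of locking (or doesn't need it,
             * because f never leaves this scope), so every call below is unlocked. */
            __fsetlocking(f, FSETLOCKING_BYCALLER);

            for (char **s = l; *s; s++)
                    fprintf(f, "%s\n", *s);   /* covered too, no _unlocked() variant needed */
    }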
Our CODING_STYLE suggests not comparing with NULL, but relying on C's
downgrade-to-bool feature for that. Fix up some code to match these
guidelines. (This is not comprehensive, the coccinelle output for this
is unfortunately kinda borked)
The "nobody" user might possibly be seen by the journal or coredumping
code if unmapped userns-using processes are somehow visible to them.
Let's make sure we don't do the ACL magic for this user either, since
this is a special system user that might be backed by different real
users in different contexts.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
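A minimal sketch of what such helpers amount to (the SYSTEM_UID_MAX/SYSTEM_GID_MAX
cut-off of 999 is an assumption here; the real value is determined at build time):

    #include <stdbool.h>
    #include <sys/types.h>

    #define SYSTEM_UID_MAX ((uid_t) 999)   /* assumed; configurable in reality */
    #define SYSTEM_GID_MAX ((gid_t) 999)

    static inline bool uid_is_system(uid_t uid) {
            return uid <= SYSTEM_UID_MAX;
    }

    static inline bool gid_is_system(gid_t gid) {
            return gid <= SYSTEM_GID_MAX;
    }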
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
The advantage is that if the name is misspelled, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We can also use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
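Roughly, the new macros boil down to the following (a sketch; the exact systemd
definitions may differ in detail):

    #include <string.h>
    #include <sys/uio.h>

    #define IOVEC_MAKE(base, len) (struct iovec) { .iov_base = (base), .iov_len = (len) }
    #define IOVEC_MAKE_STRING(s)  IOVEC_MAKE((char*) (s), strlen(s))

    /* Usage: on-the-fly initialization of array entries or function arguments. */
    static void fill(struct iovec iov[2], void *buf, size_t n) {
            iov[0] = IOVEC_MAKE(buf, n);
            iov[1] = IOVEC_MAKE_STRING("MESSAGE=hello");
    }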
As a follow-up for db3f45e2d2 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calendar time formatting up a bit.
Using conf.set() with a boolean argument does the right thing:
either #define or #undef. This means that conf.set() can be used unconditionally.
Previously I used '1' as the placeholder value, and that needs to be changed to
'true' for consistency (under meson 1 cannot be used in boolean context). All
checks need to be adjusted.
The indentation settings for emacs' meson-mode are added to .dir-locals.
All files are reindented automatically, using the latest meson-mode from git.
Indentation should now be fairly consistent.
Tests can be run with 'ninja-build test' or using 'mesontest'.
'-Dtests=unsafe' can be used to include the "unsafe" tests in the
test suite, same as with autotools.
v2:
- use more conf.get guards around optional components
- declare deps on generated headers for test-{af,arphrd,cap}-list
v3:
- define environment for tests
Most tests don't need this, but to be consistent with the autotools-based build, and
to avoid questions which tests need it and which don't, set the same environment
for all tests.
v4:
- rework test generation
Use a list of lists to define each test. This way we can reduce the
boilerplate somewhat, although the test listings are still pretty verbose. We
can also move the definitions of the tests to the subdirs. Unfortunately some
subdirs are included earlier than some of the libraries that test binaries
are linked to. So just dump all definitions of all tests that cannot be
defined earlier into src/test. The `executable` definitions are still at the
top level, so the binaries are compiled into the build root.
v5:
- tag test-dnssec-complex as manual
v6:
- fix HAVE_LIBZ typo
- add missing libgobject/libgio defs
- mark test-qcow2 as manual
It's crucial that we can build systemd using VS2010!
... er, wait, no, that's not the official reason. We need to shed old systems
by requiring python 3! Oh, no, it's something else. Maybe we need to throw out
345 years of knowledge accumulated in autotools? Whatever, this new thing is
cool and shiny, let's use it.
This is not complete, I'm throwing it out here for your amusement and critique.
- rules for sd-boot are missing. Those might be quite complicated.
- rules for tests are missing too. Those are probably quite simple and
repetitive, but there's lots of them.
- it's likely that I didn't get all the conditions right, I only tested "full"
compilation where most deps are provided and nothing is disabled.
- busname.target and all .busname units are skipped on purpose.
Otherwise, installation into $DESTDIR has the same list of files as the
autoconf install, except for .la files.
It'd be great if people had a careful look at all the library linking options.
I added stuff until things compiled, and in the end there's much less linking
then in the old system. But it seems that there's still a lot of unnecessary
deps.
meson has a `shared_module` statement, which sounds like something appropriate
for our nss and pam modules. Unfortunately, I couldn't get it to work. For the
nss modules, we need an .so version of '2', but `shared_module` disallows the
version argument. For the pam module, it also didn't work, I forgot the reason.
The handling of .m4 and .in and .m4.in files is rather awkward. It's likely
that this could be simplified. If make support is ever dropped, I think it'd
make sense to switch to a different templating system so that two different
languages are not required, which would make everything simpler yet.
v2:
- use get_pkgconfig_variable
- use sh not bash
- use add_project_arguments
v3:
- drop required:true and fix progs/prog typo
v4:
- use find_library('bz2')
- add TTY_GID definition
- define __SANE_USERSPACE_TYPES__
- use join_paths(prefix, ...) on all paths to make them all absolute
v5:
- replace all declare_dependency's with []
- add more conf.get guards around optional components
v6:
- drop -pipe, -Wall which are the default in meson
- use compiler.has_function() and compiler.has_header_symbol instead of the
hand-rolled checks.
- fix duplication in 'liblibsystemd' library name
- use the right .sym file for pam_systemd
- rename 'compiler' to 'cc': shorter, and more idiomatic.
v7:
- use ENABLE_ENVIRONMENT_D not HAVE_ENVIRONMENT_D
- rename prefix to prefixdir, rootprefix to rootprefixdir
("prefix" is too common of a name and too easy to overwrite by mistake)
- wrap more stuff with conf.get('ENABLE...') == 1
- use rootprefix=='/' and rootbindir as install_dir, to fix paths under
split-usr==true.
v8:
- use .split() also for src/coredump. Now everything is consistent ;)
- add rootlibdir option and use it on the libraries that require it
v9:
- indentation
v10:
- fix check for qrencode and libaudit
v11:
- unify handling of executable paths, provide options for all progs
This makes the meson build behave slightly differently than the
autoconf-based one, because we always first try to find the executable in the
filesystem, and fall back to the default. I think different handling of
loadkeys, setfont, and telinit was just a historical accident.
In addition to checking in $PATH, also check /usr/sbin/, /sbin for programs.
In Fedora $PATH includes /usr/sbin (and /sbin is a symlink to /usr/sbin),
but in Debian, those directories are not included in the path.
C.f. https://github.com/mesonbuild/meson/issues/1576.
- call all the options 'xxx-path' for clarity.
- sort man/rules/meson.build properly so it's stable
This was exposed by the previous commit. This could be potentially
unpleasant, but we are saved by the fact that this code path was only
taken for journald crashes, where we control COMM and know that it doesn't
contain any special characters. Use log_dispatch which does not do any
format processing to push the message out.
I think it would be a good idea to move such fixed, picked values out of
the main sources into the head of a file, to make sure they are
ultimately tunables.
We check these a number of times, hence let's unify these checks here.
This also allows us to make the PID 1 check more elaborate as we can
check both the PID and the cgroup. Checking the PID has the benefit that
we'll also cover cases where PID 1 might still be in the root cgroup, and
the cgroup check has the benefit that we also cover crashes in forked
off crasher processes (the way we actually do it in systemd)
Given that this is a field primarily processed by computers, and not so
much by humans, assign "1" instead of "yes". Also, use parse_boolean()
as we usually do for parsing it again.
This makes things more alike udev options (as one example), such as
SYSTEMD_READY where we also spit out "1" and "0", and parse with
parse_boolean().
[guest@fedora ~]$ coredumpctl
No coredumps found.
[guest@fedora ~]$ ./coredumpctl
Hint: You are currently not seeing messages from other users and the system.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Pass -q to turn off this notice.
No coredumps found.
Fixes #1733.
We would only log a terse message when pid1 or systemd-journald crashed.
It seems better to reuse the normal code paths as much as possible,
with the following differences:
- if pid1 crashes, we cannot launch the helper, so we don't analyze the
coredump, just write it to file directly from the helper invoked by the
kernel;
- if journald crashes, we can produce the backtrace, but we don't log full
structured messages.
In comparison to the previous code, the advantages are:
- we go through most of the steps, so for example vacuuming is performed,
- we gather and log more data. In particular for journald and pid1 crashes we
generate a backtrace, and for pid1 crashes we record the metadata (fdinfo,
maps, etc.),
- coredumpctl shows pid1 crashes.
A disadvantage (inefficiency) is that we gather metadata for journald crashes
which is then ignored because _TRANSPORT=kernel does not support structured
messages.
Messages for the systemd-journald "crash" have _TRANSPORT=kernel, and
_TRANSPORT=journal for the pid1 "crash".
Feb 26 16:27:55 systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=11/SEGV
Feb 26 16:27:55 systemd[1]: systemd-journald.service: Unit entered failed state.
Feb 26 16:37:54 systemd-coredump[18801]: Process 18729 (systemd-journal) of user 0 dumped core.
Feb 26 16:37:54 systemd-coredump[18801]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.36c14bf3c6ce4c38914f441038990979.18729.1488145074000000.lz4
Feb 26 16:37:54 systemd-coredump[18801]: Stack trace of thread 18729:
Feb 26 16:37:54 systemd-coredump[18801]: #0 0x00007f46d6a06b8d fsync (libpthread.so.0)
Feb 26 16:37:54 systemd-coredump[18801]: #1 0x00007f46d71bfc47 journal_file_set_online (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #2 0x00007f46d71c1c31 journal_file_append_object (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #3 0x00007f46d71c3405 journal_file_append_data (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #4 0x00007f46d71c4b7c journal_file_append_entry (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #5 0x00005577688cf056 write_to_journal (systemd-journald)
Feb 26 16:37:54 systemd-coredump[18801]: #6 0x00005577688d2e98 dispatch_message_real (systemd-journald)
Feb 26 16:37:54 kernel: systemd-coredum: 9 output lines suppressed due to ratelimiting
Feb 26 16:37:54 systemd-journald[18810]: Journal started
Feb 26 16:50:59 systemd-coredump[19229]: Due to PID 1 having crashed coredump collection will now be turned off.
Feb 26 16:51:00 systemd[1]: Caught <SEGV>, dumped core as pid 19228.
Feb 26 16:51:00 systemd[1]: Freezing execution.
Feb 26 16:51:00 systemd-coredump[19229]: Process 19228 (systemd) of user 0 dumped core.
Stack trace of thread 19228:
#0 0x00007fab82075c47 kill (libc.so.6)
#1 0x000055fdf7c38b6b crash (systemd)
#2 0x00007fab824175c0 __restore_rt (libpthread.so.0)
#3 0x00007fab82148573 epoll_wait (libc.so.6)
#4 0x00007fab8366f84a sd_event_wait (libsystemd-shared-233.so)
#5 0x00007fab836701de sd_event_run (libsystemd-shared-233.so)
#6 0x000055fdf7c4a380 manager_loop (systemd)
#7 0x000055fdf7c402c2 main (systemd)
#8 0x00007fab82060401 __libc_start_main (libc.so.6)
#9 0x000055fdf7c3818a _start (systemd)
Poor machine ;)
Unit systemd-coredump@1-3854-0.service is failed/failed, not counting it.
TIME PID UID GID SIG COREFILE EXE
Fri 2017-02-24 11:11:00 EST 10002 1000 1000 6 none /home/zbyszek/src/systemd-work/.libs/lt-
Sat 2017-02-25 00:49:32 EST 26921 0 0 11 error /usr/libexec/fprintd
Sat 2017-02-25 11:56:30 EST 30703 1000 1000 - - /usr/bin/python3.5
Sat 2017-02-25 13:16:54 EST 3275 1000 1000 11 present /usr/bin/bash
Sat 2017-02-25 17:25:40 EST 4049 1000 1000 11 truncated /usr/bin/bash
For info and gdb output, the filename is marked in red and "(truncated)" is
appended. (Red is necessary because the annotation is hard to see when running
under a pager.)
Fixed #3883.
A few times I have seen the hint unexpectedly. Add this debug info
so it's easier to see what's happening.
...
Unit systemd-coredump@0-3119-0.service is failed/failed, not counting it.
Unit systemd-coredump@1-3854-0.service is activating/start-pre, counting it.
...
-- Notice: 1 systemd-coredump@.service unit is running, output may be incomplete.
We logged about this, but did not attach information directly to the log
entry. It *would* be nice to log the full untruncated size, but afaict, to do
this, we would have to read the full data from the kernel. Doing this just to
log that information seems a bit excessive, in particular when the limit could
be set quite low. So for now let's just add a boolean field.
Most of the fields in the context array come from the kernel (passed
through argv), but two are special: comm and exe. We allocate them
ourselves. We forgot to initialize context[CONTEXT_COMM] with the value
we allocated (introduced in 9aa8202314).
To simplify things, just set context[CONTEXT_COMM] and context[CONTEXT_EXE],
and free those two fields at the end.
Fixes #5442.
Implement --since/--until (-S/-U) in the same fashion as journalctl.
This lets the user filter the results a bit so it would be easier to
find relevant info in case there were many core dumps.
Our coredump handler operates on a "context" supplied by the kernel via
the core_pattern arguments. When we pass off a coredump for processing
to coredumpd we pass along enough information for this context to be
reconstructed. This information is passed in the usual journal fields,
and that means we extended the 1s granularity timestamp to 1µs
granularity by appending 6 zeroes. We need to chop them off again when
reconstructing the original kernel context.
Fixes: #4779
(Note that we only do this for the journal metadata, not for the xattrs,
as the xattrs are only supposed to store the original 1:1 info we
acquired from the kernel.)
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
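As an illustrative sketch of the idea (names here are hypothetical, not the
exact systemd flags), a pair of boolean parameters becomes a single flags
argument:

    typedef enum CopyFlags {
            COPY_REFLINK = 1 << 0,   /* try to reflink rather than copy file contents */
            COPY_MERGE   = 1 << 1,   /* merge into an already existing destination tree */
    } CopyFlags;

    int copy_tree(const char *from, const char *to, CopyFlags flags);

    /* A call site now spells out the requested behaviour explicitly, e.g.
     *     copy_tree(src, dst, COPY_REFLINK|COPY_MERGE);
     * instead of an opaque pair of true/false arguments. */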
$ ./coredumpctl --no-pager -1
TIME PID UID GID SIG COREFILE EXE
Sun 2016-11-06 10:10:51 EST 29514 1002 1002 - - /usr/bin/python3.5
$ ./coredumpctl info 29514
PID: 29514 (python3)
UID: 1002 (zbyszek)
GID: 1002 (zbyszek)
Reason: ZeroDivisionError
Timestamp: Sun 2016-11-06 10:10:51 EST (3h 22min ago)
Command Line: python3 systemd_coredump_exception_handler.py
Executable: /usr/bin/python3.5
Control Group: /user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service
Unit: user@1002.service
User Unit: gnome-terminal-server.service
Slice: user-1002.slice
Owner UID: 1002 (zbyszek)
Boot ID: 1531fd22ec84429e85ae888b12fadb91
Machine ID: 519a16632fbd4c71966ce9305b360c9c
Hostname: laptop
Storage: none
Message: Process 29514 (systemd_coredump_exception_handler.py) of user zbyszek failed with ZeroDivisionError: division by
Traceback (most recent call last):
File "systemd_coredump_exception_handler.py", line 134, in <module>
g()
File "systemd_coredump_exception_handler.py", line 133, in g
f()
File "systemd_coredump_exception_handler.py", line 131, in f
div0 = 1 / 0
ZeroDivisionError: division by zero
Local variables in innermost frame:
a=3
h=<function f at 0x7efdc14b6ea0>
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatenate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
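Conceptually, the macro just stringifies the same 16 byte values that
SD_ID128_MAKE() takes, producing an ordinary string literal that can be
concatenated at compile time; a hedged sketch (the exact sd-id128.h definition
may differ in detail):

    #define SD_ID128_MAKE_STR(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15) \
            #v0 #v1 #v2 #v3 #v4 #v5 #v6 #v7 #v8 #v9 #v10 #v11 #v12 #v13 #v14 #v15

    /* Usage: a compile-time string, here matching the first entry in the list above. */
    static const char field[] = "MESSAGE_ID=" SD_ID128_MAKE_STR(c7,a7,87,07,9b,35,4e,aa,a9,e7,7b,37,18,93,cd,27);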
The entry must be a single entry in the journal export format, including the
terminating double newline. The MESSAGE field is now generated on the sender
side.
The advantage is that the reporter can easily pass additional metadata.
Continuing with the example of the python excepthook:
COREDUMP_PYTHON_EXECUTABLE=/usr/bin/python3
COREDUMP_PYTHON_VERSION=3.5.2 (default, Sep 14 2016, 11:28:32)
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)]
COREDUMP_PYTHON_THREAD_INFO=sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.24')
COREDUMP_PYTHON_EXCEPTION_TYPE=ZeroDivisionError
COREDUMP_PYTHON_EXCEPTION_VALUE=division by zero
MESSAGE=Process 29514 (systemd_coredump_exception_handler.py) of user zbyszek failed with ZeroDivisionError: division by zero
Traceback (most recent call last):
File "systemd_coredump_exception_handler.py", line 134, in <module>
g()
File "systemd_coredump_exception_handler.py", line 133, in g
f()
File "systemd_coredump_exception_handler.py", line 131, in f
div0 = 1 / 0
ZeroDivisionError: division by zero
Local variables in innermost frame:
a=3
h=<function f at 0x7efdc14b6ea0>
One consideration is whether to use the Journal Export Format, or send packets
over a UNIX socket instead. The advantage of current solution is that although
parsing is more complicated on the receiver side, it is much easier to use on the
sender side. I hope this can be used by various languages for which writing
binary structures to a UNIX socket is harder and more likely to be done wrong
than piping a simple text-ish format.
In preparation for subsequent changes...
Various stack allocations are changed to use the heap. This might be minimally
slower, but probably doesn't matter. The upside is that we will now properly
free all memory that is allocated.
Even if pressing Ctrl-c after spawning gdb with "coredumpctl gdb" is not really
useful, we should let gdb handle the signal entirely, otherwise the user can be
surprised to see a different behavior when gdb is started by coredumpctl vs when
it's started directly.
Indeed in the former case, gdb exits due to coredumpctl being killed by the
signal.
So this patch makes coredumpctl ignore SIGINT as long as gdb is running.
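A minimal sketch of that behaviour (a hypothetical helper, not the literal
coredumpctl code): ignore SIGINT in the parent for as long as the child runs,
and restore the previous disposition afterwards.

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int run_debugger(char *const argv[]) {
            struct sigaction ign = { .sa_handler = SIG_IGN }, old;
            int status = -1;
            pid_t pid;

            sigaction(SIGINT, &ign, &old);          /* Ctrl-C now only reaches gdb */

            pid = fork();
            if (pid == 0) {
                    sigaction(SIGINT, &old, NULL);  /* the child handles SIGINT normally */
                    execvp(argv[0], argv);
                    _exit(127);
            }

            if (pid > 0)
                    (void) waitpid(pid, &status, 0);

            sigaction(SIGINT, &old, NULL);          /* restore our own handling */
            return status;
    }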
For normal arches this doesn't matter, but on arm32 arg_journal_size_max was smaller
than the other *SizeMax variables. This doesn't seem useful.
This is another part of the fix in 5206a724a0.
In file included from ./src/basic/macro.h:415:0,
from ./src/shared/acl-util.h:28,
from src/coredump/coredump.c:36:
src/coredump/coredump.c: In function ‘submit_coredump’:
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:768:34: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:768:25: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
coredump had code to check if copy_bytes() hit the max_bytes limit,
and refuse further processing in that case.
But in 84ee096044, the return convention for copy_bytes() was changed
from -EFBIG to 1 for the case when the limit is hit, so the condition
check in coredump couldn't ever trigger.
But it seems that we *do* want to process such truncated cores [1].
So change the code to detect truncation properly, but instead of
returning an error, give a nice log entry.
[1] https://github.com/systemd/systemd/issues/3883#issuecomment-239106337
Should fix (or at least alleviate) #3883.
For the user, if the core file is missing or inaccessible, it is
more interesting that the fact that they forgot to pipe to a file.
So delay the failure from the check until after we have verified
that the file or the COREDUMP field are present.
Partially fixes #4161.
Also, error reporting on failure was duplicated. save_core() now
always prints an error message (because it knows the paths involved,
so it can print the most useful message), and the callers don't have to.
Propagate errors properly, so that if we hit oom or an error in the
journal, the whole command will fail. This is important when using
the output in scripts.
Support the output of multiple values for the same field with -F.
The journal supports that, and our official commands should too, as
far as it makes sense. -F can be used to print user-defined fields
(e.g. somebody could use a TAG field with multiple occurences), so
we should support that too. That seems better than silently printing
the last value found as was done before.
We would iterate trying to match the same field with all possible
field names. Once we find something, cut the loop short, since we
know that nothing else can match.
The column for "present" was easy to miss, especially if somebody had no
coredumps present at all, in which case the column of spaces of width one
wasn't visually distinguished from the neighbouring columns. Replace this
with an explicit text, one of: "missing", "journal", "present", "error".
$ coredumpctl
TIME PID UID GID SIG COREFILE EXE
Mon 2016-09-26 22:46:31 CEST 8623 0 0 11 missing /usr/bin/bash
Mon 2016-09-26 22:46:35 CEST 8639 1001 1001 11 missing /usr/bin/bash
Tue 2016-09-27 01:10:46 CEST 16110 1001 1001 11 journal /usr/bin/bash
Tue 2016-09-27 01:13:20 CEST 16290 1001 1001 11 journal /usr/bin/bash
Tue 2016-09-27 01:33:48 CEST 17867 1001 1001 11 present /usr/bin/bash
Tue 2016-09-27 01:37:55 CEST 18549 0 0 11 error /usr/bin/bash
Also, use access(…, R_OK), so that we can report a present but inaccessible
file differently from a missing one.
In 'list', show present also for coredumps stored in the journal.
In 'status', replace "File" with "Storage" line that is always present.
Possible values:
Storage: none
Storage: journal
Storage: /path/to/file (inaccessible)
Storage: /path/to/file
Previously the File field was only present if the file was accessible, so users
had to manually extract the file name precisely in the cases where it was
needed, i.e. when coredumpctl couldn't access the file. It's much more friendly
to always show something. This output is designed for human consumption, so
it's better to be a bit verbose.
The call to sd_j_set_data_threshold is moved, so that status is always printed
with the default of 64k, list uses 4k, and coredump retrieval is done with the
limit unset. This should make checking for the presence of the COREDUMP field
not too costly.
Added in 9fe13294a9 (by me :[ ), and later obfuscated in d0c8806d4a, if an
uncompressed external file or an internally stored coredump was supposed to be
written to a file descriptor, nothing would be written.
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.
An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.
This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
If ulimit is smaller than page_size(), the function save_external_coredump()
returns -EBADSLT and this causes the whole core dumping part in
submit_coredump() to be skipped. Initializing coredump_size to UINT64_MAX
prevents evaluating a condition with an uninitialized variable, which would
lead to calling allocate_journal_field() with coredump_fd = -1 and cause an
abort.
Signed-off-by: Matej Habrnal <mhabrnal@redhat.com>
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
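For example (a hedged sketch, the helper name is illustrative):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>

    /* mkostemp(3) always adds O_RDWR|O_CREAT|O_EXCL itself; O_CLOEXEC is the
     * only extra flag actually needed, so pass just that. */
    int open_tmp(char *template) {
            /* template must end in "XXXXXX", e.g. "/var/tmp/coredump-XXXXXX" */
            return mkostemp(template, O_CLOEXEC);
    }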
The kernel treats values below a certain threshold (minfmt->min_coredump
which is initialized to ELF_EXEC_PAGESIZE, which varies between architectures,
but is usually the same as PAGE_SIZE) as disabling coredumps [1].
Any core image below ELF_EXEC_PAGESIZE will yield an invalid backtrace anyway [2],
so follow the kernel and do not try to parse or store such images.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c#n660
[2] systemd-coredump[16260]: Process 16258 (sleep) of user 1002 dumped core.
Stack trace of thread 16258:
#0 0x00007f1d8b3d3810 n/a (n/a)
https://bugzilla.redhat.com/show_bug.cgi?id=1309172#c19
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limit
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
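A rough sketch of the lookup order described above (validation of the candidate
directories is omitted here):

    #define _GNU_SOURCE
    #include <stdlib.h>

    const char *tmp_dir_candidate(void) {
            const char *e;

            /* secure_getenv() ignores the environment in suid/privileged binaries */
            if ((e = secure_getenv("TMPDIR")))
                    return e;
            if ((e = secure_getenv("TEMP")))
                    return e;
            if ((e = secure_getenv("TMP")))
                    return e;

            return "/tmp";   /* the var_tmp_dir() flavour would fall back to /var/tmp */
    }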
The container parent process is the process used to start processes within a new
user namespace - e.g. systemd-nspawn, runc, lxc, etc.
There is no standard way to find such a process - or I do not know
of one - hence I have decided to find the first process in the parent
process hierarchy with a different mount namespace and a different
/proc/self/root inode.
I have decided for this criteria because in ABRT we take special care
only if the crashed process runs different code than installed on the
host. Other processes with namespaces different than PID 1's namespaces
are just processes running code shipped by the OS vendor and bug
reporting tools can get information about the provider of the code
without the need to deal with changed root and so on.
The macro determines the right length of an AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
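As a hedged sketch, the logic the macro encapsulates looks roughly like this:

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static socklen_t sockaddr_un_len(const struct sockaddr_un *sa) {
            assert(sa->sun_family == AF_UNIX);

            if (sa->sun_path[0] == '\0')
                    /* abstract namespace: a leading NUL byte, then a name that is
                     * not NUL-terminated and may use the full remaining space */
                    return offsetof(struct sockaddr_un, sun_path) + 1 +
                            strnlen(sa->sun_path + 1, sizeof(sa->sun_path) - 1);

            /* file system socket: the length of the path itself */
            return offsetof(struct sockaddr_un, sun_path) +
                    strnlen(sa->sun_path, sizeof(sa->sun_path));
    }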
This moves the O_TMPFILE handling from the coredumping code into common library
code, and generalizes it as open_tmpfile_linkable() + link_tmpfile(). The
existing open_tmpfile() function (which creates an unlinked temporary file that
cannot be linked into the fs) is renamed to open_tmpfile_unlinkable(), to make
the distinction clear. Thus, code may now choose between:
a) open_tmpfile_linkable() + link_tmpfile()
b) open_tmpfile_unlinkable()
Depending on whether they want a file that may be linked back into the fs later
on or not.
In a later commit we should probably convert fopen_temporary() to make use of
open_tmpfile_linkable().
Followup for: #3065
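Underneath, the linkable variant builds on the O_TMPFILE + linkat() pattern; a
hedged sketch (helper names here are illustrative, not the exact new API):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Create an anonymous file in the target directory... */
    int open_linkable(const char *dir) {
            return open(dir, O_TMPFILE|O_WRONLY|O_CLOEXEC, 0644);
    }

    /* ...and only give it a name once it has been fully written. */
    int link_into_place(int fd, const char *target) {
            char p[64];

            /* linkat() with AT_EMPTY_PATH needs CAP_DAC_READ_SEARCH, so go via
             * /proc/self/fd instead, as unprivileged code commonly does. */
            snprintf(p, sizeof(p), "/proc/self/fd/%i", fd);
            if (linkat(AT_FDCWD, p, AT_FDCWD, target, AT_SYMLINK_FOLLOW) < 0)
                    return -errno;

            return 0;
    }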
Don't leave temporary files if the coredump service is aborted during
the operation
Yeah, these are temporary files that systemd-coredump needs while
processing the coredumps. Of course, if the coredump service is aborted
during the operation we'd better not leave those files around. This
is hence a bug to fix in our coredumping code.
See https://github.com/systemd/systemd/issues/2804#issuecomment-210578147
Another option is to simply use O_TMPFILE, and when it is not available
fall back to the current behaviour. After all, the files are cleaned up
eventually, through normal tmpfiles aging, and the offending file
systems are pretty exotic these days, or not in the upstream kernel.
See https://github.com/systemd/systemd/issues/2804#issuecomment-211496707
Many subsystems define their own pager_open_if_enabled() function, which
checks the '--no-pager' command line argument and opens the pager depending
on its value. All implementations of pager_open_if_enabled() are
the same. Let's merge this function with pager_open() from
shared/pager.c and remove pager_open_if_enabled() from all subsystems
to prevent code duplication.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
Let's add an extra safety net and change UID/GID to the "systemd-coredump" user when processing coredumps from system
users. For coredumps of normal users we keep the current logic of processing the coredump under the user id the coredump
was created with.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87354
With this change processing/saving of coredumps takes the RLIMIT_CORE resource limit of the crashing process into
account, giving the user control over whether specific processes shall core dump or not, and how large to make the core dump.
Note that this effectively disables core-dumping for now, as RLIMIT_CORE defaults to 0 (i.e. is disabled) for all
system processes.
This reworks the coredumping logic so that the coredump handler invoked from the kernel only collects runtime data
about the crashed process, and then submits it for processing to a socket-activated coredump service, which extracts a
stacktrace and writes the coredump to disk.
This has a number of benefits: the disk IO and stack trace generation may take a substantial amount of resources, and
hence should better be managed by PID 1, so that resource management applies. This patch uses RuntimeMaxSec=, Nice=, OOMScoreAdjust=
and various sandboxing settings to ensure that the coredump handler doesn't take away unbounded resources from normally
prioritized processes.
This logic is also nice since this makes sure the coredump processing and storage is delayed correctly until
/var/systemd/coredump is mounted and writable.
Fixes: #2286