IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Switching to K_UNICODE from other than L_XLATE can make the keyboard
unusable and possibly leak keypresses from X.
BugLink: https://launchpad.net/bugs/1803993
When we read from keyring, a temporary buffer is allocated in order to
determine the size needed for the entire data. However, when zeroing that area,
we use the data size returned by the read instead of the lesser size allocate
for the buffer.
That will cause memory corruption that causes systemd-cryptsetup to crash
either when a single large password is used or when multiple passwords have
already been pushed to the keyring.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
We disabled kernel RA support. Then, we should not send
IFLA_INET6_TOKEN.
Thus, we do not need to send IFLA_INET6_ADDR_GEN_MODE twice.
Follow-up for 0e2fdb83bb and
4eb086a387.
```
May 10 11:08:54 test systemd-networkd[447]: wwan0: Failed to stop LLDP: Success
May 10 11:08:54 test systemd-networkd[447]: wwan0: Gained carrier
May 10 11:08:54 test systemd-networkd[447]: wwan0: Failed to start LLDP: Success
```
An ugly, ugly work-around for #11810. And no, we shouldn't have to do
this. This is something for AMD, the firmware or the kernel to
fix/work-around, not us. But nonetheless, this should do it for now.
Fixes: #11810
We don#t actually use all combinations, but it's kinda nice if we can
output all combinations in a test, and this is preparation so that we
can do so nicely.
three of these colors we never use, so let's drop them. Let's however
add a highlight version of grey, so that at least for all colors we do
define we have all possible styles defined.
Apparently all relevant terminals implement these sequences, including
the Linux kernel, rxvt, xterm, and of course gnome terminal. Hence it
should be OK to use them, and fixate the grey color in a way that maps
to the same color in all terminals.
Ideally we'd stick to the more symbolic colors that allow terminal
emulators to implement color styles, but this apparently doesn#t work,
since rxvt maps grey to something unreadable by default.
Note that this change has negative effects besides the non-themability
of the palette: the midrange grey this uses maps to regular white on the
linux console. However, that's probably not too bad: allowing things to
be unreadable on some terminals is probably worse than showing no color
on some terminals.
Fixes: #12482
The comment in udev-builtin-net_id.c (removed in grandparent commit) showed the
property without the prefix. I assume that was always the intent, because it
doesn't make much sense to concatenate anything to an arbitrary user-specified
field.
I decided to make this a separate man page because it is freakin' long.
This content could equally well go in systemd-udevd.service(8), systemd.link(5),
or a new man page for the net_id builtin.
v2:
- rename to systemd.net-naming-scheme
- add udevadm test-builtin net_id example
The latter is identical to the former, but becomes a NOP if
/var/log/journal is on the same mount as /, and thus during shutdown
unmounting /var is not necessary and hence we can keep logging until the
very end.
This adds a minimal Varlink (https://varlink.org/) implementation to our
tree. Given that we already have a JSON logic it's an easy thing to add.
Why bother?
We currently have major problems with IPC before dbus-daemon is up, and
in all components that dbus-daemon itself makes use of (such as various
NSS modules to resolve users as well as the journal which dbus-daemon
logs to). Because of that we so far ended up creating various (usually
crappy) work-arounds either coming up with secondary IPC systems or
sharing data statelessly in /run or similar. Let's clean this up, and
instead use a clean, well-defined, broker-less IPC for cases like that.
This is a minimal implementation of Varlink, i.e. the most basic logic
only. Stuff that's missing is left out on purpose: there's no
introspection/validation and there's no name service. It might make
sense to add that later, but for now let's only do the minimum buy-in we
can get away with. In particular as I'd assume that at least initially
we only use this IPC for our internal communication avoiding
introspection and the name service should be fine.
Specifically, I'd expect that we add IPC interfaces to the following
concepts with this scheme:
1. nss-resolve (so that hostname lookups with resolved work before
resolved is up)
2. journald (so that IPC calls to journald don't have to go through
dbus-daemon thus creating a cyclic dependency between journald and
dbus-daemon)
3. nss-systemd (so that dynamic user lookups via PID 1 work sanely even
inside of dbus-daemon, because otherwise we'd want to use dbus to run
dbus which causes deadlocks)
4. networkd (to make sure one can talk to it in the initrd already,
long before dbus is around)
And there might be other cases similar to this.
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.
When VXLAN destination port is unset and GPE is set
then assign 4790 to destination port. Kernel does the same as
well as iproute.
IANA VXLAN-GPE port is 4790
Fillup IFLA_INET6_ADDR_GEN_MODE while we do link_up.
Fixes the following error:
```
dummy-test: Could not bring up interface: Invalid argument
```
After reading the kernel code when we do a link up
```
net/core/rtnetlink.c
IFLA_AF_SPEC
af_ops->set_link_af(dev, af);
inet6_set_link_af
if (tb[IFLA_INET6_ADDR_GEN_MODE])
Here it looks for IFLA_INET6_ADDR_GEN_MODE
```
Since link up we didn't filling up that it's failing.
Closes#12504.
Otherwise, the fuzzers will fail to compile with MSan:
```
../../src/systemd/src/basic/random-util.c:64:40: error: use of undeclared identifier 'sucess'; did you mean 'success'?
msan_unpoison(&success, sizeof(sucess));
^~~~~~
success
../../src/systemd/src/basic/alloc-util.h:169:50: note: expanded from macro 'msan_unpoison'
^
../../src/systemd/src/basic/random-util.c:38:17: note: 'success' declared here
uint8_t success;
^
1 error generated.
[80/545] Compiling C object 'src/basic/a6ba3eb@@basic@sta/process-util.c.o'.
ninja: build stopped: subcommand failed.
Fuzzers build failed
```
These make sense to be explicitly set at 0 (which has a different effect
than the default, since it can affect processing of `DefaultMemoryXXX`).
Without this, it's not easily possible to relinquish memory protection
for a subtree, which is not great.
(before)
$ build/systemctl list-machines
Need to be root.
$ sudo build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide running 0 0
2 machines listed.
(after)
$ build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide n/a 0 0
2 machines listed.
The output for non-root is missing some bits of information, but we display
what information is missing nicely, and e.g. in the case when no machines are
running at all, or only VMs are running, the unprivileged output would be the
same as the privileged one.
The reasoning is the same as in previous cases. We get an error like
"Failed to update EFI variable: Operation not permitted" anyway, so
the check is not very useful.
If we lack permissions, we will fail anyway. But by not doing the artifial
check, we get more information. For example, on my machine, I see
$ build/systemd-bless-boot good
Not booted with boot counting in effect.
while "Need to be root" is actually untrue, because being root doesn't change
the outcome in any way.
Letting the operation fail on the actual error makes it easier to do test runs:
we *know* the command will fail, but we want to see what the first step would
be.
Not doing the articial check makes it also easier to do create alternative
security arrangements, for example where the directories are mounted with
special ownership mode and an otherwise unprivileged user can perform certain
operations.
Checking if we are root on the client side is generally pointless, since the
privileged operation will fail anyway and we can than log what precisly went
wrong.
A check like this makes sense only if:
- we need to do some expensive unprivileged operation before attempting the
privileged operation, and the check allows us avoid wasting resources.
- the privileged operation would fail but in an unclear way.
Neither of those cases applies here.
This fixes calls like 'udevadm control -h' as unprivileged user.
In program output, highlighting warnings with ANSI_HIGHLIGHT is not enough,
because it doesn't stand out enough. Yellow is more appropriate.
I was worried that yellow wouldn't be visible on white background, but (at
least gnome-terminal) uses a fairly dark yellow that is fully legible on white
and light-colored backgrounds. We also used yellow in many places,
e.g. systemctl, so this should be fine.
Note: yellow is unreadable on urxvt with white background (urxvt +rv). But
grey, which we already used, is also unreadable, so urxvt users would have
to disable colors anyway, so this change does not make the problem
intrinsically worse. See
https://github.com/systemd/systemd/issues/12482#issuecomment-490374210.
When emitting the calendarspec warning we want to see some color.
Follow-up for 04220fda5c.
Exceptions:
- systemctl, because it has a lot hand-crafted coloring
- tmpfiles, sysusers, stdio-bridge, etc, because they are also used in
services and I'm not sure if this wouldn't mess up something.
Let's be a bit paranoid and hash the 16 bytes we get from getauxval()
before using them. AFter all they might be used by other stuff too (in
particular ASLR), and we probably shouldn't end up leaking that seed
though our crappy pseudo-random numbers.
The old flag name was a bit of a misnomer, as /dev/urandom cannot be
"drained". Once it's initialized it's initialized and then is good
forever. (Only /dev/random has a concept of 'draining', but we never use
that, as it's an obsolete interface).
The flag is still useful though, since it allows us to suppress accesses
to the random pool while it is not initialized, as that trips up the
kernel and it logs about any such attempts, which we really don't want.
gcc was warning about strncpy() leaving an unterminated string.
In this case, it was correct.
The code was doing strncpy()+strncat()+strlen() essentially to determine
if the strings have expected length. If the length was correct, a buffer
overread was performed (or at least some garbage bytes were used from the
uninitialized part of the buffer). Let's do the length check first and then
only copy stuff if everything agrees.
For some reason the function was called "prepend", when it obviously does
an "append".
Unfortunately the warning must be known, or otherwise the pragma generates a
warning or an error. So let's do a meson check for it.
Is it worth doing this to silence the warning? I think so, because apparently
the warning was already emitted by gcc-8.1, and with the recent push in gcc to
catch more such cases, we'll most likely only get more of those.
It makes more sense to call VXLAN ID as
1. the VXLAN Network Identifier (VNI) (or VXLAN Segment ID)
2. test-network: rename VXLAN Id to VNI
3. fuzzer: Add VXLAN VNI directive to fuzzer
When address is in IPv4, the remaining buffer in in_addr_union may
not be initialized.
Fixes the following valgrind warning:
```
==13169== Conditional jump or move depends on uninitialised value(s)
==13169== at 0x137FF6: UnknownInlinedFun (networkd-ndisc.c:77)
==13169== by 0x137FF6: UnknownInlinedFun (networkd-ndisc.c:580)
==13169== by 0x137FF6: ndisc_handler.lto_priv.83 (networkd-ndisc.c:597)
==13169== by 0x11BE23: UnknownInlinedFun (sd-ndisc.c:201)
==13169== by 0x11BE23: ndisc_recv.lto_priv.174 (sd-ndisc.c:254)
==13169== by 0x4AA18CF: source_dispatch (sd-event.c:2821)
==13169== by 0x4AA1BC2: sd_event_dispatch (sd-event.c:3234)
==13169== by 0x4AA1D88: sd_event_run (sd-event.c:3291)
==13169== by 0x4AA1FAB: sd_event_loop (sd-event.c:3313)
==13169== by 0x117401: UnknownInlinedFun (networkd.c:113)
==13169== by 0x117401: main (networkd.c:120)
==13169== Uninitialised value was created by a stack allocation
==13169== at 0x1753C8: manager_rtnl_process_address (networkd-manager.c:479)
```
The function sd_radv_add_prefix() in dhcp6_pd_prefix_assign() may
return -EEXIST, and in that case the sd_radv_prefix object allocated
in dhcp6_pd_prefix_assign() will be freed when the function returns.
Hence, the key value in Manager::dhcp6_prefixes hashmap is lost.
Because of this the fd is getting closed and we getting errors
like
```
^Ceno1: Could not send rtnetlink message: Bad file descriptor
enp7s0f0: Could not send rtnetlink message: Bad file descriptor
enp7s0f0: Cannot delete unreachable route for DHCPv6 delegated subnet 2a0a:...:fc::/62: Bad file descriptor
Assertion '*_head == _item' failed at ../systemd/src/network/networkd-route.c:126, function route_free(). Aborting.
Aborted
```
Closes one of https://github.com/systemd/systemd/issues/12452
The fact that strncpy does the truncation is the whole point here, and gcc
shouldn't warn about this. We can avoid the warning and simplify the
whole procedure by directly copying the interesting part.
Fixes#12454.
gcc was complaining that the link->ifname argument is NULL. Adding
assert(link->ifname) right before the call has no effect. It seems that
gcc is confused by the fact that log_link_warning_errno() internally
calls log_object(), with link->ifname passed as the object. log_object()
is also a macro and is does a check whether the passed object is NULL.
So we have a check if something is NULL right next an unconditional use
of it where it cannot be NULL. I think it's a bug in gcc.
Anyway, we don't need to use link->ifname here. log_object() already prepends
the object name to the message.
We not stopping the clients when networkd stops. They
should shut down cleanly and then we need to clean the DS.
One of requirements to implement
https://github.com/systemd/systemd/issues/10820.
```
^CBus bus-api-network: changing state RUNNING → CLOSED
DHCP SERVER: UNREF
DHCP SERVER: STOPPED
DHCP CLIENT (0x60943df0): STOPPED
veth-test: DHCP lease lost
veth-test: Removing address 192.168.5.31
NDISC: Stopping IPv6 Router Solicitation client
DHCP CLIENT (0x0): FREE
==24308==
==24308== HEAP SUMMARY:
==24308== in use at exit: 8,192 bytes in 2 blocks
==24308== total heap usage: 4,230 allocs, 4,228 frees, 1,209,732 bytes allocated
==24308==
==24308== LEAK SUMMARY:
==24308== definitely lost: 0 bytes in 0 blocks
==24308== indirectly lost: 0 bytes in 0 blocks
==24308== possibly lost: 0 bytes in 0 blocks
==24308== still reachable: 8,192 bytes in 2 blocks
==24308== suppressed: 0 bytes in 0 blocks
==24308== Rerun with --leak-check=full to see details of leaked memory
==24308==
==24308== For lists of detected and suppressed errors, rerun with: -s
==24308== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==24308== could not unlink /tmp/vgdb-pipe-from-vgdb-to-24308-by-sus-on-Zeus
==24308== could not unlink /tmp/vgdb-pipe-to-vgdb-from-24308-by-sus-on-Zeus
==24308== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-24308-by-sus-on-Zeus
```
Lookup of a non-existing user using getpwnam() is not considered
an error, thus the `errno` is not set appropriately, causing
unexpected fails on systems, where 'nobody' user doesn't exist by
default
When the .automount unit file already existed for any reason in the
`normal-dir` passed to `systemd-fstab-generator`, but the normal .mount unit
file did not, `f` was closed (but _not_ set to NULL). The call to
`generator_open_unit_file(..., automount_name, &f)` then failed because the
.mount unit file already existed. Now `f` did not point to an open FILE and the
later cleanup from the `_cleanup_fclose_` attribute failed with a double free.
Reset `f` to NULL before reusing it.
This is another attempt at d4b604baea and #12438
Instead of blindly using the extra allocated space, let's do so only
after telling libc about it, via a second realloc(). The second
realloc() should be quick, since it never has to copy memory around.
Since nspawn-settings.h includes seccomp.h, any file that includes
nspawn-settings.h should depend on libseccomp so the correct header path where
seccomp.h lives is added to the header search paths.
It's especially important for distros such as openSUSE where seccomp.h is not
shipped in /usr/include but /usr/include/libseccomp.
This patch is similar to 8238423095.
chown() might drop the suid/sgid bit from files. hence let's chmod()
after chown().
But also, let's tighten the transition a bit: before issuing chown()
let's set the file mask to the minimum of the old and new access
bitmask, so that at no point in time additional privs are available on
the file with a non-matching ownership.
Fixes: #12354
This reverts commit d4b604baea.
When realloc() is called, the extra memory between the originally
requested size and the end of malloc_usable_size() isn't copied. (at
least with the version of glibc that currently ships on Arch Linux)
As a result, some elements get lost and use uninitialized memory, most
commonly 0, and can lead to crashes.
fixes#12384
1. When the DHCPv4 lease expires kernel removes the route. So add it back
when we gain lease again.
Closes https://github.com/systemd/systemd/issues/12426
2. When UseRoutes=false do not remove router
If a container manager does not set $container, we could end up
in a strange situation when detect-virt returns container-other when
run as non-pid-1 and none when run as pid-1.
The link may not have corresponding .network file.
Note that in that case, link_ipv4ll_enabled() and link_dhcp4_enabled()
returns false. So, it is safe to drop the assertion.
Fixes#12422.
When booting with "udev.log-priority=debug" for example, the output might be
spammed with messages like this:
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
While the message itself is useful, printing it per batch of events should be
enough.
When we determine that a calendar expression cannot elapse anymore,
print a warning but proceed regardless like we normally would.
Quite possibly a remote system has a different understanding of time
(timezone, system clock) than we have, hence we really shouldn't change
behaviour here client side, but log at best, and then leave the decision
what to do to the server side.
Follow-up for #12299
Coverity doesn't like the fact that unit_get_cgroup_context() returns NULL for
unit types that don't have a CGroupContext. We don't expect to call those
functions with such unit types, so this isn't an immediate problem, but we can
make things more robust by handling this case.
CID #1400683, #1400684.
- bridge or bonding master takes a reference of slave links.
- drop link from bridge or bonding master's slave list when slave link
is removed.
- change type of Link::slaves to Set*,
Fixes#12315.
When a uevent is received during the relevant interface is in
LINK_STATE_PENDING, then the interface may be initialized twice.
To prevent that, this introduces LINK_STATE_INITIALIZED.
A service might be able to detect errors by itself that may require the
system to take the same action as if the service locked up. Add a
WATCHDOG=trigger state change notification to sd_notify() to let the
service manager know about the self-detected misery and instantly
trigger the configured watchdog behaviour.
This wraps a few common steps. It is defined as inline function instead of in a
.c file to avoid having a .c file. With a .c file, we would have three choices:
- either link it into libshared, but then then libshared would have to be
linked to libmount.
- or compile the .c file into each target separately. This has the disdvantage
that configuration of every target has to be updated and stuff will be compiled
multiple times anyway, which is not too different from keeping this in the
header file.
- or create a new convenience library just for this. This also has the disadvantage
that the every target would have to be updated, and a separate library for a
10 line function seems overkill.
By keeping everything in a header file, we compile this a few times, but
otherwise it's the least painful option. The compiler can optimize most of the
function away, because it knows if 'source' is set or not.
Same motivation as in other places: let's use a single logic to parse this.
Use path_equal() to compare the path.
A bug in error handling is fixed: if we failed after the GREEDY_REALLOC but
before the line that sets the last item to NULL, we would jump to
_cleanup_strv_free_ with the strv unterminated. Let's use GREEDY_REALLOC0
to avoid the issue.
It seems better to use just a single parsing algorithm for /proc/self/mountinfo.
Also, unify the naming of variables in all places that use mnt_table_next_fs().
It makes it easier to compare the different call sites.
This wraps the call to org.freedesktop.DBus.Introspectable.Introspect.
Using "busctl call" directly is inconvenient because busctl escapes the
string before printing.
Example:
$ busctl introspect --xml org.freedesktop.systemd1 /org/freedesktop/systemd1 | pygmentize -lxml | less -RF
test-bus-introspect is also applied to the tables from test-bus-vtable.c.
test-bus-vtable.c is also used as C++ sources to produce test-bus-vtable-cc,
and our hashmap headers are not C++ compatible. So let's do the introspection
part only in the C version.
In order to properly and predictably name netdevsim netdevices,
introduce a separate implementation, as the netdevices reside on a
specific netdevsim bus. Note that this applies only to netdevsim devices
created using sysfs, because those expose phys_port_name attribute.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Just moving code around, in preparation to allow the content creation
part to be used in other places.
On the surface of things, introspect_path() should be in bus-introspect.c, but
introspect_path() uses many static helper functions in bus-objects.c, so moving
it would require all of them to be exposed, which is too much trouble.
test-bus-introspect is updated to actually write the closing bracket.
We would check the size of sd_bus_vtable entries, requring one of the two known
sizes. But we should be able to extend the structure in the future, by adding
new fields, without breaking backwards compatiblity.
Incidentally, this check was what caused -EINVAL failures before, when programs
were compiled with systemd-242 and run with older libsystemd.
In 856ad2a86b sd_bus_add_object_vtable() and
sd_bus_add_fallback_vtable() were changed to take an updated sd_bus_vtable[]
array with additional 'features' and 'names' fields in the union.
The commit tried to check whether the old or the new table format is used, by
looking at the vtable[0].x.start.element_size field, on the assumption that the
added fields caused the structure size to grow. Unfortunately, this assumption
was false, and on arm32 (at least), the structure size is unchanged.
In libsystemd we use symbol versioning and a major.minor.patch semantic
versioning of the library name (major equals the number in the so-name). When
systemd-242 was released, the minor number was (correctly) bumped, but this is
not enough, because no new symbols were added or symbol versions changed. This
means that programs compiled with the new systemd headers and library could be
successfully linked to older versions of the library. For example rpm only
looks at the so-name and the list of versioned symbols, completely ignoring the
major.minor numbers in the library name. But the older library does not
understand the new vtable format, and would return -EINVAL after failing the
size check (on those architectures where the structure size did change, i.e.
all 64 bit architectures).
To force new libsystemd (with the functions that take the updated
sd_bus_vtable[] format) to be used, let's pull in a dummy symbol from the table
definition. This is a bit wasteful, because a dummy pointer has to be stored,
but the effect is negligible. In particular, the pointer doesn't even change
the size of the structure because if fits in an unused area in the union.
The number stored in the new unsigned integer is not checked anywhere. If the
symbol exists, we already know we have the new version of the library, so an
additional check would not tell us anything.
An alternative would be to make sd_bus_add_{object,fallback}_vtable() versioned
symbols, using .symver linker annotations. We would provide
sd_bus_add_{object,fallback}_vtable@LIBSYSTEMD_221 (for backwards
compatibility) and e.g. sd_bus_add_{object,fallback}_vtable@@LIBSYSTEMD_242
(the default) with the new implementation. This would work too, but is more
work. We would have to version at least those two functions. And it turns out
that the .symver linker instructions have to located in the same compilation
unit as the function being annotated. We first compile libsystemd.a, and then
link it into libsystemd.so and various other targets, including
libsystemd-shared.so, and the nss modules. If the .symver annotations were
placed next to the function definitions (in bus-object.c), they would influence
all targets that link libsystemd.a, and cause problems, because those functions
should not be exported there. To export them only in libsystemd.so, compilation
would have to be rearranged, so that the functions exported in libsystemd.so
would not be present in libsystemd.a, but a separate compilation unit containg
them and the .symver annotations would be linked solely into libsystemd.so.
This is certainly possible, but more work than the approach in this patch.
856ad2a86b has one more issue: it relies on the
undefined fields in sd_bus_vtable[] array to be zeros. But the structure
contains a union, and fields of the union do not have to be zero-initalized by
the compiler. This means that potentially, we could have garbarge values there,
for example when reading the old vtable format definition from the new function
implementation. In practice this should not be an issue at all, because vtable
definitions are static data and are placed in the ro-data section, which is
fully initalized, so we know that those undefined areas will be zero. Things
would be different if somebody defined the vtable array on the heap or on the
stack. Let's just document that they should zero-intialize the unused areas
in this case.
The symbol checking code had to be updated because otherwise gcc warns about a
cast from unsigned to a pointer.
So apparently there are two reasons why accept() can return EOPNOTSUPP:
because the socket is not a listening stream socket (or similar), or
because the incoming TCP connection for some reason wasn't acceptable to
the host. THe latter should be a transient error, as suggested on
accept(2). The former however should be considered fatal for
flush_accept(). Let's fix this by explicitly checking whether the socket
is a listening socket beforehand.
kernel-4.15's if_ether.h has a bug that the header does not provide
'struct ethhdr'. The bug is introduced by
6926e041a8920c8ec27e4e155efa760aa01551fd (4.15-rc8)
and fixed by da360299b6734135a5f66d7db458dcc7801c826a (4.16-rc3).
This makes systemd built with kernel-4.15 headers.
Fixes#12319.
The L2TP_ATTR_UDP_ZERO_CSUM6_{TX,RX} attributes are introduced by
6b649feafe10b293f4bd5a74aca95faf625ae525, which is included in
kernel-3.16. To support older kernel, let's import the header.
Fixes#12300.
Now linux/in.h has better conflict detection with glibc's
netinet/in.h. So, let's import the headers.
Note that our code already have many workarounds for the conflict,
but in this commit does not drop them. Let's do that in the later
commits if this really helps.
With gcc-9.0.1-0.10.fc30.x86_64:
../src/network/netdev/macsec.c: In function ‘config_parse_macsec_port’:
../src/network/netdev/macsec.c:584:24: warning: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Waddress-of-packed-member]
584 | dest = &c->sci.port;
| ^~~~~~~~~~~~
../src/network/netdev/macsec.c:592:24: warning: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Waddress-of-packed-member]
592 | dest = &b->sci.port;
| ^~~~~~~~~~~~
(The alignment was probably OK, but it's nicer to avoid the warning anyway.)
When shooting down a service with SIGABRT the user might want to have a
much longer stop timeout than on regular stops/shutdowns. Especially in
the face of short stop timeouts the time might not be sufficient to
write huge core dumps before the service is killed.
This commit adds a dedicated (Default)TimeoutAbortSec= timer that is
used when stopping a service via SIGABRT. In all other cases the
existing TimeoutStopSec= is used. The timer value is unset by default
to skip the special handling and use TimeoutStopSec= for state
'stop-watchdog' to keep the old behaviour.
If the service is in state 'stop-watchdog' and the service should be
stopped explicitly we still go to 'stop-sigterm' and re-apply the usual
TimeoutStopSec= timeout.
[zj: this is a subset of changes generated by clang-format, just the ones
I think improve readability or consistency.]
This is a part of https://github.com/systemd/systemd/pull/11811.
In cgroup v2 we have protection tunables -- currently MemoryLow and
MemoryMin (there will be more in future for other resources, too). The
design of these protection tunables requires not only intermediate
cgroups to propagate protections, but also the units at the leaf of that
resource's operation to accept it (by setting MemoryLow or MemoryMin).
This makes sense from an low-level API design perspective, but it's a
good idea to also have a higher-level abstraction that can, by default,
propagate these resources to children recursively. In this patch, this
happens by having descendants set memory.low to N if their ancestor has
DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow
value.
Any affected unit can opt out of this propagation by manually setting
`MemoryLow` to some value in its unit configuration. A unit can also
stop further propagation by setting `DefaultMemoryLow=` with no
argument. This removes further propagation in the subtree, but has no
effect on the unit itself (for that, use `MemoryLow=0`).
Our use case in production is simplifying the configuration of machines
which heavily rely on memory protection tunables, but currently require
tweaking a huge number of unit files to make that a reality. This
directive makes that significantly less fragile, and decreases the risk
of misconfiguration.
After this patch is merged, I will implement DefaultMemoryMin= using the
same principles.
This was the last kind of accounting still not exposed on for each unit.
Let's fix that.
Note that this is a relatively simplistic approach: we don't expose
per-device stats, but sum them all up, much like cgtop does. This kind
of metric is probably the most interesting for most usecases, and covers
the "systemctl status" output best. If we want per-device stats one day
we can of course always add that eventually.
It's a simple wrapper for resetting both IP and CPU accounting in one
go.
This will become particularly useful when we also needs this to reset IO
accounting (to be added in a later commit).
Let's exit the loop early in case the variant is not actually an object
or array. This is safer since otherwise we might end up iterating
through these variants and access fields that aren't of the type we
expect them to be and then bad things happen.
Of course, this doesn't absolve uses of these macros to check the type
of the variant explicitly beforehand, but it makes it less bad if they
forget to do so.
There's no point in returning the "key" within each loop iteration as
JsonVariant object. Let's simplify things and return it as string. That
simplifies usage (since the caller doesn't have to convert the object to
the string anymore) and is safe since we already validate that keys are
strings when an object JsonVariant is allocated.
Unlocked operations are used in all three places. I don't see why just one was
special.
This also improves logging, since we don't just log the final component of the
path, but the full name.
This is partially a refactoring, but also makes many more places use
unlocked operations implicitly, i.e. all users of fopen_temporary().
AFAICT, the uses are always for short-lived files which are not shared
externally, and are just used within the same context. Locking is not
necessary.
We noticed in our tests that occasionally SystemCallFilter= would
fail to set and the service would run with no syscall filtering.
Most of the time the same tests would apply the filter and fail
the service as expected. While it's not totally clear why this happens,
we noticed seccomp_load() in the systemd code base would fail open for
all errors except EPERM and EACCES.
ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the
error set based on what I gather from libseccomp code and man pages:
-ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
-EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
-EFAULT: addresses passed as args are invalid
The comment explains that $PATH might not be set in certain circumstances and
takes steps to handle this case. If we do that, let's assume that $PATH indeed
might be unset and not call setenv("PATH", NULL, 1). It is not clear from the
man page if that is allowed.
CID #1400497.
Coverity was unhappy, because it doesn't know that $PATH is pretty much always
set. But let's not assume that in the test. CID #1400496.
$ (unset PATH; build/test-env-util)
[1] 31658 segmentation fault (core dumped) ( unset PATH; build/test-env-util; )
We had all kinds of indentation: 2 sp, 3 sp, 4 sp, 8 sp, and mixed.
4 sp was the most common, in particular the majority of scripts under test/
used that. Let's standarize on 4 sp, because many commandlines are long and
there's a lot of nesting, and with 8sp indentation less stuff fits. 4 sp
also seems to be the default indentation, so this will make it less likely
that people will mess up if they don't load the editor config. (I think people
often use vi, and vi has no support to load project-wide configuration
automatically. We distribute a .vimrc file, but it is not loaded by default,
and even the instructions in it seem to discourage its use for security
reasons.)
Also remove the few vim config lines that were left. We should either have them
on all files, or none.
Also remove some strange stuff like '#!/bin/env bash', yikes.
Media Access Control Security (MACsec) is an 802.1AE IEEE
industry-standard security technology that provides secure
communication for all traffic on Ethernet links.
MACsec provides point-to-point security on Ethernet links between
directly connected nodes and is capable of identifying and preventing
most security threats, including denial of service, intrusion,
man-in-the-middle, masquerading, passive wiretapping, and playback attacks.
Closes#5754
Otherwise we can end up with an ordering cycle. Since d54bab90, all
local mounts now gain a default `Before=local-fs.target` dependency.
This doesn't make sense for `/sysroot` mounts in the initrd though,
since those happen later in the boot process.
Closes: #12231
We would accept a message with 40k signature and spend a lot of time iterating
over the nested arrays. Let's just reject it early, as we do for !gvariant
messages.
This is modelled after the existing ERRNO_IS_RESOURCES() and in
particular ERRNO_IS_DISCONNECT(). It returns true for all transient
network errors that should be handled like EAGAIN whenever we call
accept() or accept4(). This is per documentation in the accept(2) man
page that explicitly says to do so in the its "RETURN VALUE" section.
The error list we cover is a bit more comprehensive, and based on
existing code of ours. For example EINTR is included too (since we need
that to cover cases where we call accept()/accept4() on a blocking
socket), and of course ERRNO_IS_DISCONNECT() is a bit more comprehensive
than the list in the man page too.
No technical reason, except that later on we want to add a new
ERRNO_IS() which uses the parameter twice and where we want to avoid
double evaluation, and where we'd like to keep things in the same style.
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).
On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".
All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
Let's rename the .cgroup_inotify_wd field of the Unit object to
.cgroup_control_inotify_wd. Let's similarly rename the hashmap
.cgroup_inotify_wd_unit of the Manager object to
.cgroup_control_inotify_wd_unit.
Why? As preparation for a later commit that allows us to watch the
"memory.events" cgroup attribute file in addition to the "cgroup.events"
file we already watch with the fields above. In that later commit we'll
add new fields "cgroup_memory_inotify_wd" to Unit and
"cgroup_memory_inotify_wd_unit" to Manager, that are used to watch these
other events file.
No change in behaviour. Just some renaming.
So far the priorities for cgroup empty event handling were pretty weird.
The raw events (on cgroupsv2 from inotify, on cgroupsv1 from the agent
dgram socket) where scheduled at a lower priority than the cgroup empty
queue dispatcher. Let's swap that and ensure that we can coalesce events
more agressively: let's process the raw events at higher priority than
the cgroup empty event (which remains at the same prio).