IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This adds a new sd_pid_get_cgroup() call to sd-login which may be used
to query the control path of a process. This is useful for programs when
making use of delegation units, in order to figure out which subtree has
been delegated.
In light of the unified control group hierarchy this is finally safe to
do, hence let's add a proper API for it, to make it easier to use this.
We treat an empty wall-message equal to a NULL wall-message since:
commit 5744f59a3e
Author: Lennart Poettering <lennart@poettering.net>
Date: Fri Sep 4 10:34:47 2015 +0200
logind: treat an empty wall message like a NULL one
Fix the shutdown scheduler to not deref a NULL pointer, but properly
check for an empty wall-message.
Fixes: #1120
Commit 0339cd770 changed libsystemd-network's error code for missing DHCP lease
data from ENOENT to ENODATA. Adjust networkd accordingly.
This fixes interfaces being stuck in "degraded/configuring" mode forever.
https://github.com/systemd/systemd/issues/1147
Commit efdb023 ("core: unified cgroup hierarchy support") introduced a new
error ENOEXEC in cg_unified() if /sys/fs/cgroup/ is not available. Adjust the
"skip" checks in various tests accordingly.
Add a corresponding "skip" check to test-bus-creds as well, as
sd_bus_creds_new_from_pid() now calls cg_unified() as well.
This re-fixes "make check" in build chroots without /sys/fs/cgroup.
https://github.com/systemd/systemd/issues/1132
Delegation to unpriviliged processes is safe in the unified hierarchy,
hence allow it. This has the benefit of permitting "systemd --user"
instances to further partition their resources between user services.
Makre sure we always return sensible errors for the various, following
the same rules, and document them in a comment in sd-login.c. Also,
update all relevant man pages accordingly.
RT signals operate in a queue, and we should be careful to never merge
two queued signals into one. Hence, makes sure we only ever dequeue a
single signal at a time and leave the remaining ones queued in the
signalfd. In order to implement correct priorities for the signals
introduce one signalfd per priority, so that we only process the highest
priority signal at a time.
In the unified hierarchy delegating controller access is safe, hence
make sure to enable all controllers for the "payload" subcgroup if we
create it, so that the container will have all controllers enabled the
nspawn service itself has.
If the controller managed by systemd cannot found in /proc/$PID/cgroup,
return ENODATA, the usual error for cases where the data being looked
for does not exist, even if the process does.
Previously, on the legacy hierarchy a non-existing cgroup was considered
identical to an empty one, but the unified hierarchy the check for a
non-existing one returned ENOENT.
Let's move the actual cgroup part of it into a new separate function
manager_get_unit_by_pid_cgroup(), and then make
manager_get_unit_by_pid() just a wrapper that also checks the two pid
hashmaps.
Then, let's make sure the various calls that want to deliver events to
the owners of a PID check both hashmaps and the cgroup and deliver the
event to *each* of them. OTOH make sure bus calls like GetUnitByPID()
continue to check the PID hashmaps first and the cgroup only as
fallback.
This simply factors out the uid validation checks from parse_uid() and
uses them everywhere. This simply verifies that the passed UID is
neither 64bit -1 nor 32bit -1.
This adds a new PID_TO_PTR() macro, plus PTR_TO_PID() and makes use of
it wherever we maintain processes in a hash table. Previously we
sometimes used LONG_TO_PTR() and other times ULONG_TO_PTR() for that,
hence let's make this more explicit and clean up things.
The recent cgroup-rework changed the error code for un-mounted cgroupfs to
ENOEXEC. Make sure udev ignores it just like ENOENT and does not spill
warnings on the screen.
Virtio buses are undeterministically enumerated, so we cannot use them as a basis
for deterministic naming (see bf81e792f3). However, we are guaranteed that there
is only ever one virtio bus for every parent device, so we can simply skip over
the virtio buses when naming the devices.
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.
A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).
It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.
The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.
This patch also removes cg_delete() which is unused now.
On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.
This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.
This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.
The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.
To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.
This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.
When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
We should never connect to the host bus as fallback if connecting to a
container failed via one method. Otherwise connecting to a dbus1
container will always result in a connection to the host.
We rely on the correct error used when opening the kdbus device node,
hence let's make sure we pass it up from the namespaced child process to
the process which actually wants to connect.
When the user wants to explicitly send our own PID a signal, then do so.
Don't follow up SIGABRT with a SIGHUP if send_sighup is enabled. At that
point the process should have segfaulted, hence there's no point in
following up with a SIGHUP.
Send only termination signals to ourselves, never KILL or ABRT signals.
Always say when we ignore errors. Cast calls whose return value we
knowingly ingore to (void). Use "bool" where we actually mean a boolean,
even if we return it as an int later on.
It's cheaper that going to cgroupfs, and also usually the better choice
since it's not racy and can map PIDs even if they were moved to a
different unit.
In all cases where the function (or cg_is_empty_recursive()) ignoring
the calling process is actually wrong, as a process keeps a cgroup busy
regardless if its the current one or another. Hence, let's simplify
things and drop the "ignore_self" parameter.
The legacy cgroup hierarchy does not support reliable empty
notifications in containers and if there are left-over subgroups in a
cgroup. This makes it hard to correctly wait for them running empty, and
thus we previously disabled this logic entirely.
With this change we explicitly check for the container case, and whether
the unit is a "delegation" unit (i.e. one where programs may create
their own subgroups). If we are neither in a container, nor operating on
a delegation unit cgroup empty notifications become reliable and thus we
start waiting for the empty notifications again.
This doesn't really fix the general problem around cgroup notifications
but reduces the effect around it.
(This also reorders #include lines by their focus, as suggsted in
CODING_STYLE. We have to add "virt.h", so let's do that at the right
place.)
Also see #317.
Rework the "service is good" check, to only check the cgroup state if we
really need to instead of always.
This allows us to suppress going to the cgroupfs for an empty check for
the majority of services.
No functional change.
let's return ENXIO whenever we don't know something rather than ENOENT.
ENOENT suggests this was really about a file or directory, while ENXIO
is a more generic "not found" indicator.
When mcstransd* is running non-raw functions will return translated SELinux
context. Problem is that libselinux will cache this information and in the
future it will return same context even though mcstransd maybe not running at
that time. If you then check with such context against SELinux policy then
selinux_check_access may fail depending on whether mcstransd is running or not.
To workaround this problem/bug in libselinux, we should always get raw context
instead. Most users will not notice because result of access check is logged
only in debug mode.
* SELinux context translation service, which will translates labels to human
readable form
On Dell and HP laptops the dock state/events (SW_DOCK) come from the "{Dell,HP}
WMI hotkeys" input devices. Tag them as power-switch so that login actually
considers them. Use a general match in case this affects other vendors, too.
Thanks to Andreas Schultz for debugging this!
https://launchpad.net/bugs/1450009
Related to the TODO item to replace FOREACH_WORD_QUOTED with it.
Tested by setting `JoinControllers=cpu,cpuacct,memory net_cls,blkio' in
/etc/systemd/system.conf, rebooting the system with the patched binaries
and checking that the desired setup was created by inspecting the
entries under /sys/fs/cgroup.
No regressions observed in test cases.
Make use of it in config_parse_cpu_affinity2.
Tested by tweaking the `CPUAffinity' setting in /etc/systemd/system.conf
and reloading the daemon to confirm it is working as expected.
No regressions observed in test cases.
Related to the TODO item to replace FOREACH_WORD_QUOTED with it.
Tested by setting `CPUAfinity=0 1' (and other similar settings) in
/etc/systemd/system.conf, booting the system with the patched binaries
(and also using `systemctl daemon-reload` to reconfigure) and checking
that /proc/1/status indicates only CPUs 0 and 1 are allowed for PID 1.
No regressions observed in test cases.
The constraints we place on the pool is that it is a contiguous
sequence of addresses in the same subnet as the server address, not
including the subnet nor broadcast addresses, but possibly including
the server address itself. If the server address is included in the
pool it is (obviously) reserved and not handed out to clients.
Merge sd_dhcp_server_set_address() and sd_dhcp_server_set_lease_pool() into
sd_dhcp_server_configure_pool() as the behavior of the two former depends
on the order they are called in. The flexibility is not needed, so let's
just do this in one call.
dbus-1.10 was just released, including systemd units to run
`dbus-daemon --session` as systemd user unit. This allows using a
user-bus with dbus1, just like we do per default with kdbus.
All the dbus libraries have already been fixed long ago to use the
user-bus as default. Hence, there's no need to set
DBUS_SESSION_BUS_ADDRESS= if we use the user-bus. However, gdm and
friends continue to spawn a session bus if this variable is not set
(instead of checking for the existence of the user-bus). Hence, we force
the user-bus, if it is available, in pam_systemd. Once gdm and friends
are fixed, we can continue to drop this again. However, that might take
a while.
With this in place, all that is needed to make the user-bus work is:
`systemctl --global enable dbus.socket`
If dbus.socket is not enabled, the legacy session-bus is still used.
Based on a patch by: Jan Alexander Steffens <jan.steffens@gmail.com>
When showing the number of tasks in a cgroup, recursively count tasks in
child cgroups and include them in the number. This ensures that the
number of tasks is cummulative the same way as memory, cpu and IO
resources are.
Old behaviour can be restored by passing the new --recursive=no switch.
This way we can be sure that less has the same idea of the terminal as
we do.
This solves issues in systems that have locale uninitalized, where
systemd would output UTF-8 but less wouldn't allow it and show them as
control characters.
We store the properties for transient units in drop-ins anyway, and
units don't have to have fragment files, hence don't bother with them,
and don't create them.
Refactor allocation of the result string to the top, since it is
currently done in both branches of the condition.
Remove unreachable code checking for EXTRACT_DONT_COALESCE_SEPARATORS
when state == SEPARATOR (the only place where SEPARATOR is assigned to
state follows a check for EXTRACT_DONT_COALESCE_SEPARATORS that jumps to
the end of the function.)
Tested by running test-util successfully.
Follow up to: 206644aede
This covers the case where an argument is an empty string, such as ''.
Instead of allocating the empty string in the individual conditions when
state == VALUE, just always allocate it at the end of state == START, at
which point we know we will have an argument.
Tested that test-util keeps passing after the refactor.
Follow up to: 14e685c29d
--bind and --bind-ro perform the bind mount
non-recursively. It is sometimes (often?) desirable
to do a recursive mount. This patch adds an optional
set of bind mount options in the form of:
--bind=src-path:dst-path:options
options are comma separated and currently only
"rbind" and "norbind" are allowed.
Default value is "rbind".
Rather than having all clients attempt to get the same leases (starting at the
beginning of the pool), make each client star at a random offset into the pool
determined by their client id. This greatly increases the chances of a given
client receiving the same IP address even though both the client and server
have lost any lease information (and distinct server instances handing out
the same leases).
In preparation of the unified cgroup support, let's clean up cgtop:
a) rework time code to be based on "nsec_t" rather than "struct timespec"
b) Introduce long option --order= for selecting ordering
c) count number of processes only in the main hierarchy, don't bother
with the controller hierarchies. We don't allow orthogonal
hierarchies in systemd anymore, hence there's no point to check the
other hierarchies.
d) Deal with non-monotonic cpuacct values (see #749)
e) When sorting groups, don't do prefix compare when ordering by number
of tasks, since this is not accumulative for all children.
f) Actually make --cpu without parameter work
g) Don't output control characters when we get them as input.
Fixes#749.
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.
Also ports over machinectl to make use of this.
It's really confusing if stdout goes to the pager, but stderr is written
directly to the screen. Hence, make sure both stdout and stderr are
passed to the pager when doing autopaging.
Apparently, sendfile() does not work between fifos and ttys, but
splice() does, hence let's optionally fall back to that. This is useful
to implement the fallback pager this way.
Querying low-level DNS RRs should be done via resolved now, not via
glibc's awful res_query() API anymore. Let's not introduce an async
wrapper for it hence.
When handing out DHCP leases, try to propagate DNS/NTP server
information from "uplink". The "uplink" is automatically determined as
the network interface with the highest priority default route on it.
We should not fall back to dbus-1 and connect to the proxy when kdbus
returns an error that indicates that kdbus is running but just does not
accept new connections because of quota limits or something similar.
Based on a patch by Kay.
This reverts commit d4d00020d6. The idea of
the commit is broken and needs to be reworked. We really cannot reduce
the bus-addresses to a single address. We always will have systemd with
native clients and legacy clients at the same time, so we also need both
addresses at the same time.
It is not acceptable to load unit files during enable/disable operations
just to figure out the selinux labels. systemd implements lazy loading
for units, so the selinux hooks need to follow it.
This drops the mac_selinux_unit_access_check_strv() helper which
implements a non-acceptable policy check. If anyone cares for that
functionality, you really should pass a callback+userdata to the helpers
in src/shared/install.c which does policy checks on each touched file.
See #1050 on github for more.
We use dashes in our bloom-tags. Make sure the newly introduced arg0has
tag uses the same style.
Note that the external dbus-tags don't use dashes, though. They are
defined in the spec and we need to keep compatibility there.
a) drop handling of obsolete or unused DHCP options time_offset,
mtu_aging_timeout, policy filter, mdr, ttl, ip forwarding settings.
Should this become useful one day we can readd support for this.
b) For subnet mask and broadcast it is not always clear whether 0 or
255.255.255.255 might be valid, hence maintain a boolean indicating
validity next to it.
c) serialize/deserialize broadcast address, lifetime, T1 and T2 together
with the rest of the fields in dhcp_lease_save() and
dhcp_lease_load().
d) consistently return ENODATA from getter functions for data that is
missing in the lease.
e) add missing getter calls for broadcast, lifetime, T1, T2.
f) when decoding DHCP options, generate debug messages on parse
failures, but try to proceed if possible.
g) Similar, when deserializing a lease in dhcp_lease_load(), make sure
we deal nicely with unparsable fields, to provide upgrade compat.
h) fix some memory allocations
refcnt.h only exists for cases where objects are simultaneously handled
by different threads. Otherwise it should not be used. The only case
where this applies is sd_bus, really, and pretty much none of our APIs,
since we do not claim thread-safety for them.
When we make sd-dhcp public one day we really should not make
sd_dhcp_lease_save() and sd_dhcp_lease_load() public, since it's pretty
much only useful as internal utility for networkd itself.
Previoulsy, we just checked whether the domain names specified in
incoming DHCP leases are valid. Given that validation code actually
internally normalizes anyway, it's a good idea to simply do the full
normalization and store that in the lease structure. This allows us to
remove the manual removal of a trailing dot, if there is one.
strv_extend() does not consume the passed entry, hence, we must properly
free it. Furthermore, we should *not* use strv_consume() as we do greedy
allocations on 'ret'; and greedy-allocations should only be used for short
lived objects or caches.
Fix the domainname parser to properly free temporary storage when done.
In our API design, getter-functions don't ref objects. Calls like
foo_get_bar() will not ref 'bar'. We never do that and there is no real
reason to do it in single threaded APIs. If you need a ref-count, you
better take it yourself *BEFORE* doing anything else on the parent object
(as this might invalidate your pointer).
Right now, sd_dhcp?_get_lease() refs the lease it returns. A lot of
code-paths in systemd do not expect this and thus leak the lease
reference. Fix this by changing the API to not ref returned objects.
The commit 4938696301 overlooked the
fact that unit files can be specified as unit file paths, not unit
file names, wrongly passing a unit file path to the 1st argument of
manager_load_unit() that handles it as a unit file name. As a result,
the following 4 systemctl subcommands:
enable
disable
reenable
link
mask
unmask
fail with the following error message:
# systemctl enable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl disable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl reenable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# cp /usr/lib/systemd/system/kdump.service /tmp/
# systemctl link /tmp/kdump.service
Failed to execute operation: Unit name /tmp/kdump.service is not valid.
# systemctl mask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl unmask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
To fix the issue, first check whether a unit file is passed as a unit
file name or a unit file path, and then pass the unit file to the
appropreate argument of manager_load_unit().
By the way, even with this commit mask and unmask reject unit file
paths as follows and this is a correct behavior:
# systemctl mask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Invalid argument
# systemctl unmask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Invalid argument
Internally, the root cgroup is stored as the empty string in
Unit.cgroup_path, and "no cgroup" as NULL. Unfortunately, D-Bus does not
know a NULL concept, hence when reporting the cgroup to clients we
should turn the root cgroup into "/", and leave the empty string for the
"no cgroup" case.
This should make sure that "systemctl status -- -.slice" works correctly
and shows the entire cgroup tree.
In the Cockpit integration tests we hang onton the journal files
for a failed test and would like to inspect them using coredumpctl.
This commit adds the ability to specify an alternate directory
for coredumpctl to read the journal from.
Previously, sd-bus inofficially already supported bus matches that
tested a string against an array of strings ("as"). This was done via an
enhanced way to interpret "arg0=" matches. This is problematic however,
since clients have no way to determine if their respective
implementation understood strv matches or not, thus allowing invalid
matches to be installed without a way to detect that.
This patch changes the logic to only allow such matches with a new
"arg0has=" syntax. This has the benefit that non-conforming
implementations will return a parse error and a client application may
thus efficiently detect support for the match type.
Matches of this type are useful for "udev"-like systems that "tag" objects
with a number of strings, and clients need to be able to match against
any of these "tags".
The name "has" takes inspiration from Python's ".has_key()" construct.
This partially reverts 106784ebb7, ad
readds separate DNS_PACKET_MAKE_FLAGS() invocations for the LLMNR and
DNS case. This is important since SOme flags have different names and
meanings on LLMNR and on DNS and we should clarify that via the comments
and how we put things together.
This hopefully makes this a bit more expressive and clarifies that the
fd is not used for the DNS TCP socket. This also mimics how the LLMNR
UDP fd is named in the manager object.
Currently, dns_cache_put() does a number of things:
1) It unconditionally removes all keys contained in the passed
question before adding keys from the newly arrived answers.
2) It puts positive entries into the cache for all RRs contained
in the answer.
3) It creates negative entries in the cache for all keys in the
question that are not answered.
Allow passing q = NULL in the parameters and skip 1) and 3), so
we can use that function for mDNS responses. In this case, the
question is irrelevant, we are interested in all answers we got.
Enable unprivileged users to set wall message on a shutdown
operation. When the message is set via the --message option,
it is logged together with the default shutdown message.
$ systemctl reboot --message "Applied kernel updates."
$ journalctl -b -1
...
systemd-logind[27]: System is rebooting. (Applied kernel updates.)
...
This allows marking properties as "explicit". Properties marked like
this are included in the introspection, but are avoided in GetAll()
property queries, PropertiesChanged() signals and in in GetManaged()
object manager calls and InterfacesAdded() signals.
Expensive properties may be marked that way, and they will be
retrievable when explicitly being requested, but never in "blanket"
all-property queries and signals.
This flag may be combined with the flags for "const" and
"emit-validation" properties, but not with "emit-validation", as that
is only useful for properties whose value shall be sent in "blanket"
all-property signals.
The "explicit" flag is also exposed in the introspection data via a new
annotation.
If we try to resoolve an LLMNR PTR RR we shall connect via TCP directly
to the specified IP address. We already refuse to do this if the address
to resolve is of a different address family as the transaction's scope.
The error returned was EAFNOSUPPORT. Let's change this to ESRCH which is
how we indicate "not server available" when connecting for LLMNR or DNS,
since that's what this really is: we have no server we could connect to
in this address family.
This allows us to ensure that no server errors are always handled the same
way.
If the user specifies an interface by its ifindex we should handle this
nicely. Hence let's try to parse the ifindex as a number before we try
to resolve it as an interface name.
So far we handled immediate "no server" query results differently from
"no server" results we ran into during operation: the former would cause
the dns_query_go() call to fail with ESRCH, the later would result in
the query completion callback to be called.
Remove the duplicate codepaths, by always going through the completion
callback. This allows us to remove quite a number of lines for handling
the ESRCH.
This commit should not alter behaviour at all.
Right now we keep track of ongoing transactions in a linked listed for
each scope. Replace this by a hashmap that is indexed by the RR key.
Given that all ongoing transactions will be placed in pretty much the
same scopes usually this should optimize behaviour.
We used to require a list here, since we wanted to do "superset" query
checks, but this became obsolete since transactions are now single-key
instead of multi-key.
In order to make "machinectl shell" more similar to ssh, allow the
following syntax to connect to a container under a specific username:
machinectl shell lennart@fedora
Also beefs up related man page documentation.
Introduce separate actions for creating login or shell sessions for
the local host or a local container. By default allow local unprivileged
clients to create new login sessions (which is safe, since getty will
ask for username and authentication).
Also, imply login privs from shell privs, as well as shell and login
privs from manage privs.
When showing the status of the "-.slice" slice root unit (whose reported
cgroup path is ""), we suppressed the cgroup tree so far, because
skipped it for all unit with an empty cgroup path. Let's fix that, and
properly handle the empty cgroup path.
Let's hide all machines whose name begins with "." by default, thus
hiding the ".host" pseudo-machine, unless --all is specified. This
takes inspiration from the ".host" image handling in "machinectl
list-images" which also hides all images whose name starts with ".".
Some of the operations machined/machinectl implement are also very
useful when applied to the host system (such as machinectl login,
machinectl shell or machinectl status), hence introduce a pseudo-machine
by the name of ".host" in machined that refers to the host system, and
may be used top execute operations on the host system with.
This copies the pseudo-image ".host" machined already implements for
image related commands.
(This commit also adds a PK privilege for opening a PTY in a container,
which was previously not accessible for non-root.)
When enumerating machines from /run, and when accepting machine names
for operations, be more strict and always validate.
Note that these checks are strictly speaking unnecessary, since
enumeration happens only on the trusted /run...
As it turns out machine_name_is_valid() does the exact same thing as
hostname_is_valid() these days, as it just invoked that and checked the
name length was < 64. However, hostname_is_valid() checks the length
against HOST_NAME_MAX anyway (which is 64 on Linux), hence any
additional check is redundant.
We hence replace machine_name_is_valid() by a macro that simply maps it
to hostname_is_valid() but sets the allow_trailing_dot parameter to
false. We also move this this call to hostname-util.h, to the same place
as the hostname_is_valid() declaration.
When looking for the machine belonging to a PID, always look for the
leader first, only then fall back to a cgroup check. We keep direct
track of the leader PID, but only indirectly of the cgroup, hence prefer
the PID.
This new bus call opens an interactive shell in a container. It works
like the existing OpenLogin() call, but does not involve getty, and
instead opens an arbitrary command line.
This is similar to "systemd-run -t -M" but is controlled by a specific
PolicyKit privilege.
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.
With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).
This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
If a connection passed KDBUS_HELLO_ACTIVATOR, it cannot do I/O on the
bus. Hence, we should not treat it as proper peer. To actually query it,
you have to explicitly ask for activators.
This makes kdbus in-line with what dbus-daemon does.
This reverts commit 92d16a53e3. As it turns
out, this is not how ObjectManager is supposed to work. It is just a
special behavior of BlueZ, but no-one else implements it this way.
Revert the patch as discussed on github, and as such revert to the
previous behavior (as described in the spec).
Prior to commit c32eb440ba, libudev's
function udev_enumerate_scan_devices() had behaved differently. If
parent match was added with udev_enumerate_add_match_parent(),
udev_enumerate_scan_devices() did not return error if some child devices
had no subsystem symlink in sysfs. An example of such devices is USB
endpoints /sys/bus/usb/devices/*/ep_*. If there was a parent match
against USB device, old implementation of udev_enumerate_scan_devices()
did not treat ep_* device directories without subsystem symlink as error
and just ignored them, but new implementation returns -ENOENT (also
ignoring these devices) though correctly enumerates all other matching
devices.
To compare, you could look at 96df036fe3,
in src/libudev/libudev-enumerate.c, function parent_add_child():
if (!match_subsystem(enumerate, udev_device_get_subsystem(dev)))
goto nomatch;
udev_device_get_subsystem() was returning NULL, match_subsystem() was
returning false, and USB endpoint device was ignored.
New parent_add_child() from src/libsystemd/sd-device/device-enumerator.c
checks return value of sd_device_get_subsystem() and fails if subsystem
was not found. Absence of subsystem symlink should not be really treated
as error because all enumerations of children of USB devices will fail
with -ENOENT. This new behavior also breaks system-config-printer.
So restore old behavior and treat absence of subsystem symlink as no
match.
Let's simplify things and only maintain a single RR key per transaction
object, instead of a full DnsQuestion. Unicast DNS and LLMNR don't
support multiple questions per packet anway, and Multicast DNS suggests
coalescing questions beyond a single dns query, across the whole system.
It shouldn't happen that we try to resolve IPv4 addresses via LLMNR on
IPv6 and vice versa, but let's explicitly verify that we don't turn an
IPv4 LLMNR lookup into an IPv6 TCP connection.
We explicitly need to turn off name compression when marshalling or
demarshalling RRs for bus transfer, since they otherwise refer to packet
offsets that reference packets that are not transmitted themselves.
With this change we'll now also generate synthesized RRs for the local
LLMNR hostname (first label of system hostname), the local mDNS hostname
(first label of system hostname suffixed with .local), the "gateway"
hostname and all the reverse PTRs. This hence takes over part of what
nss-myhostname already implemented.
Local hostnames resolve to the set of local IP addresses. Since the
addresses are possibly on different interfaces it is necessary to change
the internal DnsAnswer object to track per-RR interface indexes, and to
change the bus API to always return the interface per-address rather than
per-reply. This change also patches the existing clients for resolved
accordingly (nss-resolve + systemd-resolve-host).
This also changes the routing logic for queries slightly: we now ensure
that the local hostname is never resolved via LLMNR, thus making it
trustable on the local system.
This moves is_gateway() from nss-myhostname into the basic APIs, and
makes it more like is_localhost(). Also, we rename it to
is_gateway_hostname() to make it more expressive.
Sharing this function in src/basic/ allows us to reuse the function for
routing name requests in resolved (in a later commit).
Test option setting and getting in test_advertise_option(). Verify
that the information provided in DHCPv6 Reply messages is also
available in the Information and Solicit callbacks.
Although the SNTP option specified in RFC 4075 has been deprecated, some
servers are still sending NTP information with this option. Use the SNTP
information provided only if the NTP option is not present.
Update the test case as SNTP information is also requested.
Support DHCPv6 DNS search list option as specified in RFC 3646. This
option contains a list of DNS search domains encoded without compression
as specified in Section 8. of RFC 3315.
Add a helper function containing a modified version of dns_packet_read_name()
that does not use DnsPacket to extract a string array of domain names from
the provided option data. The domain names are stored uncompressed as defined
in Section 8. of RFC 3315.
When the DHCPv6 client is started by the library user or stopped for
any reason, unref the DHCPv6 lease when resetting the DHCPv6 client
data structure. This makes the DHCPv6 client always start from a clean
state and not keep unnecessary an lease structure around when stopped.
If this is not done, a previously existing lease information can be
interpreted to be from another server when restarting DHCPv6.
Sparse archives are good for files like utmp/wtmp and journals, but it
is still expenseive in having to read the files twice to archive them.
Additionally there is a supportability issue as GNU sparse archives are
not supported by many archive libraries and tooling.
setenv is declared as:
extern int setenv (const char *__name, const char *__value, int __replace)
__THROW __nonnull ((2));
And i->timezone can be NULL, if for example /etc/localtime is
missing. Previously that worked, but now result in a libc dumping
core, as seen with gcc 2.22, due to:
https://sourceware.org/ml/glibc-cvs/2015-q2/msg00075.html
When the controlling process exits, any existing file descriptors
for that FD will be marked as hung-up and ioctls on them will
file with EIO. To work around this, open a new file descriptor
for the VT we want to clean up.
Thanks to Ray Strode for help in sorting out the problem and
coming up with a fix!
https://github.com/systemd/systemd/issues/989
The open_terminal() function adds retries in case a terminal
is in the process of being closed when we open it, and should
generally be used to open a terminal. We especially need it
for code that a subsequent commit adds that reopens the terminal
at session shut-down time; such races would be more likely in
that case.
Found by Ray Strode.
remove_directory will always return 0 so this can never happen.
Besides that, d->path and d are freed so we would end up with
a null pointer dereference anyway.
To be able to use `systemd-run` or `machinectl login` on a container
that is in a private user namespace, the sub-process must have entered
the user namespace before connecting to the container's D-Bus, otherwise
the UID and GID in the peer credentials are garbage.
So we extend namespace_open and namespace_enter to support UID namespaces,
and we enter the UID namespace in bus_container_connect_{socket,kernel}.
namespace_open will degrade to a no-op if user namespaces are not enabled
in the kernel.
Special handling is required for the setns call in namespace_enter with
a user namespace, since transitioning to your own namespace is forbidden,
as it would result in re-entering your user namespace as root.
Arguably it may be valid to check this at the call site, rather than
inside namespace_enter, but it is less code to do it inside, and if the
intention of calling namespace_enter is to *be* in the target namespace,
rather than to transition to the target namespace, it is a reasonable
approach.
The check for whether the user namespace is the same must happen before
entering namespaces, as we may not be able to access /proc during the
intermediate transition stage.
We can't instead attempt to enter the user namespace and then ignore
the failure from it being the same namespace, since the error code is
not distinct, and we can't compare namespaces while mid-transition.
The following functions return immediately if a null pointer was passed.
* calendar_spec_free
* link_address_free
* manager_free
* sd_bus_unref
* sd_journal_close
* udev_monitor_unref
* udev_unref
It is therefore not needed that a function caller repeats a corresponding check.
This issue was fixed by using the software Coccinelle 1.0.1.
Previously the following command:
$ journalctl -f -t unmatchedtag12345
... would block when called with criteria that did not match any
journal lines. Once log lines appeared that matched the criteria
they were displayed.
Commit 02ab86c732 broke this
behavior and the journal was not followed, but the command
exits with '-- No entries --' displayed.
This commit fixes the issue.
More information downstream:
https://bugzilla.redhat.com/show_bug.cgi?id=1253649
Whenever one of our calls is invoked with a non-NULL, writable
sd_bus_error parameter, let's fill in some valid error on failure. We
previously only filled in remote errors, but never local errors, which is
hard to handle by users. Hence, let's clean this up to always fill in
the error.
This introduces a new bus_assert_return() macro that works like
assert_return() but optionally also initializes a bus_error struct.
Fixes#224.
Based on a patch by Umut Tezduyar.
There's no reason to explicitly turn off bus activation for resolved
here. The reason this was done before was that the code was copied from
nss-resolve, which has a fallback to glibc's nss-dns if resolved is not
reachable. However, such a logic makes no sense for resolve-host since
such a fallback doesn't make sense here, which means we can actually
turn on activation. Let's do it hence.
Let's make sure that clients querying resolved via the bus for A, AAAA
or PTR records for "localhost" get a synthesized, local reply, so that
we do not hit the network.
This makes part of nss-myhostname redundant, if used in conjunction.
However, given that nss-resolve shall be optional we need to keep this
code in both places for now.
Given that we already hardocde the loopback ifindex, following the
kernel's own logic, we can replace the invocation of
if_nametoindex("lo") with LOOPBACK_IFINDEX.
Since dacd6cee76 the two OOM's are
ignored as the value of r will be overwritten and we only log in
the fail section anyway.
This patch jumps to fail on OOM.
Note that this is different behavior compared to both the current
code and previous to dacd6cee76. Before
that commit we would log that saving the inhibit data failed, but
still write the file, though without the WHO/WHY section.
CID# 1313545
Fix error message:
-->--
Code should not be reached 'Unknown action.' at
src/systemctl/systemctl.c:6382, function halt_now(). Aborting.
Aborted
--<--
when executing 'reboot -f' from a system running a kexec kernel.
We should not fall back to dbus-1 and connect to the proxy when kdbus
returns an error that indicates that kdbus is running but just does not
accept new connections because of quota limits or something similar.
Using is_kdbus_available() in libsystemd/ requires it to move from
shared/ to libsystemd/.
Based on a patch from David Herrmann:
https://github.com/systemd/systemd/pull/886
The partition-type flags are defined independently for every partition-type. Apply
them only to the types where they are defined, and not to the ESP, which does not
appear to share the same set of flags.
https://github.com/systemd/systemd/issues/920