IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Delegation to unpriviliged processes is safe in the unified hierarchy,
hence allow it. This has the benefit of permitting "systemd --user"
instances to further partition their resources between user services.
Makre sure we always return sensible errors for the various, following
the same rules, and document them in a comment in sd-login.c. Also,
update all relevant man pages accordingly.
RT signals operate in a queue, and we should be careful to never merge
two queued signals into one. Hence, makes sure we only ever dequeue a
single signal at a time and leave the remaining ones queued in the
signalfd. In order to implement correct priorities for the signals
introduce one signalfd per priority, so that we only process the highest
priority signal at a time.
In the unified hierarchy delegating controller access is safe, hence
make sure to enable all controllers for the "payload" subcgroup if we
create it, so that the container will have all controllers enabled the
nspawn service itself has.
If the controller managed by systemd cannot found in /proc/$PID/cgroup,
return ENODATA, the usual error for cases where the data being looked
for does not exist, even if the process does.
Previously, on the legacy hierarchy a non-existing cgroup was considered
identical to an empty one, but the unified hierarchy the check for a
non-existing one returned ENOENT.
Let's move the actual cgroup part of it into a new separate function
manager_get_unit_by_pid_cgroup(), and then make
manager_get_unit_by_pid() just a wrapper that also checks the two pid
hashmaps.
Then, let's make sure the various calls that want to deliver events to
the owners of a PID check both hashmaps and the cgroup and deliver the
event to *each* of them. OTOH make sure bus calls like GetUnitByPID()
continue to check the PID hashmaps first and the cgroup only as
fallback.
This simply factors out the uid validation checks from parse_uid() and
uses them everywhere. This simply verifies that the passed UID is
neither 64bit -1 nor 32bit -1.
This adds a new PID_TO_PTR() macro, plus PTR_TO_PID() and makes use of
it wherever we maintain processes in a hash table. Previously we
sometimes used LONG_TO_PTR() and other times ULONG_TO_PTR() for that,
hence let's make this more explicit and clean up things.
The recent cgroup-rework changed the error code for un-mounted cgroupfs to
ENOEXEC. Make sure udev ignores it just like ENOENT and does not spill
warnings on the screen.
Virtio buses are undeterministically enumerated, so we cannot use them as a basis
for deterministic naming (see bf81e792f3). However, we are guaranteed that there
is only ever one virtio bus for every parent device, so we can simply skip over
the virtio buses when naming the devices.
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.
A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).
It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.
The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.
This patch also removes cg_delete() which is unused now.
On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.
This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.
This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.
The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.
To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.
This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.
When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
We should never connect to the host bus as fallback if connecting to a
container failed via one method. Otherwise connecting to a dbus1
container will always result in a connection to the host.
We rely on the correct error used when opening the kdbus device node,
hence let's make sure we pass it up from the namespaced child process to
the process which actually wants to connect.
When the user wants to explicitly send our own PID a signal, then do so.
Don't follow up SIGABRT with a SIGHUP if send_sighup is enabled. At that
point the process should have segfaulted, hence there's no point in
following up with a SIGHUP.
Send only termination signals to ourselves, never KILL or ABRT signals.
Always say when we ignore errors. Cast calls whose return value we
knowingly ingore to (void). Use "bool" where we actually mean a boolean,
even if we return it as an int later on.
It's cheaper that going to cgroupfs, and also usually the better choice
since it's not racy and can map PIDs even if they were moved to a
different unit.
In all cases where the function (or cg_is_empty_recursive()) ignoring
the calling process is actually wrong, as a process keeps a cgroup busy
regardless if its the current one or another. Hence, let's simplify
things and drop the "ignore_self" parameter.
The legacy cgroup hierarchy does not support reliable empty
notifications in containers and if there are left-over subgroups in a
cgroup. This makes it hard to correctly wait for them running empty, and
thus we previously disabled this logic entirely.
With this change we explicitly check for the container case, and whether
the unit is a "delegation" unit (i.e. one where programs may create
their own subgroups). If we are neither in a container, nor operating on
a delegation unit cgroup empty notifications become reliable and thus we
start waiting for the empty notifications again.
This doesn't really fix the general problem around cgroup notifications
but reduces the effect around it.
(This also reorders #include lines by their focus, as suggsted in
CODING_STYLE. We have to add "virt.h", so let's do that at the right
place.)
Also see #317.
Rework the "service is good" check, to only check the cgroup state if we
really need to instead of always.
This allows us to suppress going to the cgroupfs for an empty check for
the majority of services.
No functional change.
let's return ENXIO whenever we don't know something rather than ENOENT.
ENOENT suggests this was really about a file or directory, while ENXIO
is a more generic "not found" indicator.
When mcstransd* is running non-raw functions will return translated SELinux
context. Problem is that libselinux will cache this information and in the
future it will return same context even though mcstransd maybe not running at
that time. If you then check with such context against SELinux policy then
selinux_check_access may fail depending on whether mcstransd is running or not.
To workaround this problem/bug in libselinux, we should always get raw context
instead. Most users will not notice because result of access check is logged
only in debug mode.
* SELinux context translation service, which will translates labels to human
readable form
On Dell and HP laptops the dock state/events (SW_DOCK) come from the "{Dell,HP}
WMI hotkeys" input devices. Tag them as power-switch so that login actually
considers them. Use a general match in case this affects other vendors, too.
Thanks to Andreas Schultz for debugging this!
https://launchpad.net/bugs/1450009
Related to the TODO item to replace FOREACH_WORD_QUOTED with it.
Tested by setting `JoinControllers=cpu,cpuacct,memory net_cls,blkio' in
/etc/systemd/system.conf, rebooting the system with the patched binaries
and checking that the desired setup was created by inspecting the
entries under /sys/fs/cgroup.
No regressions observed in test cases.
Make use of it in config_parse_cpu_affinity2.
Tested by tweaking the `CPUAffinity' setting in /etc/systemd/system.conf
and reloading the daemon to confirm it is working as expected.
No regressions observed in test cases.
Related to the TODO item to replace FOREACH_WORD_QUOTED with it.
Tested by setting `CPUAfinity=0 1' (and other similar settings) in
/etc/systemd/system.conf, booting the system with the patched binaries
(and also using `systemctl daemon-reload` to reconfigure) and checking
that /proc/1/status indicates only CPUs 0 and 1 are allowed for PID 1.
No regressions observed in test cases.
The constraints we place on the pool is that it is a contiguous
sequence of addresses in the same subnet as the server address, not
including the subnet nor broadcast addresses, but possibly including
the server address itself. If the server address is included in the
pool it is (obviously) reserved and not handed out to clients.
Merge sd_dhcp_server_set_address() and sd_dhcp_server_set_lease_pool() into
sd_dhcp_server_configure_pool() as the behavior of the two former depends
on the order they are called in. The flexibility is not needed, so let's
just do this in one call.
dbus-1.10 was just released, including systemd units to run
`dbus-daemon --session` as systemd user unit. This allows using a
user-bus with dbus1, just like we do per default with kdbus.
All the dbus libraries have already been fixed long ago to use the
user-bus as default. Hence, there's no need to set
DBUS_SESSION_BUS_ADDRESS= if we use the user-bus. However, gdm and
friends continue to spawn a session bus if this variable is not set
(instead of checking for the existence of the user-bus). Hence, we force
the user-bus, if it is available, in pam_systemd. Once gdm and friends
are fixed, we can continue to drop this again. However, that might take
a while.
With this in place, all that is needed to make the user-bus work is:
`systemctl --global enable dbus.socket`
If dbus.socket is not enabled, the legacy session-bus is still used.
Based on a patch by: Jan Alexander Steffens <jan.steffens@gmail.com>
When showing the number of tasks in a cgroup, recursively count tasks in
child cgroups and include them in the number. This ensures that the
number of tasks is cummulative the same way as memory, cpu and IO
resources are.
Old behaviour can be restored by passing the new --recursive=no switch.
This way we can be sure that less has the same idea of the terminal as
we do.
This solves issues in systems that have locale uninitalized, where
systemd would output UTF-8 but less wouldn't allow it and show them as
control characters.
We store the properties for transient units in drop-ins anyway, and
units don't have to have fragment files, hence don't bother with them,
and don't create them.
Refactor allocation of the result string to the top, since it is
currently done in both branches of the condition.
Remove unreachable code checking for EXTRACT_DONT_COALESCE_SEPARATORS
when state == SEPARATOR (the only place where SEPARATOR is assigned to
state follows a check for EXTRACT_DONT_COALESCE_SEPARATORS that jumps to
the end of the function.)
Tested by running test-util successfully.
Follow up to: 206644aede
This covers the case where an argument is an empty string, such as ''.
Instead of allocating the empty string in the individual conditions when
state == VALUE, just always allocate it at the end of state == START, at
which point we know we will have an argument.
Tested that test-util keeps passing after the refactor.
Follow up to: 14e685c29d
--bind and --bind-ro perform the bind mount
non-recursively. It is sometimes (often?) desirable
to do a recursive mount. This patch adds an optional
set of bind mount options in the form of:
--bind=src-path:dst-path:options
options are comma separated and currently only
"rbind" and "norbind" are allowed.
Default value is "rbind".
Rather than having all clients attempt to get the same leases (starting at the
beginning of the pool), make each client star at a random offset into the pool
determined by their client id. This greatly increases the chances of a given
client receiving the same IP address even though both the client and server
have lost any lease information (and distinct server instances handing out
the same leases).
In preparation of the unified cgroup support, let's clean up cgtop:
a) rework time code to be based on "nsec_t" rather than "struct timespec"
b) Introduce long option --order= for selecting ordering
c) count number of processes only in the main hierarchy, don't bother
with the controller hierarchies. We don't allow orthogonal
hierarchies in systemd anymore, hence there's no point to check the
other hierarchies.
d) Deal with non-monotonic cpuacct values (see #749)
e) When sorting groups, don't do prefix compare when ordering by number
of tasks, since this is not accumulative for all children.
f) Actually make --cpu without parameter work
g) Don't output control characters when we get them as input.
Fixes#749.
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.
Also ports over machinectl to make use of this.
It's really confusing if stdout goes to the pager, but stderr is written
directly to the screen. Hence, make sure both stdout and stderr are
passed to the pager when doing autopaging.
Apparently, sendfile() does not work between fifos and ttys, but
splice() does, hence let's optionally fall back to that. This is useful
to implement the fallback pager this way.
Querying low-level DNS RRs should be done via resolved now, not via
glibc's awful res_query() API anymore. Let's not introduce an async
wrapper for it hence.
When handing out DHCP leases, try to propagate DNS/NTP server
information from "uplink". The "uplink" is automatically determined as
the network interface with the highest priority default route on it.
We should not fall back to dbus-1 and connect to the proxy when kdbus
returns an error that indicates that kdbus is running but just does not
accept new connections because of quota limits or something similar.
Based on a patch by Kay.
This reverts commit d4d00020d6. The idea of
the commit is broken and needs to be reworked. We really cannot reduce
the bus-addresses to a single address. We always will have systemd with
native clients and legacy clients at the same time, so we also need both
addresses at the same time.
It is not acceptable to load unit files during enable/disable operations
just to figure out the selinux labels. systemd implements lazy loading
for units, so the selinux hooks need to follow it.
This drops the mac_selinux_unit_access_check_strv() helper which
implements a non-acceptable policy check. If anyone cares for that
functionality, you really should pass a callback+userdata to the helpers
in src/shared/install.c which does policy checks on each touched file.
See #1050 on github for more.
We use dashes in our bloom-tags. Make sure the newly introduced arg0has
tag uses the same style.
Note that the external dbus-tags don't use dashes, though. They are
defined in the spec and we need to keep compatibility there.
a) drop handling of obsolete or unused DHCP options time_offset,
mtu_aging_timeout, policy filter, mdr, ttl, ip forwarding settings.
Should this become useful one day we can readd support for this.
b) For subnet mask and broadcast it is not always clear whether 0 or
255.255.255.255 might be valid, hence maintain a boolean indicating
validity next to it.
c) serialize/deserialize broadcast address, lifetime, T1 and T2 together
with the rest of the fields in dhcp_lease_save() and
dhcp_lease_load().
d) consistently return ENODATA from getter functions for data that is
missing in the lease.
e) add missing getter calls for broadcast, lifetime, T1, T2.
f) when decoding DHCP options, generate debug messages on parse
failures, but try to proceed if possible.
g) Similar, when deserializing a lease in dhcp_lease_load(), make sure
we deal nicely with unparsable fields, to provide upgrade compat.
h) fix some memory allocations
refcnt.h only exists for cases where objects are simultaneously handled
by different threads. Otherwise it should not be used. The only case
where this applies is sd_bus, really, and pretty much none of our APIs,
since we do not claim thread-safety for them.
When we make sd-dhcp public one day we really should not make
sd_dhcp_lease_save() and sd_dhcp_lease_load() public, since it's pretty
much only useful as internal utility for networkd itself.
Previoulsy, we just checked whether the domain names specified in
incoming DHCP leases are valid. Given that validation code actually
internally normalizes anyway, it's a good idea to simply do the full
normalization and store that in the lease structure. This allows us to
remove the manual removal of a trailing dot, if there is one.
strv_extend() does not consume the passed entry, hence, we must properly
free it. Furthermore, we should *not* use strv_consume() as we do greedy
allocations on 'ret'; and greedy-allocations should only be used for short
lived objects or caches.
Fix the domainname parser to properly free temporary storage when done.
In our API design, getter-functions don't ref objects. Calls like
foo_get_bar() will not ref 'bar'. We never do that and there is no real
reason to do it in single threaded APIs. If you need a ref-count, you
better take it yourself *BEFORE* doing anything else on the parent object
(as this might invalidate your pointer).
Right now, sd_dhcp?_get_lease() refs the lease it returns. A lot of
code-paths in systemd do not expect this and thus leak the lease
reference. Fix this by changing the API to not ref returned objects.
The commit 4938696301 overlooked the
fact that unit files can be specified as unit file paths, not unit
file names, wrongly passing a unit file path to the 1st argument of
manager_load_unit() that handles it as a unit file name. As a result,
the following 4 systemctl subcommands:
enable
disable
reenable
link
mask
unmask
fail with the following error message:
# systemctl enable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl disable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl reenable /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# cp /usr/lib/systemd/system/kdump.service /tmp/
# systemctl link /tmp/kdump.service
Failed to execute operation: Unit name /tmp/kdump.service is not valid.
# systemctl mask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
# systemctl unmask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Unit name /usr/lib/systemd/system/kdump.service is not valid.
To fix the issue, first check whether a unit file is passed as a unit
file name or a unit file path, and then pass the unit file to the
appropreate argument of manager_load_unit().
By the way, even with this commit mask and unmask reject unit file
paths as follows and this is a correct behavior:
# systemctl mask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Invalid argument
# systemctl unmask /usr/lib/systemd/system/kdump.service
Failed to execute operation: Invalid argument
Internally, the root cgroup is stored as the empty string in
Unit.cgroup_path, and "no cgroup" as NULL. Unfortunately, D-Bus does not
know a NULL concept, hence when reporting the cgroup to clients we
should turn the root cgroup into "/", and leave the empty string for the
"no cgroup" case.
This should make sure that "systemctl status -- -.slice" works correctly
and shows the entire cgroup tree.
In the Cockpit integration tests we hang onton the journal files
for a failed test and would like to inspect them using coredumpctl.
This commit adds the ability to specify an alternate directory
for coredumpctl to read the journal from.
Previously, sd-bus inofficially already supported bus matches that
tested a string against an array of strings ("as"). This was done via an
enhanced way to interpret "arg0=" matches. This is problematic however,
since clients have no way to determine if their respective
implementation understood strv matches or not, thus allowing invalid
matches to be installed without a way to detect that.
This patch changes the logic to only allow such matches with a new
"arg0has=" syntax. This has the benefit that non-conforming
implementations will return a parse error and a client application may
thus efficiently detect support for the match type.
Matches of this type are useful for "udev"-like systems that "tag" objects
with a number of strings, and clients need to be able to match against
any of these "tags".
The name "has" takes inspiration from Python's ".has_key()" construct.
This partially reverts 106784ebb7, ad
readds separate DNS_PACKET_MAKE_FLAGS() invocations for the LLMNR and
DNS case. This is important since SOme flags have different names and
meanings on LLMNR and on DNS and we should clarify that via the comments
and how we put things together.
This hopefully makes this a bit more expressive and clarifies that the
fd is not used for the DNS TCP socket. This also mimics how the LLMNR
UDP fd is named in the manager object.
Currently, dns_cache_put() does a number of things:
1) It unconditionally removes all keys contained in the passed
question before adding keys from the newly arrived answers.
2) It puts positive entries into the cache for all RRs contained
in the answer.
3) It creates negative entries in the cache for all keys in the
question that are not answered.
Allow passing q = NULL in the parameters and skip 1) and 3), so
we can use that function for mDNS responses. In this case, the
question is irrelevant, we are interested in all answers we got.
Enable unprivileged users to set wall message on a shutdown
operation. When the message is set via the --message option,
it is logged together with the default shutdown message.
$ systemctl reboot --message "Applied kernel updates."
$ journalctl -b -1
...
systemd-logind[27]: System is rebooting. (Applied kernel updates.)
...
This allows marking properties as "explicit". Properties marked like
this are included in the introspection, but are avoided in GetAll()
property queries, PropertiesChanged() signals and in in GetManaged()
object manager calls and InterfacesAdded() signals.
Expensive properties may be marked that way, and they will be
retrievable when explicitly being requested, but never in "blanket"
all-property queries and signals.
This flag may be combined with the flags for "const" and
"emit-validation" properties, but not with "emit-validation", as that
is only useful for properties whose value shall be sent in "blanket"
all-property signals.
The "explicit" flag is also exposed in the introspection data via a new
annotation.
If we try to resoolve an LLMNR PTR RR we shall connect via TCP directly
to the specified IP address. We already refuse to do this if the address
to resolve is of a different address family as the transaction's scope.
The error returned was EAFNOSUPPORT. Let's change this to ESRCH which is
how we indicate "not server available" when connecting for LLMNR or DNS,
since that's what this really is: we have no server we could connect to
in this address family.
This allows us to ensure that no server errors are always handled the same
way.
If the user specifies an interface by its ifindex we should handle this
nicely. Hence let's try to parse the ifindex as a number before we try
to resolve it as an interface name.
So far we handled immediate "no server" query results differently from
"no server" results we ran into during operation: the former would cause
the dns_query_go() call to fail with ESRCH, the later would result in
the query completion callback to be called.
Remove the duplicate codepaths, by always going through the completion
callback. This allows us to remove quite a number of lines for handling
the ESRCH.
This commit should not alter behaviour at all.
Right now we keep track of ongoing transactions in a linked listed for
each scope. Replace this by a hashmap that is indexed by the RR key.
Given that all ongoing transactions will be placed in pretty much the
same scopes usually this should optimize behaviour.
We used to require a list here, since we wanted to do "superset" query
checks, but this became obsolete since transactions are now single-key
instead of multi-key.
In order to make "machinectl shell" more similar to ssh, allow the
following syntax to connect to a container under a specific username:
machinectl shell lennart@fedora
Also beefs up related man page documentation.