IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
"*-*~1" => The last day of every month
"*-02~3..5" => The third, fourth, and fifth last days in February
"Mon 05~07/1" => The last Monday in May
Resolves#3861
When trying to read keyfiles from an encrypted partition to unlock the swap,
a cyclic dependency is generated because systemd can not mount the
filesystem before it has checked if there is a swap to resume from.
Closes#3940
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.
This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.
Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
Fixes: #4664
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).
This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
This changes a couple of things in the namespace handling:
It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.
This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.
While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.
This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.
This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.
Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows to mount huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
fixes#4055
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings .
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
commit 0527f1c
3f1ac7a700ommit
35afb33
core: add new RestrictNamespaces= unit file setting
Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().
RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.
This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
If execve() or socket() is filtered the service manager might get into trouble
executing the service binary, or handling any failures when this fails. Mention
this in the documentation.
The other option would be to implicitly whitelist all system calls that are
required for these codepaths. However, that appears less than desirable as this
would mean socket() and many related calls have to be whitelisted
unconditionally. As writing system call filters requires a certain level of
expertise anyway it sounds like the better option to simply document these
issues and suggest that the user disables system call filters in the service
temporarily in order to debug any such failures.
See: #3993.
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.
@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes
The system call is already part in @default hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.
Various things don't work when we're running in a user namespace, but it's
pretty hard to reliably detect if that is true.
A function is added which looks at /proc/self/uid_map and returns false
if the default "0 0 UINT32_MAX" is found, and true if it finds anything else.
This misses the case where an 1:1 mapping with the full range was used, but
I don't know how to distinguish this case.
'systemd-detect-virt --private-users' is very similar to
'systemd-detect-virt --chroot', but we check for a user namespace instead.
This unifies the suggested nsswitch.conf configuration for our four NSS modules to this:
hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname
Note that this restores "myhostname" to the suggested configuration of
nss-resolve for the time being, undoing 4484e1792b.
"myhostname" should probably be dropped eventually, but when we do this we
should do it in full, and not only drop it from the suggested nsswitch.conf
for one of the modules, but also drop it in source and stop referring to it
altogether.
Note that nss-resolve doesn't replace nss-myhostname in full: the former only
works if D-Bus/resolved is available for resolving the local hostname, the
latter works in all cases even if D-Bus or resolved are not in operation, hence
there's some value in keeping the line as it is right now. Note that neither
dns nor myhostname are considered at all with the above configuration unless
the resolve module actually returns UNAVAIL. Thus, even though handling of
local hostname resolving is implemented twice this way it is only executed once
for each lookup.
It may be desired by users to know what targets a particular service is
installed into. Improve user friendliness by teaching the is-enabled
command to show such information when used with --full.
This patch makes use of the newly added UnitFileFlags and adds
UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified,
it's now easy to add the new dry-run feature for other commands as
well. As a next step, --dry-run could be added to systemctl, which in
turn might pave the way for a long requested dry-run feature when
running systemctl start.
This makes journald use the common option parsing functionality.
One behavioural change is implemented:
"systemd.journald.forward_to_syslog" is now equivalent to
"systemd.journald.forward_to_syslog=1".
I think it's nicer to use this way.
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.
This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
Let's avoid the overly abbreviated "cgroups" terminology. Let's instead write:
"Linux Control Groups (cgroups)" is the long form wherever the term is
introduced in prose. Use "control groups" in the short form wherever the term
is used within brief explanations.
Follow-up to: #4381
In certain situations drop-ins in /usr/lib/ are useful, for example when one package
wants to modify the behaviour of another package, or the vendor wants to tweak some
upstream unit without patching.
Drop-ins in /run are useful for testing, and may also be created by systemd itself.
Follow-up for the discussion in #2103.
There are overlapping control group resource settings for the unified and
legacy hierarchies. To help transition, the settings are translated back and
forth. When both versions of a given setting are present, the one matching the
cgroup hierarchy type in use is used. Unfortunately, this is more confusing to
use and document than necessary because there is no clear static precedence.
Update the translation logic so that the settings for the unified hierarchy are
always preferred. systemd.resource-control man page is updated to reflect the
change and reorganized so that the deprecated settings are at the end in its
own section.
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.
This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.
This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
This patch adds support to remote checksum checksum offload to VXLAN.
This patch adds RemoteCheckSumTx and RemoteCheckSumRx vxlan configuration
to enable remote checksum offload for transmit and receive on the VXLAN tunnel.
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.
Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
Put more emphasis on the routing part. This is the more interesting
thing, and also more complicated and novel.
Explain "search domains" as the special case. Also explain the effect of
~. in more detail.
SYSTEMD_UNIT_PATH=foobar: systemd-analyze verify barbar/unit.service
will load units from barbar/, foobar/, /etc/systemd/system/, etc.
SYSTEMD_UNIT_PATH= systemd-analyze verify barbar/unit.service
will load units only from barbar/, which is useful e.g. when testing
systemd's own units on a system with an older version of systemd installed.
It needs to be possible to tell apart "the nss-resolve module does not exist"
(which can happen when running foreign-architecture programs) from "the queried
DNS name failed DNSSEC validation" or other errors. So return NOTFOUND for these
cases too, and only keep UNAVAIL for the cases where we cannot handle the given
address family.
This makes it possible to configure a fallback to "dns" without breaking
DNSSEC, with "resolve [!UNAVAIL=return] dns". Add this to the manpage.
This does not change behaviour if resolved is not running, as that already
falls back to the "dns" glibc module.
Fixes#4157
DNS servers which have route-only domains should only be used for
the specified domains. Routing queries about other domains there is a privacy
violation, prone to fail (as that DNS server was not meant to be used for other
domains), and puts unnecessary load onto that server.
Introduce a new helper function dns_server_limited_domains() that checks if the
DNS server should only be used for some selected domains, i. e. has some
route-only domains without "~.". Use that when determining whether to query it
in the scope, and when writing resolv.conf.
Extend the test_route_only_dns() case to ensure that the DNS server limited to
~company does not appear in resolv.conf. Add test_route_only_dns_all_domains()
to ensure that a server that also has ~. does appear in resolv.conf as global
name server. These reproduce #3420.
Add a new test_resolved_domain_restricted_dns() test case that verifies that
domain-limited DNS servers are only being used for those domains. This
reproduces #3421.
Clarify what a "routing domain" is in the manpage.
Fixes#3420Fixes#3421
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.
An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.
This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
There was no certainty about how the path in service file should look
like for usb functionfs activation. Because of this it was treated
differently in different places, which made this feature unusable.
This patch fixes the path to be the *mount directory* of functionfs, not
ep0 file path and clarifies in the documentation that ListenUSBFunction should be
the location of functionfs mount point, not ep0 file itself.
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.
Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
Let's merge a couple of columns, to make the table a bit shorter. This
effectively just drops whitespace, not contents, but makes the currently
humungous table much much more compact.
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=,
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (Which wasn't true actually, as we didn't follow symlinks
at all in the most recent releases, and we know do follow them, but relative to
RootDirectory=).
This also replaces all references to the fact that all fs namespacing options
can be undone with enough privileges and disable propagation by a single one in
the documentation of ReadOnlyPaths= and friends, and then directs the read to
this in all other places.
Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
Let's drop the reference to the cap_from_name() function in the documentation
for the capabilities setting, as it is hardly helpful. Our readers are not
necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the
strings we accept are just the plain capability names as listed in
capabilities(7) hence there's really no point in confusing the user with
anything else.
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.
This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.
In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.
While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad3 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
Also update the description of drop-ins in systemd.unit(5) to say that .d
directories, not .conf files, are in /etc/system/system, /run/systemd/system,
etc.
This splits the OS field in two : one for the distribution name
and one for the the version id.
Dashes are written for missing fields.
This also prints ip addresses of known machines. The `--max-addresses`
option specifies how much ip addresses we want to see. The default is 1.
When more than one address is written for a machine, a `,` follows it.
If there are more ips than `--max-addresses`, `...` follows the last
address.